Space Captain, First Galactic Baron, Royal Ring Knight, First Solar King and Third Moon Prince of the Space Planet New New Zealand, Second Master of the Holy Star Order, Imperial Great Lord of Space and Independent Space General, Sergeant Major Lieutenant Stanislaw Stanisgreat stood up from his space chair on the space deck to stare across the spacebow of the space ultrabattledestroyer HMS Stiff Upper Lip with his spacescope. There! In the distant depths of space, near the space planet of New South Macedonia, a blacker than black space cutter was cutting space towards the space merchant fleet sailing the space waves towards the space port of New North America. The weak, defenseless merchantmen were sitting in their space deckchairs just waiting to be picked off from space by the evil boat of space.
Space Captain Stanislaw cursed a terrible space curse under his space breath to no one in particular, which was just as well for there is no sound in space. He gave the order to turn the space wheel of the space ultrabattledestroyer towards the space cutter, hoist the main space sail to the space mast and blow the space horn to sound the space crew to readiness. The space sails billowed in the space wind and started the ultrabattledestroyer towards the nefarious space cutter at infinite velocity. Even at infinite velocity, Stanislaw knew he'd be too late to stop the space cutter from sinking the space merchantmen into the depths of space and feeding their space crews to the space fishes.
An endless instant later, the space ultrabattledestroyer closed to engagement range from the space cutter and turned its right side towards the dastardly space cutter. Stanislaw ordered a full thirtyfold space broadside alpha strike to be fired at the space cutter at his mark. Mark! The crewmen of the space cutter shouted in great alarm at the ultrabattledestroyer but their shouts were never heard in space before the report of the space guns of the ultrabattledestroyer drowned them out under a terrible booming fury of space.
Millions of space cannonballs flew across space in slow motion, tearing gaping space holes in the space sails of the space cutter. One space cannonball hit the cutter right in the middle and split it in two. The mangled space cutter turned its cowardly tail and tried to run across space to the space asteroid field in the middle of space to escape the space pursuit of the space ultrabattledestroyer.
Too bad for the space cutter, the ultrabattledestroyer had its space sensors deployed in space, giving it full knowledge of space. Space Captain Stanislaw set his magnificent space plan in motion and deployed a single space rowboat right in the space path of the space cutter. The space rowboat was carrying a deadly cargo of space gunpowder, which the spacemen aboard it set to explode with a slow-burning space fuse. The spacemen jumped to space off the space rowboat and space swam back to the welcoming space arms of the space ultrabattledestroyer.
The space captain of the space cutter, Shining Diamond Vizier Democratically Elected Big Cat Corporate Chief Communist Politruck Liberal Nazi Warlord Space President of Badmanistan, General Colonel of Three Stripes and One Star Badstar Badmanis noticed the space rowboat a mere picosecond too late. His bellowed order to space duck was cut short in space as the space rowboat exploded with thunderous space roar and shot three million nuclear space lightnings everywhere in space, shattering the space cutter into five trillion pieces. An endless shower of space blood colored the space red. That was one space cutter that would not cut space again, thought the entire space crew of the space ultrabattledestroyer in unison in space.
Space Captain Stanislaw Stanisgreat turned to face his space crew and congratulated them for a space job well done. His lily-white space suit and space tophat had been splattered crimson from the space blood and his visage was like that of a butchered space goat. The Space Admiral Peerless Thirtyfold Queen Emperor, Defender of Space, Attacker of the Realm, First Dame of Spacistan, Royala Bigshotikins climbed the ladder aboard the ultrabattledestroyer from her million-mile-long gigaspacehyperdreadnoughtcarrier space flagship and bestowed the space medal of Biggest Imperial Glory of Space upon the beaming Space Captain Stanislaw Stanisgreat. Stanislaw fought back a single space tear of infinite boundless space pride and saluted for hours at the Space Admiral.
THE END
THESE BOOKS MAKE ME ANGRY PUBLISHING, LTD.
art with code
2010-12-26
2010-12-19
WebGL Orrery
I wrote a small WebGL orrery a month back or so, using planet and orbit data from Wikipedia. The orbits came out a bit wrong (planet positions on orbits are wrong, the planet rotation axis rotates along the orbit, planet velocities are constant, and the orbits / axes may well be flipped somewhere), but it's a passably "realistic" orrery. Well, realistic apart from the whole having to scale down orbits by a factor of a thousand to make the planets visible thing. With 1:1 orbits, the planets are smaller than a pixel in size when viewed from other planets and the sun is about a pixel in diameter when viewed from Neptune.
It's quite extensible too: it should be possible to add in all the moons just by filling in their data. The coordinate system and z-buffer aren't accurate enough to do anything fancy like a spaceship sim with 1:1 orbits though :)
2010-12-01
A deck of cards
If not for the order property, you could cram a deck of cards into a 64-bit int. Though if you had an RNG cyclical in full_deck_size, that'd work to preserve the order property. Which would be enough for poker and other stackless or write-only-stack card games.
With the cyclical RNG you could pop a card off a full deck in O(1). With incomplete decks (that is, a deck with non-RNG-generated cards removed) you'd need to go through random cards until the generated card is in the deck. The worst-case time complexity for popping all cards off an incomplete deck would be O(full_deck_size) and the average complexity for popping one card off an incomplete deck would be O(full_deck_size/incomplete_deck_size). Anyhow, incomplete decks would be about as fast or slow as full decks, with the extra existence check slowing them down a bit further. The RNG would need a seed with log_{2}(fac(full_deck_size)) bits, so log_{2}(52!) ≈ 226 bits for a deck without jokers.
A naive version of the RNG would be to generate a random number between 0 and 52!-1, then convert the number into 52 digits in factorial base. Divide first by 51!; the result is the first card. Divide the remainder by 50!; the result is the second card (note that you have to remove the first card from consideration for the second card index). Continue dividing the remainders until you have 52 cards. Or just generate a shuffled 52-element array of numbers and go through that :P
... I should get back to work now.
Source
#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

typedef int64_t card_mask;
typedef int8_t card;

typedef struct deck {
  card cards[55];
  int8_t length;
  card_mask mask;
} deck_t;

#define suit_size 13

static const card_mask hearts = suit_size*0;
static const card_mask spades = suit_size*1;
static const card_mask clubs = suit_size*2;
static const card_mask diamonds = suit_size*3;
static const card_mask jokers = suit_size*4;

static const char* suits[5] = {"Hearts", "Spades", "Clubs", "Diamonds", "Jokers"};
static const char* ranks[13] = {
  "Ace", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten",
  "Jack", "Queen", "King"
};

card_mask remove_cards_from_mask(card_mask deck, card_mask cards)
{
  return deck & ~cards;
}

card_mask add_cards_to_mask(card_mask deck, card_mask cards)
{
  return deck | cards;
}

card_mask query_mask(card_mask deck, card_mask cards)
{
  return deck & cards;
}

card_mask mask_of_card(card c)
{
  return (card_mask)1 << (card_mask)c;
}

card card_for_suit_and_rank(card_mask suit, card_mask rank)
{
  return suit + rank;
}

int deck_initialize(deck_t *deck, int jokers)
{
  card c = 0;
  if (jokers < 0 || jokers > 3)
    return 0;
  deck->length = 52 + jokers;
  deck->mask = ((card_mask)1 << (card_mask)deck->length) - 1;
  for (c=0; c<deck->length; c++) {
    deck->cards[c] = c;
  }
  return 1;
}

void deck_clear(deck_t *deck)
{
  deck->length = 0;
  deck->mask = 0;
}

void deck_shuffle(deck_t *deck)
{
  int8_t i,j;
  card tmp;
  for (i=deck->length-1; i>0; i--) {
    j = rand() % (i+1);
    tmp = deck->cards[j];
    deck->cards[j] = deck->cards[i];
    deck->cards[i] = tmp;
  }
}

card_mask deck_query_mask(deck_t *deck, card_mask m)
{
  return query_mask(deck->mask, m);
}

card_mask deck_query_card(deck_t *deck, card c)
{
  return deck_query_mask(deck, mask_of_card(c));
}

card_mask deck_query_deck(deck_t *deck, deck_t *deck2)
{
  return deck_query_mask(deck, deck2->mask);
}

int deck_add(deck_t *deck, card c)
{
  card_mask m = mask_of_card(c);
  if (query_mask(deck->mask, m)) {
    return 0;
  } else {
    deck->mask = add_cards_to_mask(deck->mask, m);
    deck->cards[deck->length] = c;
    deck->length++;
    return 1;
  }
}

int deck_remove(deck_t *deck, card c)
{
  int8_t i=0, j=0;
  card_mask m = mask_of_card(c);
  if (!query_mask(deck->mask, m)) {
    return 0;
  } else {
    deck->mask = remove_cards_from_mask(deck->mask, m);
    for (i=0,j=0; i<deck->length; i++) {
      if (deck->cards[i] != c) {
        deck->cards[j] = deck->cards[i];
        j++;
      }
    }
    deck->length = j;
    return 1;
  }
}

card deck_pop(deck_t *deck)
{
  card c;
  if (deck->length < 1) {
    return -1;
  } else {
    deck->length--;
    c = deck->cards[deck->length];
    deck->mask = remove_cards_from_mask(deck->mask, mask_of_card(c));
    return c;
  }
}

const char* suit_string(card s)
{
  if (s >= suit_size*5)
    return "Invalid";
  return suits[s / suit_size];
}

const char* rank_string(card s)
{
  return ranks[s % suit_size];
}

void print_card(card c)
{
  printf("%s of %s\n", rank_string(c), suit_string(c));
}

void print_deck(deck_t *deck)
{
  int8_t i;
  printf("Deck %p:\n", (void*)deck);
  for (i=0; i<deck->length; i++) {
    print_card(deck->cards[i]);
  }
}

void print_mask(card_mask m)
{
  int i;
  const char* symbols[5] = {"♥", "♠", "♣", "♦", "☻"};
  for (i=0; i<64; i++) {
    if (i%suit_size == 0) printf(" %s", symbols[i/suit_size]);
    if ((m>>i) & 1) putchar('#');
    else putchar('.');
  }
  putchar('\n');
}

int main()
{
  int i=0;
  deck_t deck, hand;
  card c;
  srand(time(NULL));
  deck_initialize(&deck, 2);
  deck_initialize(&hand, 0);
  deck_clear(&hand);
  deck_shuffle(&deck);
  printf("Deck %p has %d cards and mask %ld.\n", (void*)&deck, deck.length, deck.mask);
  print_mask(deck.mask);
  for (c=0; c<54; c++) {
    if (!deck_query_card(&deck, c)) {
      printf("Error: Deck doesn't have a card.\n");
      print_card(c);
    }
  }
  printf("Hand %p has %d cards and mask %ld.\n", (void*)&hand, hand.length, hand.mask);
  print_mask(hand.mask);
  for (c=0; c<54; c++) {
    if (deck_query_card(&hand, c)) {
      printf("Error: Hand has a card it shouldn't have.\n");
      print_card(c);
    }
  }
  deck_remove(&deck, hearts);
  if (deck_query_card(&hand, hearts))
    { printf("Error: Hand has a card it shouldn't have.\n"); print_card(hearts); }
  deck_remove(&deck, spades);
  if (deck_query_card(&hand, spades))
    { printf("Error: Hand has a card it shouldn't have.\n"); print_card(spades); }
  deck_remove(&deck, clubs);
  if (deck_query_card(&hand, clubs))
    { printf("Error: Hand has a card it shouldn't have.\n"); print_card(clubs); }
  deck_remove(&deck, diamonds);
  if (deck_query_card(&hand, diamonds))
    { printf("Error: Hand has a card it shouldn't have.\n"); print_card(diamonds); }
  printf("Deck %p has %d cards and mask %ld.\n", (void*)&deck, deck.length, deck.mask);
  print_mask(deck.mask);
  for (i=0; i<5; i++) {
    c = deck_pop(&deck);
    print_card(c);
    deck_add(&hand, c);
  }
  printf("Deck %p has %d cards and mask %ld.\n", (void*)&deck, deck.length, deck.mask);
  print_mask(deck.mask);
  printf("Hand %p has %d cards and mask %ld.\n", (void*)&hand, hand.length, hand.mask);
  print_mask(hand.mask);
  print_deck(&hand);
  if (deck_query_deck(&deck, &hand) || deck_query_deck(&hand, &deck)) {
    printf("Error: Deck and Hand intersect.\n");
  } else {
    printf("Deck and Hand are disjoint.\n");
  }
  for (i=0; i<hand.length; i++) {
    if (deck_query_card(&deck, hand.cards[i])) {
      printf("ERROR ");
    } else {
      printf("OK ");
    }
  }
  printf("\n");
  for (i=0; i<deck.length; i++) {
    if (deck_query_card(&hand, deck.cards[i])) {
      printf("Error: Found a card in Deck that is in Hand.\n");
      print_card(deck.cards[i]);
    }
  }
  print_card(suit_size*5);
  return 0;
}
Output
$ ./a.out
Deck 0x7fff98907250 has 54 cards and mask 18014398509481983.
♥############# ♠############# ♣############# ♦############# ☻##..........
Hand 0x7fff98907210 has 0 cards and mask 0.
♥............. ♠............. ♣............. ♦............. ☻............
Deck 0x7fff98907250 has 50 cards and mask 18013848686551038.
♥.############ ♠.############ ♣.############ ♦.############ ☻##..........
Eight of Hearts
Seven of Clubs
Ten of Diamonds
Six of Diamonds
Nine of Hearts
Deck 0x7fff98907250 has 45 cards and mask 17714777228828286.
♥.######..#### ♠.############ ♣.#####.###### ♦.####.###.### ☻##..........
Hand 0x7fff98907210 has 5 cards and mask 299071457722752.
♥.......##.... ♠............. ♣......#...... ♦.....#...#... ☻............
Deck 0x7fff98907210:
Eight of Hearts
Seven of Clubs
Ten of Diamonds
Six of Diamonds
Nine of Hearts
Deck and Hand are disjoint.
OK OK OK OK OK
Ace of Invalid
2010-11-23
Yet more stupid ideas
Amateur astronomer telescopes, live video streaming, combine the different observations into high-res frames: live 24/7 HD webcams of the planets.
Speckle imaging
Compressed sensing
Image Deblurring with Blurred/Noisy Image Pairs
More stupid ideas
Take The European Citizens' Initiative, add a public forum, link it up with EU proceedings, build a Citizens' Parliament.
The ECI proposal says that you need a million signatories to send a citizens' initiative to the European Commission. The EC is the main proposal-generating entity in the EU. The European Parliament, in contrast, is the proposal-filtering entity (now with new proposal-generating powers granted to it by the Lisbon Treaty). In that sense, the ECI impacts the very top of the EU legislative system.
The million signatories must include 0.2% of the voting-age citizens in at least a third of the member states for the initiative to be valid. And after collecting 300 000 statements of support, you must submit it to the EC for an admissibility acid test where the EC figures out whether they can legally do it or not. After two months, you should get back a yay/nay. And you can collect the statements of support online as long as your mechanism is secure (further details pending).
After that, 3 months of checking the signatures for validity by the member states using random sampling. After that, 4 months of examination by the EC, at the end of which it "would then be required to set out its conclusions on the initiative and the action it intends to take in a communication, which would be notified to the organiser as well as to the European Parliament and Council and would be made public."
Not the most blisteringly fast method of democratic participation, is it? Still, it's very interesting.
I wonder how much governance is a problem of thinking power (in terms of "need more heads to think better solutions!"), how much a sensor network problem ("need to know what's wrong with the world to fix it!"), and how much a problem of integrating the two into a working system. How would this kind of a wide-ranging, massively large parliament work best?
A large and diverse population has experts from all the different walks of life. Which is probably where the thinking power of the citizens' parliament would best manifest. But you need to find the best proposals from the noise of mediocre proposals. Then you have the sensor network side, where a large population is again an asset. But again, there's a large amount of noise with a large population. So perhaps the important thing is making a good network to filter the signal.
2010-11-18
Stupid ideas
#1
Use toString, Hungarian notation and a JS parser to compile JS functions to GLSL. To run the JS shaders in JS, implement the GLSL builtins as a JS library and write a pure-JS GL implementation to laugh at.
// Pseudocode time!
var myShader = {
  vertex: {
    attributes: {v3Vertex: buffer, v2TexCoord: anotherBuffer},
    uniforms: {m4PMatrix: cameraMatrix, m4MVMatrix: worldMatrix},
    varyings: ['v2TexCoord0'],
    // the hypothetical JS GLSL runtime would do something like
    // shader.void_main.apply(GLSLContext, [])
    void_main: function() {
      var v4v = this.vec4(this.v3Vertex, 1.0);
      this.v2TexCoord0 = this.v2TexCoord;
      this.gl_Position = this.m4PMatrix.mulV4(this.m4MVMatrix.mulV4(v4v));
    }
  },
  fragment: {
    varyings: ['v2TexCoord0'],
    uniforms: {s2DTexture0: myTexture, fRadius: 0.1},
    void_main: function() {
      var fd = this.fRadius / 5.0;
      var v4Color = this.vec4(0.0);
      for (var fi=0.0; fi<5.0; fi++) {
        var v2tc = this.v2TexCoord0.add(this.vec2(fi*fd,fi*fd));
        v4Color = v4Color.add(this.texture2D(this.s2DTexture0, v2tc));
      }
      this.gl_FragColor = v4Color;
    }
  }
};
puts(GLSL.convert(myShader.vertex));
=>
precision highp float;
attribute vec3 v3Vertex;
attribute vec2 v2TexCoord;
uniform mat4 m4PMatrix;
uniform mat4 m4MVMatrix;
varying vec2 v2TexCoord0;
void main() {
  vec4 v4v = vec4(v3Vertex, 1.0);
  v2TexCoord0 = v2TexCoord;
  gl_Position = m4PMatrix * (m4MVMatrix * v4v);
}
puts(GLSL.convert(myShader.fragment));
=>
precision highp float;
varying vec2 v2TexCoord0;
uniform sampler2D s2DTexture0;
uniform float fRadius;
void main() {
  float fd = fRadius / 5.0;
  vec4 v4Color = vec4(0.0);
  for (float fi=0.0; fi<5.0; fi++) {
    vec2 v2tc = v2TexCoord0 + vec2(fi*fd, fi*fd);
    v4Color = v4Color + texture2D(s2DTexture0, v2tc);
  }
  gl_FragColor = v4Color;
}
Maybe you could do an OpenCL implementation like that too.
#2
3D software should come with sculpt-friendly pre-rigged model kits with UV maps (yes, having some trouble with the technical aspects of 3D modeling). Also, voxel sculpting looks amazing.
2010-10-22
Storage visions
Some sort of user-friendly UI to ZFS+Plan9: one big main filesystem that spans several drives and automatically does striping, mirroring and versioning. Fast drives are used as cache, big drives as mirrors and history.
Plug in a new internal drive and usable space, redundancy and total performance grow by some percentage of the new drive's specs. Bring a new computer into the local area network and the filesystems on the computers are merged into a network-wide filesystem, adding usable space, redundancy and performance.
Pull out a drive and the file system rebalances itself over the remaining drives. If some files require the pulled-out drive, display a notification and freeze the files until the drive is restored. Use a disk utility to separate a drive from the filesystem into a new filesystem; the utility copies files unique to the drive to other drives, then disconnects and formats the separated drive when done.
Computers on the local network are used as page cache and as mirrors of the filesystem. A cloud service is used as a backup mirror and remote sync. Treat the entire memory hierarchy and network as parts of the same single filesystem. When on the network, it doesn't matter which computer or which disk has the file, apart from performance.
Transient devices carry a small local cache of the important things on the filesystem, and access the rest of the filesystem over the network, either by connecting to the home network or by going through the cloud service. The local cache of the filesystem prioritizes metadata over the actual data, and access to the actual data can usually be done by streaming a low-bandwidth version.
*mirrorshades*
*techno soundtrack*
*sunrise*
2010-10-20
How many dwarfs live in Mt. Everest?
Dwarfs, those earth-dwelling little buggers, grand masters of volume. What if they burrowed the entire Mount Everest into a big dwarf-warren? Let's find out!
First, some assumptions.
Dwarfs are clearly good at cramped living, as they live in mostly closed-off subterranean cities. They have also mastered the art of stonecraft to a magical level, so their buildings may well compare favorably to modern ferroconcrete towers. Let's assume that a dwarven city boasts a population density similar to urban Hong Kong's, i.e. 250,000 dwarfs per square kilometer with an average building height of 45 meters. A cubic kilometer of dwarvopolis would consist of 20 such layers and have a population of 5 million dwarfs.
Mount Everest is kinda pointy, but not all that pointy. Let's suppose that its slope is 45 degrees and that it approximates a cone. That would give it around 720 cubic kilometers in volume. But don't forget that dwarfs are subterranean, so we could reasonably assume that the dwarven city of Sagarmatha extends an extra kilometer under the surface, with dwarven suburbia sprawling in a 10-kilometer radius. With the subterranean suburbs added, our dwarven city would have a total volume of around one thousand cubic kilometers.
From which a simple multiplication leads us to a total Sagarmatha dwarvopolitan population of 5 billion.
But why stop at Everest? Cast your net wider! The entire population of the Great Himalayan Dwarven Realm would be... the area of the Himalayas is around 610,000 km^2, the Tibetan plateau height is 4.5 km, assume 0.5% of Everest population density for the sticks, i.e. 25,000 dwarfs per cubic kilometer, and put it all together for a grand total of 69 billion Himalayan dwarfs.
Ubuntu 10.10 nits
Nice fonts.
Nice boot screen.
Nice boot screen doesn't work with fglrx.
Dark theme bg & window style screws up labels with dark text.
Dark title bar & dark menu bar occasionally look silly with the light-grey window bg. Especially Firefox looks bad. The thing with dark bg themes is that the number of apps designed to work properly with one is damn close to zero. (Hence, I suppose, the light-grey window bg.)
The active window titlebar gradient looks kinda sloppy. Ditto for popup menu gradients.
Flat window buttons walk a tightrope between sucking and being just boring.
GTK widget theme buttons are very nice.
Dropdown widget has the triangle vertically off-center by 1px?
The salmon active color in the widget theme is not very nice.
Scrollbar widget looks different from the rest of the theme and a bit weird. Exacerbated by rounded buttons but no alpha channel, so Firefox textarea scrollbar buttons look icky.
New Rhythmbox icon is nice.
Rhythmbox itself is still nowhere near Amarok 1.4.
Amarok 2.0 is still nowhere near Rhythmbox.
Gnome Terminal icon is vertically off-center in top panel.
Notification area in top panel looks nice with the monochrome theme.
App icons in top panel are too tall and look bad.
Weather applet rainy icon has a black border. Other widget icons (other weather icons too) don't have borders, making the rainy icon look out-of-place.
The speech bubble icon in top panel is one pixel too low (more like one pixel too high).
The networking applet icon is super-confusing. I keep looking at it and wondering "Wtf was that icon for again? Check updates? No.. oh right, networking."
The (very low-visibility) round corners in panel dropdown menus look nice. But they suffer from a lack of text padding.
Window title color is actually orange instead of dark grey (try Chromium with GTK theme).
1px wide window resize handles make me cry.
The purple color works surprisingly well.
Default desktop bg is nice.
Popup text labels have too little horizontal padding. Otherwise they look nice.
Date & time applet text has too little horizontal padding as well.
The panel applets have crap spacing so you need to manually move them around by a couple pixels to make the layout look less horrible.
PulseAudio is glitchy and the volume control does not work (you can only set it to mute, medium volume or super-loud volume; no low volume setting available).
The new fancy volume control doubling as a music player remote is nice.
udev screws up md RAID arrays by being clueless (starts only one disk of a RAID1 -> stops boot process thx to being unable to mount the disk & dirty array after reassembly -> need resync -> FFFFUUUUUUU. How about using mdadm -A --scan like sane people do?) [edit: mdadm -A --scan wants to have an md device name like /dev/md0 instead of /dev/md/MyRAID, I dunno if that sunk udev or if it's just doing silly things.]
Nautilus is slow, so I very much try not to manage files with it (instead: the terminal for quick & programmable things, Filezoo for overviews and visual sorting).
The CPU speed indicator applet could show all the CPUs at once instead of requiring running several copies of it.
I don't get what the Social Networks thing does. It doesn't seem to do anything. Maybe that's what it does.
What is Ubuntu One? Perhaps these things should be explained as one starts them up?
Why is this Gwibber thing so slow? Or perhaps it just lacks a progress indicator, making it feel slow.
Evolution feels like a really fiddly and slow interface to Gmail.
I do really like it for the most part, at least after a year in Debian's permabroken Gnomeland. I'd use FVWM or something cool and edgy and configurable like that for window management but compiz no worky on nonretarded WMs and theming FVWM (while infinitely easier than theming metacity) is limited and painful. Not that I've done any theming for a couple years. Plus you don't have a nice GUI to make F1 start a new terminal (just a nice thousandline config file).
Maybe there should be a WebKit-based desktop environment.
Nice boot screen.
Nice boot screen doesn't work with fglrx.
Dark theme bg & window style screw up labels with dark text.
Dark title bar & dark menu bar occasionally look silly with the light-grey window bg. Especially Firefox looks bad. The thing with dark bg themes is that the number of apps designed to work properly with one is damn close to zero. (Hence, I suppose, the light-grey window bg.)
The active window titlebar gradient looks kinda sloppy. Ditto for popup menu gradients.
Flat window buttons walk a tightrope between sucking and being just boring.
GTK widget theme buttons are very nice.
Dropdown widget has the triangle vertically off-center by 1px?
The salmon active color in the widget theme is not very nice.
Scrollbar widget looks different from the rest of the theme and a bit weird. Exacerbated by rounded buttons but no alpha channel, so Firefox textarea scrollbar buttons look icky.
New Rhythmbox icon is nice.
Rhythmbox itself is still nowhere near Amarok 1.4.
Amarok 2.0 is still nowhere near Rhythmbox.
Gnome Terminal icon is vertically off-center in the top panel.
Notification area in top panel looks nice with the monochrome theme.
App icons in top panel are too tall and look bad.
Weather applet rainy icon has a black border. Other widget icons (other weather icons too) don't have borders, making the rainy icon look out-of-place.
The speech bubble icon in the top panel is one pixel too low (or more like one pixel too high).
The networking applet icon is super confusing. I keep looking at it and wondering "Wtf was that icon for again? Check updates? No.. oh right, networking."
The (very low-visibility) round corners in panel dropdown menus look nice. But suffer from lack of text padding.
Window title color is actually orange instead of dark grey (try Chromium with GTK theme).
1px wide window resize handles make me cry.
The purple color works surprisingly well.
Default desktop bg is nice.
2010-10-18
Fast SSD is fast
The PCIe SSD I bought is a 120GB OCZ RevoDrive, which is basically two 60 GB OCZ Vertex 2 SSDs mounted on a PCIe fake-RAID controller.
Works nicely in Ubuntu 10.10, though apparently less nicely on earlier versions (problems booting from fake-RAID).
The amusing thing about the SSD is that the controller is compressing all the data it writes to the drive. So you get 540MB/s speeds for reading a file full of zeroes but just 300MB/s for reading a file full of MP3s. Write speeds are 475MB/s for zeroes, 300MB/s for MP3s.
A random access 4kiB read takes around 0.26 ms, for a 16MB/s random read speed. But you can do around 8 random reads in parallel, which gets you a 117MB/s random read speed. My 7200 RPM disks can do around 75MB/s with a streaming read, so 117MB/s random access speed is absolutely nuts.
Parallel reads are kinda hard to program though. You could spawn a bunch of threads and do a gather into a shared array I suppose, though that feels a bit heavy (hah, right. I expect the threading overhead to be negligible.) Do the reads from a shared mmap of the file, each thread reading its own segments off it? [edit: no, mmap apparently serializes its page faults or OpenMP doesn't run the threads in parallel]
2010-10-16
Little benchmark
Got my computer put together yesterday, and now's the time to benchmark it! On a Saturday afternoon...
Image correlation algorithm benchmark:
OpenCL:
- 240 GBps: Athlon II X4 640, 3GHz (12GHz aggregate), 2MB L2
- 85 GBps: Core 2 Duo E6400, 2.1GHz (4.3GHz aggregate), 2MB L2
OpenMP+SSE optimized:
- 103 GBps: Athlon II X4
- 45 GBps: Core 2 Duo
OpenMP+SSE naive:
- 13 GBps: Athlon II X4
- 5 GBps: Core 2 Duo
Pretty much linear scaling with clock frequency in OpenCL. Both have a 3 cycle L1 latency and the algorithm is very much an L1 cache benchmark, so this isn't too surprising. The SSE version has some bandwidth / load-balancing bottleneck going on, and the naive version is pretty much a pure memory bandwidth benchmark.
2010-10-13
New hardware!
In our cheap (and not-so-cheap) upgrade category, got me an Athlon II X4 + mobo + RAM (300€) and a PCIe SSD (300€). Which should get me roughly 3x more CPU power, double the memory bandwidth, 5x the IO bandwidth and something like 700x the random access performance. Now just have to piece it together and benchmark!
2010-10-10
WebGL presentation editor prototype
Try it out here; requires a WebGL-capable browser. It's a bit buggy, so watch your head.
You can check out the sources on GitHub. What's still missing is
2010-10-04
Desk drawer coding
Tally of different sorts of crap I've done to pass the time:
Music players written: 4 (with mplayer / <audio> / Flash for playback, so maybe these are more "playlist players").
Image slideshows written: 5, three of which work on arbitrarily long lists.
Video players w/ playlist: 1 (I just repurposed one slideshow).
Document readers: 3, though they just show images of pages.
Scene graph libraries for different retarded purposes: 7.
GUI toolkits: 1. But it didn't go too well.
LaTeX-style text justifier: 1.
Layout text inside a polygon: 1. And it seems like there's zero literature on text layout algorithms.
Tilemap zoomers: 3.
Infinite tree hierarchy zoomers: 1.
File managers: 1. And 3 web-based file managers.
File thumbnailer libraries: 2.
Thumbnail cache systems: 3.
OpenGL-based presentation apps: 3.
File annotation systems: 1.
Metadata editing forms: 3.
File metadata extraction libraries: 1.
Standard library rewrites: 1.
Array combinator libraries: 2. Well, maybe 3.
Search query parsers: 2, the second time Doing It Right with a parser generator.
Search query matchers: 1.
DOM helper libraries: 2.
Context-sensitive help in REPL: 1.
Distributed message-passing libraries: 2.
Network service discovery libraries: 1.
Media players with networked storage and playback (controller on mobile, storage across the network, playback on HTPC): 1, but durr.
Converter graph with pathfinder to convert files: 1 (IIRC plugged into the media player too, so it played PDFs with text-to-speech).
Animation libraries with tweens and timelines: 1.
SQL databases (tables and indices, not database engines): 5? Not very much.
Memcached to cache results of expensive queries: 1.
HTML fragment caches: 1.
Tar parsers: 1.
3D model parsers: 3 of varying levels of completeness.
Huffman encoder & decoder: 1 (yay, coursework).
Image file formats: One vector drawing hack, plus three that basically concatenate JPEGs into a bigger file.
Toy drawing programs: 3.
Games: 1.
Editors: 0.
Compilers: 0.
Desk calculators: 0.
Programming languages: 0 (unless you count an OO Brainfuck implementation).
Something useful: 0.
Hmm, that was more than I initially thought :<
2010-09-27
Visitor stats for September 2010
Browser market share
43% Firefox
31% Chrome
10% Safari
10% Explorer
4% Opera
OS market share
56% Windows
22% Linux
19% OS X
1.6% iOS
0.7% Android
More than 90% of visitors had a screen width of 1280 or higher.
94% had Flash, 1% had a Flash version other than 10. 85% had Java.
Most search engine visitors wanted to know about SSE matrix multiplication. Clearly a hot topic.
2010-09-25
Composing shaders
I haven't found a good way to do modular GLSL shaders, so here's some blather on that subject.
Shaders themselves are strings of source code, compiled to a binary by the OpenGL drivers. A shader has a
void main()
that is called to execute the shader. There are also global definitions in the form of attributes, varyings and uniforms, a precision qualifier like precision highp float;, and C preprocessor macros.
A shader also has a type: it can be either a vertex shader or a fragment shader (or a geometry shader). The difference between the two is that a vertex shader determines what areas to consider for modification, whereas a fragment shader determines the output value for each element in the modified areas. Imagine you're drawing a filled circle: the vertex shader would output the bounding box of the circle and the fragment shader would test each element in the bbox to see if it lies inside the circle, setting them to the fill color if they do.
But back to the composing part. If you have some common helper functions that you'd like to use in your different shaders (e.g. defaultTransform(), blend functions), how do you reuse them? I tried to link different shader objects together but it didn't work for whatever reason. So now I'm just concatenating strings.
ShaderSource = Klass({
initialize: function(type, source) {
this.type = type;
this.text = toArray(arguments).slice(1).map(function(s) {
return s.text ? s.text : s;
}).join("\n");
}
});
Maybe you want to further split your shaders into functional segments and turn those segments on and off for different uses. One way is to put some #ifdefs in the shader and prepend a block of #defines to your source.
ShaderSource = Klass({
initialize: function(type, source) {
this.type = type;
this.text = this.origText = toArray(arguments).slice(1).map(function(s) {
return s.text ? s.text : s;
}).join("\n");
},
setDefines : function(defines) {
this.defines = defines;
var defs = [];
for (var i in defines) {
defs.push("#define "+i+" "+defines[i]);
}
defs.push(this.origText);
this.text = defs.join("\n");
}
});
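The same #define-prepending idea, stripped of the Klass plumbing into a standalone sketch (applyDefines is a hypothetical helper name, not part of the library above):

```javascript
// Prepend "#define NAME value" lines to a shader source string, so
// #ifdef blocks in the shader can be toggled per-use.
function applyDefines(source, defines) {
  var defs = [];
  for (var name in defines) {
    defs.push("#define " + name + " " + defines[name]);
  }
  defs.push(source);
  return defs.join("\n");
}

var fragSrc = [
  "precision highp float;",
  "#ifdef USE_FOG",
  "uniform float fogDensity;",
  "#endif",
  "void main() { gl_FragColor = vec4(1.0); }"
].join("\n");

// withFog now starts with "#define USE_FOG 1", so the GLSL
// preprocessor compiles the #ifdef USE_FOG block in.
var withFog = applyDefines(fragSrc, {USE_FOG: 1});
```

The GLSL compiler's own preprocessor does the actual conditional work; JS only pastes the switches on top.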
Another way would be to affix an identifier to the source strings and only use a string if it's enabled. This is like writing a featurepoor preprocessor by yourself.
ShaderSource = Klass({
initialize: function(type, source) {
this.type = type;
this.segments = toArray(arguments).slice(1);
this.text = this.segments.map(function(s) {
return s.text ? s.text : s;
}).join("\n");
},
setEnabled : function(enabled) {
this.enabled = enabled;
this.text = this.segments.map(function(s) {
if (!s.id || enabled[s.id])
return s.text ? s.text : s;
else
return "";
}).join("\n");
},
setDisabled : function(disabled) {
this.disabled = disabled;
this.text = this.segments.map(function(s) {
if (!s.id || !disabled[s.id])
return s.text ? s.text : s;
else
return "";
}).join("\n");
}
});
Or some mix of the above. I don't really know what's the superbest way to do this.
Easy inheritance in JavaScript
JavaScript inheritance is cumbersome. You have a constructor function and it has a prototype. And inheritance is done by setting the prototype to a newly created parent object? What? That means that you better not do anything in the constructor, otherwise you'll have a royal pain deriving new prototypes from it.
So I did what I do and wrote my own inheritance system, based on mixins. It still functions pretty much like the JavaScript system, but is more convenient for building the constructor and the prototype from existing classes.
Here's the infamous OO geometry demo:
Rectangle = Klass({
initialize : function(w, h) {
this.w = w;
this.h = h;
},
getArea : function() {
return this.w * this.h;
}
})
Square = Klass(Rectangle, {
initialize : function(s) {
Rectangle.initialize.call(this, s, s);
}
})
Moving = Klass({
mass : 1,
initialize : function(n) {
this.s = new Array(n);
this.a = new Array(n);
this.v = new Array(n);
this.f = new Array(n);
for (var i=0; i<n; i++){
this.s[i] = this.a[i] = this.v[i] = this.f[i] = 0;
}
},
move : function(t) {
for (var i=0; i<this.s.length; i++) {
var a = this.a[i], s = this.s[i], v = this.v[i], f = this.f[i];
var newA = f/this.mass;
var newV = v + a*t + (1/2)*(newA-a)*t*t;
var newS = s + v*t + (1/2)*a*t*t + (1/3)*(newA-a)*t*t*t;
this.a[i] = newA;
this.v[i] = newV;
this.s[i] = newS;
}
this.previousMass = this.mass;
}
})
Printable = {
valueString : function() {
var a = [];
for (var i in this)
if (typeof this[i] != 'function') a.push(i);
a.sort();
for (var i=0; i<a.length; i++)
a[i] = a[i] + ": " + this[a[i]];
return a.join(", ");
},
print : function() {
console.log(this.valueString());
}
}
MovingRectangle = Klass(Rectangle, Moving, Printable, {
density : 0.19,
height : 1,
initialize : function(w,h,d) {
Rectangle.initialize.call(this,w,h);
Moving.initialize.call(this,2);
if (d)
this.density = d;
this.mass = this.getVolume() * this.density;
},
getVolume : function() {
return this.getArea() * this.height;
}
})
Klass is a function that builds a constructor. The returned constructor calls its prototype's initialize method.
The constructor's prototype is built by merging together all the arguments given in the Klass call. If an argument has a prototype property defined, the prototype is used for merging. Otherwise the argument is merged as itself.
The built prototype is then finally merged into the constructor function, so that you have access to the prototype functions and properties from the constructor object itself. Which is useful for doing calls to other classes, as you can do e.g. Parent.someMethod.call(this) instead of Parent.prototype.someMethod.call(this).
Klass = function() {
var c = function() {
this.initialize.apply(this, arguments);
}
c.ancestors = toArray(arguments);
c.prototype = {};
for(var i = 0; i<arguments.length; i++) {
var a = arguments[i];
if (a.prototype) {
Object.extend(c.prototype, a.prototype);
} else {
Object.extend(c.prototype, a);
}
}
Object.extend(c, c.prototype);
return c;
}
Object.extend merges the src object's attributes with the dst object, ignoring errors. Ignoring the errors is useful in merging native objects (e.g. styles) as they've got some stuff that can't be overridden.
Object.extend = function(dst, src) {
for (var i in src) {
try{ dst[i] = src[i]; } catch(e) {}
}
return dst;
}
toArray creates a new array from an object that has a length property. Useful for dealing with all the silly Arraylike objects such as the function arguments object and the lists returned by the DOM API.
toArray = function(obj) {
var a = new Array(obj.length);
for (var i=0; i<obj.length; i++)
a[i] = obj[i];
return a;
}
2010-09-22
WebGL slides translated
I translated my WebGL slides to English and put them on GitHub Pages. Which is great, btw. Upgraded my account to Micro just to give them props for that.
If you would like to buy an app to make fancy WebGL slideshows, send me a mail and I'll get back to you.
If you'd like to subscribe to a webbased zoomable file management & sync service instead, send me a mail and I'll get back to you.
In other news, I'm back on Twitter.
2010-09-17
Saturday Night Links
ARTtech 2010: Fairlight's rendering secrets from AssemblyTV. The render graph approach is interesting. And the demo editor.
Yay! SVG in IMG coming in Firefox. Should enable fancy scalable UI images and easy SVG textures in WebGL. (Edit: In the 2010-09-18 nightly, SVG in IMG works but can't be drawn to canvas or uploaded as a WebGL texture.)
Creature workflow in Mudbox 2011.
Auto-retopology in 3D Coat. Take a high-poly model, paint over areas that you want at high resolution, draw lines to suggest edge-loop direction, done!
Maybe one day 3D modeling won't require thinking about polygons. You'd just have a high-resolution form that you deform and then save a low-res version for interactive use. I dunno. The whole "create box, extrude, cut, unwrap, texture" flow turns me off. It's like you're doing reverse origami: you start with a paper box, then you extrude it, bend it, cut bits off, glue bits back on, cut it open and splay it flat, draw a picture on it, put it back together, stuff a wire model inside, animate. And all the time you're worrying if you got the bends just right so that the joints don't look stupid.
2010-09-16
Frontend Finland WebGL slides
I gave a talk at the Frontend Finland meetup about WebGL. Had a great time, thanks to the organizers! My presentation slides are available here in glorious Finnish. Best experienced with a browser that has a working WebGL implementation (fear not, there's a fallback to plain unadorned HTML). Writing the slideshow app (and the slides) took about three days, with four days of preliminary library work (and the app dev fed into the library too).
Here are some links on the topics covered:
- WebGL
- Typed Arrays may also make JS faster
- Learning WebGL
- GLGE
- SpiderGL
- O3D
- Google's WebGL samples
- Planet WebGL
- Shader Toy
- ChemDoodle
- Rendering Quake 3 maps with WebGL
- Quake2 GWT port
- ANGLE
- Firefox 4 Audio Data API demo slides (video)
- W3C Audio Incubator Group
- Audio Data API Docs
- Photoshop math with GLSL shaders
The git repo for the utility library and the presentation can be found here.
The SVG presentation by Matti Paksula was really nifty as well. After the show, we were brainstorming about some kind of a super slideshow where you'd write the slides in HTML (with a nice CSS), then they'd get rendered using SVG / Canvas / WebGL depending on the capabilities of your browser. (What my presentation is doing is parsing the DOM into scene graph nodes, then rendering them using WebGL. Doing a similar system that rendered using SVG wouldn't be too difficult.)
Preserving the clickability of links and other such webness would be problematic with WebGL, but a mix of SVG, CSS 3D and WebGL might get you the best of both worlds: overlay SVG elems on top of the WebGL canvas, change the CSS 3D transform of the elems to match the WebGL scene graph. That'd be a lot like CAKE's mixed HTML/Canvas, except in 3D. If you want fancy FX on your SVG elems while preserving interactivity, you'd render them to textures and put the original element at opacity 0.01 on top of the fancy render, maybe? Or do shaders for SVG?
2010-09-08
Arithmetic
You have things, some things. Call these one-things. There is also a nothing, call it 0. When you add a one-thing to 0, we call that 1. Add a one-thing to 1 and you get 2. Add a one-thing to 2 and you get 3. In that way we build a small list of numbers: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
When you add a one-thing to 9, we don't have a number for that here. But let's call it a ten-thing and mark it on the left of the one-things. We have 1 ten-thing and 0 one-things, written in short as 10. Then we can add a one-thing to the 0 one-things to get 11. When we add a one-thing to 19, we get two ten-things: 20. When we have ten ten-things, we call that a hundred-thing and write it as 100: one hundred-thing, zero ten-things, zero one-things.
In a similar fashion, we could have a shorter list of numbers, say, 0, 1, 2, 3, 4. Now when you add a one-thing to 4, you get one five-thing and zero one-things, which we can write as 10_5. When we have five five-things, we call that a twenty-five-thing and write it as 100_5.
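The positional bookkeeping above is mechanical enough to sketch in a few lines of code (toBase here is a hypothetical helper, just to illustrate):

```javascript
// Write a non-negative integer in base b by repeatedly splitting off
// the ones digit: the same "ten ten-things make a hundred-thing"
// bookkeeping as above. (Digits above 9 aren't handled; enough for
// small bases.)
function toBase(n, b) {
  if (n === 0) return "0";
  var digits = [];
  while (n > 0) {
    digits.unshift(n % b);  // the one-things left over
    n = Math.floor(n / b);  // the b-things carried leftward
  }
  return digits.join("");
}
// toBase(5, 5) === "10": one five-thing, zero one-things.
// toBase(25, 5) === "100": one twenty-five-thing.
```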
There are also negative things. When a negative thing is added to a positive thing, they both disappear. Adding one negative thing to one positive thing makes both disappear, leaving you with zero things. If you add 5 negative things to 8 positive things, the 5 negative things and 5 positive things disappear, leaving 3 positive things. Adding 2 negative things to 3 more negative things gets you 5 negative things. Adding 4 negative things to 2 positive things makes the 2 positive things and 2 negative things disappear, leaving 2 negative things.
When you add 5 negative one-things to 2 positive ten-things, you need to crack open one of the ten-things to get ten positive one-things. The 5 negative one-things and 5 positive one-things disappear, leaving you with 1 positive ten-thing and 5 positive one-things: 15.
To add 5 negative one-things to 1 positive hundred-thing, you first crack open the hundred-thing to get ten ten-things, then crack open one of the ten-things to get ten one-things. The 5 negative one-things and 5 positive one-things disappear, leaving you with 9 positive ten-things and 5 positive one-things: 95.
Adding a zero to a number results in the number itself. Adding 5 to 0 results in 5. Adding 0 to 5 results in 5.
Negating a number turns a positive number into a negative number and a negative number into a positive number. Positive 2 negated is negative 2. Negative 2 negated is positive 2. Adding a number to its negation is zero. Positive 2 added to negative 2 is zero.
We write positive numbers with a + in front of them and negative numbers with a - in front of them. Positive 2 is +2. Negative 2 is -2. We write addition with a + between the numbers. The + symbol is called plus. Negative 3 plus positive 2 is -3 + +2. We write negate as -(number). Positive 2 negated is -(+2). Negative 2 negated is -(-2).
Two numbers are equal if they are the same. Equality is written with a =. 2 equals 2 is written 2 = 2 and amounts to true. If the two numbers are not the same, the equality operation returns false. If both sides of the equality operation are identical, the equality operation returns true. 2 = 2 is true. 2 = 3 is false. 1 + 1 = 2 is undecided, rewriting it as 2 = 2 or 1 + 1 = 1 + 1 makes it true.
Subtraction is an operation where the first number is added to the negation of the second number. Subtraction is written with a - between the numbers. The - symbol is called minus. -3 minus +2 is written -3 - +2 and means -3 + -(+2).
Multiplication is an operation where each of the things in the first number is replaced by the second number. Multiplying +3 by -2 replaces each of the 3 things with a -2, giving you +(-2) + +(-2) + +(-2) = -6. Multiplying +15 by +3 requires splitting open the ten-things to replace each of the one-things inside them with a +3, giving you fifteen +3-things, which amount to +45 when added together. Multiplying -2 by -5 replaces the two -things with -5-things, giving you -(-5) + -(-5), which simplifies to +5 + +5 and amounts to +10. Multiplying +0 by -2 gives you zero -2-things, which amounts to zero. Multiplying -2 by +0 replaces the two -things with +0s, giving you -(+0) + -(+0) = 0 + 0 = 0.
Fractional things are parts of one-things, much like how one-things are parts of ten-things. Fractional things are defined by how many parts are needed to make up a one-thing. When you need two parts for a one-thing, the fractional thing is called a half. When you need three parts, the fractional thing is a one-third. With five parts needed, it's a one-fifth. Five one-fifths added together amount to one. Similarly three one-thirds amount to one. Fractional numbers are written as 1/parts. One-fifth is written +1/5. One-third is written +1/3. Negative one-sixth is written -1/6.
Adding 1/6 to 1/6 results in 2/6. Adding 1/6 to 5/6 results in 6/6, which equals 1. Adding 1/6 to 1 is 1/6 + 6/6 = 7/6. Adding 1/5 to 4/5 results in 5/5, which also equals 1. Adding 1/5 to 1 is 1/5 + 5/5 = 6/5.
Multiplication is written with an x between the numbers. The x is called times. 3 times 5 is written 3 x 5 and means 3 multiplied by 5, amounting to 15.
Multiplying a number by one results in the number itself. 5 x 1 is 5. 1 x 5 is 5.
The inverse of a number multiplied with the number itself results in one. The inverse of a number is the fraction with number parts. Zero does not have an inverse. The inverse of +5 is +1/5. The inverse of -3 is -1/3.
Three times the inverse of three is 3 x 1/3 = 1/3 + 1/3 + 1/3 = 3/3 = 1.
Negative two times the inverse of negative two is -2 x -1/2 = -(-1/2) + -(-1/2) = 1/2 + 1/2 = 2/2 = 1.
Division is an operation where you multiply the first number with the inverse of the second number. 3 divided by 5 is 3 x 1/5 and amounts to 3/5. 8 divided by 2 is 8 x 1/2 and amounts to 8/2, which adds up to 4.
Division is written with a / between the numbers. 5 divided by 3 is written 5 / 3 and means 5 x 1/3, and amounts to 5/3, or 1 + 2/3.
The allowed moves when manipulating an equation with additions are:
For an equation with multiplications, the above moves apply as well, but there is an additional move when dealing with a mix of additions and multiplications:
These are the only allowed structural moves for manipulating equations of additions and multiplications.
Additions can use the following equalities for simplifying expressions: a + 0 = a and a + a = 0. Multiplications can use a x 1 = a and a x 1/a = 1.
We can build more moves from the above, for example:
Now go forth and solve the problems of algebra!
When you add a onething to 9, we don't have a number for that here. But let's call it a tenthing and mark it to the left of the onethings. We have 1 tenthing and 0 onethings, written in short as 10. Then we can add a onething to the 0 onethings to get 11. When we add a onething to 19, we get two tenthings: 20. When we have ten tenthings, we call that a hundredthing and write it as 100: one hundredthing, zero tenthings, zero onethings.
In a similar fashion, we could have a shorter list of numbers, say, 0, 1, 2, 3, 4. Now when you add a onething to 4, you get one fivething and zero onethings, which we can write as 10_5. When we have five fivethings, we call that a twentyfivething and write it as 100_5.
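The positional idea can be sketched in code; toDigits here is a hypothetical helper (not from the post) that writes a number out in a given base:

```haskell
-- Hypothetical helper: the digits of n written in the given base,
-- most significant first (assumes n >= 0 and base >= 2).
toDigits :: Int -> Int -> [Int]
toDigits base n
  | n < base  = [n]
  | otherwise = toDigits base (n `div` base) ++ [n `mod` base]

main :: IO ()
main = do
  print (toDigits 5 5)   -- one fivething, zero onethings: [1,0]
  print (toDigits 5 25)  -- one twentyfivething: [1,0,0]
  print (toDigits 10 10) -- one tenthing, zero onethings: [1,0]
```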
There are also negative things. When a negative thing is added to a positive thing, they both disappear. Adding one negative thing to one positive thing makes both disappear, leaving you with zero things. If you add 5 negative things to 8 positive things, the 5 negative things and 5 positive things disappear, leaving 3 positive things. Adding 2 negative things to 3 more negative things gets you 5 negative things. Adding 4 negative things to 2 positive things makes the 2 positive things and 2 negative things disappear, leaving 2 negative things.
When you add 5 negative onethings to 2 positive tenthings, you need to crack open one of the tenthings to get ten positive onethings. The 5 negative things and 5 positive onethings disappear, leaving you with 1 positive tenthing and 5 positive onethings: 15.
To add 5 negative onethings to 1 positive hundredthing, you first crack open the hundredthing to get ten tenthings, then crack open one of the tenthings to get ten onethings. The 5 negative onethings and 5 positive onethings disappear, leaving you with 9 positive tenthings and 5 positive onethings: 95.
Adding a zero to a number results in the number itself. Adding 5 to 0 results in 5. Adding 0 to 5 results in 5.
Negating a number turns a positive number into a negative number and a negative number into a positive number. Positive 2 negated is negative 2. Negative 2 negated is positive 2. Adding a number to its negation is zero. Positive 2 added to negative 2 is zero.
We write positive numbers with a + in front of them and negative numbers with a - in front of them. Positive 2 is +2. Negative 2 is -2. We write addition with a + between the numbers. The + symbol is called plus. Negative 3 plus positive 2 is -3 + +2. We write negate as -(number). Positive 2 negated is -(+2). Negative 2 negated is -(-2).
Two numbers are equal if they are the same. Equality is written with a =. 2 equals 2 is written 2 = 2 and amounts to true. If the two sides of the equality are not the same, the equality operation returns false; if both sides are identical, it returns true. 2 = 2 is true. 2 = 3 is false. 1 + 1 = 2 is undecided as written; rewriting it as 2 = 2 or as 1 + 1 = 1 + 1 makes it true.
Subtraction is an operation where the first number is added to the negation of the second number. Subtraction is written with a - between the numbers. The - symbol is called minus. 3 minus +2 is written 3 - +2 and means 3 + -(+2).
Multiplication is an operation where each of the things in the first number is replaced by the second number. Multiplying +3 by -2 replaces each of the 3 things with a -2, giving you +(-2) + +(-2) + +(-2) = -6. Multiplying +15 by +3 requires splitting open the tenthings to replace each of the onethings inside them with a +3, giving you 15 +3-things, which amount to +45 when added together. Multiplying -2 by -5 replaces the two things with -5-things, giving you -(-5) + -(-5), which simplifies to +5 + +5 and amounts to +10. Multiplying +0 by -2 gives you zero -2-things, which amounts to zero. Multiplying -2 by +0 replaces the two things with +0, giving you -(+0) + -(+0) = 0 + 0 = 0.
Fractional things are parts of onethings, much like how onethings are parts of tenthings. Fractional things are defined by how many parts are needed to make up a onething. When you need two parts for a onething, the fractional thing is called a half. When you need three parts, the fractional thing is a onethird. With five parts needed, it's a onefifth. Five onefifths added together amount to one. Similarly three onethirds amount to one. Fractional numbers are written as 1/parts. Onefifth is written +1/5. Onethird is written as +1/3. Negative onesixth is written -1/6.
Adding 1/6 to 1/6 results in 2/6. Adding 1/6 to 5/6 results in 6/6, which equals 1. Adding 1/6 to 1 is 1/6 + 6/6 = 7/6. Adding 1/5 to 4/5 results in 5/5, which also equals 1. Adding 1/5 to 1 is 1/5 + 5/5 = 6/5.
Multiplication is written with an x between the numbers. The x is called times. 3 times 5 is written 3 x 5 and means 3 multiplied by 5, amounting to 15.
Multiplying a number by one results in the number itself. 5 x 1 is 5. 1 x 5 is 5.
The inverse of a number multiplied with the number itself results in one. The inverse of a number is the fraction with number parts. Zero does not have an inverse. The inverse of +5 is +1/5. The inverse of -3 is -1/3.
Three times the inverse of three is 3 x 1/3 = 1/3 + 1/3 + 1/3 = 3/3 = 1.
Negative two times the inverse of negative two is -2 x -1/2 = -(-1/2) + -(-1/2) = 1/2 + 1/2 = 2/2 = 1.
Division is an operation where you multiply the first number with the inverse of the second number. 3 divided by 5 is 3 x 1/5 and amounts to 3/5. 8 divided by 2 is 8 x 1/2 and amounts to 8/2, which adds up to 4.
Division is written with a / between the numbers. 5 divided by 3 is written 5 / 3 and means 5 x 1/3, and amounts to 5/3, or 1 + 2/3.
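These fraction and division rules behave just like Haskell's built-in Rational arithmetic, which gives a quick way to check the examples above (a sketch, using Data.Ratio):

```haskell
import Data.Ratio ((%))

main :: IO ()
main = do
  print (3 % 5)          -- 3 divided by 5 amounts to 3/5
  print (8 * (1 % 2))    -- 8 x 1/2 adds up to 4
  print (5 % 3)          -- 5/3, i.e. 1 + 2/3
  print (1 % 6 + 5 % 6)  -- 6/6 equals 1
```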
The allowed moves when manipulating an equation with additions are:
- Apply associativity rule to two additions: a + (b + c) = (a + b) + c
- Apply commutativity rule to an addition: a + b = b + a
- Add the same value to both sides of the equality: a = a <=> a + b = a + b
For an equation with multiplications, the above moves apply as well, but there is an additional move when dealing with a mix of additions and multiplications:
- Apply distributivity rule: a x (b + c) = a x b + a x c
These are the only allowed structural moves for manipulating equations of additions and multiplications.
Additions can use the following equalities for simplifying expressions: a + 0 = a and a + -a = 0. Multiplications can use a x 1 = a and a x 1/a = 1 (when a != 0).
We can build more moves from the above, for example:
I'm playing fast and loose with associativity in these, writing a + b + c for whichever of (a + b) + c and a + (b + c) is easiest at any given moment. Also, multiplication binds tighter than addition: a x b + c x d = (a x b) + (c x d).
x + b = c            -- a = a <=> a + b = a + b
x + b + -b = c + -b  -- a + -a = 0
x + 0 = c + -b       -- a + 0 = a
x = c + -b           -- a - b = a + -b
x = c - b

x + b = c <=> x = c - b
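The derived move can be sanity-checked by brute force over small integers; a quick sketch:

```haskell
-- For every b and c in a small range, x = c - b indeed satisfies x + b = c.
main :: IO ()
main = print (and [ (c - b) + b == c | b <- [-10..10], c <- [-10..10 :: Int] ])
```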
a x b = c              -- a = a <=> a x b = a x b
a x b x 1/b = c x 1/b  -- a x 1/a = 1 when a != 0
a x 1 = c x 1/b        -- a x 1 = a
a = c x 1/b            -- a / b = a x 1/b
a = c / b

a x b = c <=> a = c / b when b != 0
(a + b) x c = (a + b) x c    -- a x b = b x a
(a + b) x c = c x (a + b)    -- a x (b + c) = a x b + a x c
(a + b) x c = c x a + c x b  -- a x b = b x a
(a + b) x c = a x c + b x c
a x 1 = a x 1  -- a x b = b x a
a x 1 = 1 x a

a x b = a x b                   -- (a + b) x c = c x a + c x b
a x b = (a + -1) x b + (1 x b)  -- 1 x a = a
a x b = (a + -1) x b + b        -- a - b = a + -b
a x b = (a - 1) x b + b
-(-a) = -(-a)            -- a = a <=> a + b = a + b
-a + -(-a) = -a + -(-a)  -- a + -a = 0
-a + -(-a) = 0           -- a = a <=> a + b = a + b
a + -a + -(-a) = a + 0   -- a + -a = 0
0 + -(-a) = a + 0        -- a + b = b + a and a + 0 = a
-(-a) = a
-1 x a = -1 x a      -- 0 + a = a
-1 x a = 0 + -1 x a  -- a - b = a + -b
-1 x a = 0 - 1 x a   -- 1 x a = a
-1 x a = 0 - a       -- a - b = a + -b
-1 x a = 0 + -a      -- 0 + a = a
-1 x a = -a

-1 x a = -a          -- -a = -(a)
-1 x a = -(a)
-a x b = -a x b      -- -1 x a = -a
-a x b = -1 x a x b  -- a x b = b x a
-a x b = a x -1 x b  -- -1 x a = -a
-a x b = a x -b

-a x b = -1 x a x b  -- -1 x a = -(a)
-a x b = -(a x b)
a x (b - c) = a x (b - c)         -- a - b = a + -b
a x (b - c) = a x (b + -c)        -- a x (b + c) = a x b + a x c
a x (b - c) = (a x b) + (a x -c)  -- a x b = b x a
a x (b - c) = (a x b) + (-c x a)  -- -a x b = -(a x b)
a x (b - c) = (a x b) + -(c x a)  -- a x b = b x a
a x (b - c) = (a x b) + -(a x c)  -- a - b = a + -b
a x (b - c) = (a x b) - (a x c)
a x 0 = a x 0           -- 0 = -1 + 1
a x 0 = a x (-1 + 1)    -- a x (b + c) = a x b + a x c
a x 0 = a x -1 + a x 1  -- a x 1 = a
a x 0 = a x -1 + a      -- -1 x a = -a and a x b = b x a
a x 0 = -a + a          -- -a + a = 0
a x 0 = 0
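The identities derived above also hold for ordinary machine arithmetic; a quick Haskell sketch checking them over small integers:

```haskell
main :: IO ()
main = do
  let xs = [-3 .. 3] :: [Int]
  print (and [ negate (negate a) == a       | a <- xs ])                    -- -(-a) = a
  print (and [ (-1) * a == negate a         | a <- xs ])                    -- -1 x a = -a
  print (and [ a * (b - c) == a * b - a * c | a <- xs, b <- xs, c <- xs ])  -- a x (b - c) = a x b - a x c
  print (and [ a * 0 == 0                   | a <- xs ])                    -- a x 0 = 0
```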
Now go forth and solve the problems of algebra!
Links for Tuesday
A Blender-WebGL exporter got forked.
Rrrola's FX. Mandelboxes create a lot of cool geometry, riding that line between geometric and organic. I wonder if you could build game levels quickly by CSG cutting mandelboxes.
Some motion graphics videos on YouTube.
2010-09-07
WebGL color spaces
There's an interesting discussion happening on the public_webgl mailing list about the color spaces used by the textures and the canvas.
As far as I can tell, the problem is:
- Simple math works correctly only in a linear color space.
- Images and the rest of web content are 8 bits per channel sRGB.
- Converting between two 8-bit color spaces is lossy.
- You can't tell what an image's color space is from JavaScript.
- You can't easily convert between color spaces in JavaScript.
The proposed solutions are:
- Pass image data as-is to WebGL, treat WebGL canvas values as sRGB. (I.e. the do-nothing solution. If application writers want correct math, they should use linear textures and do a final linear-to-sRGB pass on the rendered image.)
- Optionally convert textures to linear color space, treat WebGL canvas as linear. (Problematic, as there is severe information loss in the conversions. Math works correctly, though.)
Color spaces? But I thought space was black?
Well, actually, space has no color. It's transparent. It just looks black because there's very little light coming through it at most points.
Color spaces, on the other hand, are definitions of what the numerical pixel values in an image mean. A linear RGB color space means something like "given an output device with three component emitters at wavelengths R=580 nm, G=490 nm and B=440 nm, the number of photons emitted by an emitter is Max_Output * V, where V = component_value / (2^component_bits - 1)." For sRGB, the component values are transformed by the sRGB curve (roughly pow(V, 2.2)) to have more control over the important parts of the dynamic range. In particular, sRGB sacrifices precision in the light end to have more control in the darks. Converting linear->sRGB loses detail in the light end, whereas sRGB->linear loses detail in the dark end.
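For reference, the actual sRGB transfer curve (IEC 61966-2-1) is piecewise, a linear toe near black plus a 2.4-exponent segment, rather than a pure gamma 2.2; a sketch of both directions:

```haskell
-- sRGB transfer curve: a linear segment near black plus a 2.4-exponent
-- curve, approximating an overall gamma of 2.2.
linearToSRGB :: Double -> Double
linearToSRGB c
  | c <= 0.0031308 = 12.92 * c
  | otherwise      = 1.055 * c ** (1 / 2.4) - 0.055

srgbToLinear :: Double -> Double
srgbToLinear s
  | s <= 0.04045 = s / 12.92
  | otherwise    = ((s + 0.055) / 1.055) ** 2.4

main :: IO ()
main = do
  print (srgbToLinear 0.5)  -- sRGB mid-gray is only ~0.21 in linear light
  print (map (srgbToLinear . linearToSRGB) [0, 0.25, 0.5, 1])  -- round trip
```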
To project an image onto your eyeball, the image data goes through the following wringer: Image data in image color space -> Viewer app color space -> Window system color space -> Display device color space -> Hardware blinkenlichts -> Eyeball.
Now, most software goes "huh?" at the mention of that and acts as if the image data is in the window system color space and uses the data directly. Further, the assumption when doing image editing operations is that the aforementioned color space is linear. (Why that might cause problems is that while 0.25 + 0.25 has the same luminance as 0.5 in a linear color space, to add the luminances in gamma 2.2 you have to do (0.25^2.2 + 0.25^2.2)^(1/2.2) = 0.34.)
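The luminance addition above can be sketched as a tiny helper (gammaAdd is a made-up name): decode both values to linear, add, then re-encode:

```haskell
-- Adding two luminances stored in a gamma-encoded space:
-- decode to linear, add, re-encode.
gammaAdd :: Double -> Double -> Double -> Double
gammaAdd gamma a b = (a ** gamma + b ** gamma) ** (1 / gamma)

main :: IO ()
main = print (gammaAdd 2.2 0.25 0.25)  -- about 0.34, not 0.5
```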
Software that requires accurate color handling usually converts the 8-bit values to a 16- or 32-bit linear color space to get around the conversion errors and to get easy, correct math internally. The resulting high-depth image is then converted down to the window system color space for viewing.
The real solution is to use 64 bits per channel and specify the channels to mean photon count / second / mm^2. With that, you can crank from 1 photon / second all the way up to 6500x noon sunlight. And linear math. Then use that as the interchange format between everything. Heyo, massive framebuffers~
In the meantime, maybe the browser could have an iconv-like system for converting images from one color space to another. WebGL 1.0 doesn't have float textures, so up-converting to linear float textures has to wait for the float texture extension.
With linear float textures, you could up-convert all images to float textures, then use a linear float FBO for rendering, and finally convert the FBO to sRGB to display on the canvas element.
2010-09-06
Ramble on sorting
Merge sort merges already sorted regions together. It also has an operation to turn a small region into a sorted region. The merge sort splits its input into small regions, turns each of those into a sorted region, and finally merges them together.
There are some sorted regions that have a trivial merge operation: if the largest value in region X is smaller or equal to the smallest value in region Y, the merge operation is the concatenation X ++ Y. We can preprocess the input of the merge sort so that the two merged regions have this property. The preprocessing pass uses a pivot value and moves all values smaller than the pivot to the start of the region, leaving the end of the region for values larger than the pivot. Now you don't have to do anything to merge the regions together once they're sorted. This is the idea behind the quicksort algorithm.
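For concreteness, here is a sketch of the standard list merge and the concatenation shortcut for regions with that ordering property:

```haskell
-- Standard merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge xxs@(x:xs) yys@(y:ys)
  | x <= y    = x : merge xs yys
  | otherwise = y : merge xxs ys

-- When all of xs <= all of ys, merging degenerates to concatenation.
mergeConcat :: Ord a => [a] -> [a] -> [a]
mergeConcat xs ys
  | null xs || null ys || last xs <= head ys = xs ++ ys
  | otherwise                                = merge xs ys

main :: IO ()
main = do
  print (merge [1,3,5] [2,4 :: Int])
  print (mergeConcat [1,2] [3,4 :: Int])
```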
Because pivoting can cause the regions to have different sizes, there is a worst-case scenario where the pivot is the only element in the other region. Because of this scenario, quicksort has O(n^2) worst-case time complexity. If you pick the pivot to be the median of the region with an O(n) median finding algorithm and further split the equal-to-pivot values equally between the sub-regions, you can get an O(n lg n) quicksort (but the O(n) median finding algorithm has a high overhead, so they say the resulting algorithm is slower than quicksort in the average case).
You can do an O(n) preprocessing pass over the input data to find all already sorted regions in the input data. If the number of found regions is small, you're dealing with nearly sorted data. You can directly merge all sorted regions that are larger than the region size for the region sorting operation. The merging should be done so that the smallest regions are merged together first. Merging regions of unequal size causes the worst-case time complexity to increase (merging a 2-region and an X-region takes X+2 time; consider N/4 2-regions merged into an N/2-region: it takes N/4 * (N/2 + (2+N)/2) time, which is in O(N^2)).
The preprocessing pass can also reverse regions that are sorted in the opposite direction. And it is probably possible to topologically sort the sorted region descriptors by their min-max values to find regions that can be merged by concatenation. Another merge trick would be to use binary search on one region to find the span that the other region overlaps; then you could get away with

memcpy(dst, smaller, smaller_sz); merge(dst+smaller_sz, overlap, region2, overlap_sz, region2_sz); memcpy(dst+smaller_sz+overlap_sz+region2_sz, larger, larger_sz);

If you keep track of how equally the quicksort pivot pass splits a region, you can avoid the quicksort worst case by handing pathological splits over to merge sort. That is a bit nasty though, as the resulting regions no longer have the concatenative property and must be merged the slow way. Another alternative is to do something like introsort and switch over to heapsort once the recursion depth exceeds lg n.
Once merge sort or quicksort reaches the maximum unsorted region size (say, 8 elements), implementations usually switch over to a low-overhead O(n^2) sorting algorithm like insertion sort.
The partitioning operation of merge sort is a no-op, but the partitioning operation in quicksort takes O(n) time. In the same fashion, the merge operation of quicksort is a no-op, but the merge operation of merge sort takes O(n) time.
You can do the merge sort merge in parallel by picking a pivot and splitting the sorted regions into two regions: one with values smaller or equal to the pivot and the other with values larger or equal to the pivot. The splitting operation consists of finding the index of the pivot in each region using a binary search. You then merge the small values with the small values and the large values with the large values. The merge between the merged smalls and larges is concatenation, so you can just allocate an array with the beginning reserved for the small values and the end reserved for the large values, and run the small and large merges in parallel. Recurse to increase parallelism. The nice thing about this approach is that you get segregated in-place parallelism: each of the sub-partitions can write to its own part of the root target array. The problem it suffers from is the quicksort partitioning problem: the pivot should be the median of the data set. Because the sorted regions are sorted, we can get the medians of each region easily enough; the worst case with one of those should be a 25:75 split when the regions are of equal size.
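The pivot-split idea can be sketched on lists (using span instead of binary search for brevity; the two half-merges are independent, so they could run in parallel, and their results simply concatenate):

```haskell
-- Standard merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge xxs@(x:xs) yys@(y:ys)
  | x <= y    = x : merge xs yys
  | otherwise = y : merge xxs ys

-- Split both sorted regions at a pivot; the small-side and large-side
-- merges are independent (parallelizable) and their results concatenate.
-- (span is O(n) here; a real implementation would binary search.)
splitMergeAt :: Ord a => a -> [a] -> [a] -> [a]
splitMergeAt pivot xs ys = merge xsLo ysLo ++ merge xsHi ysHi
  where (xsLo, xsHi) = span (<= pivot) xs
        (ysLo, ysHi) = span (<= pivot) ys

main :: IO ()
main = print (splitMergeAt 4 [1,3,5,7] [2,4,6,8 :: Int])
```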
If your input data is 256 elements long and you split it into 8-element insertion-sorted regions, executing each in parallel (on a 32-core), what's the merge time? The first merge level you want to split into 2-way parallel merges (16 merges -> 32 merges). The second level is split 4-way (8 merges -> 32 merges), then 8-way and 16-way. The split bsearch overhead is 2 log_2 mergeSize per level (sub-splits execute in parallel).
To do the 16-way split (assuming we get a perfectly balanced split), we first do a 2 log_2 128 = 14 op split, then a 2 log_2 64 = 12 op split, then 10 op and 8 op splits, for a total split cost of 44 ops. The merge then takes 16 ops to run (8+8 merge), for a total merge cost of 60 ops. If we stop the split one level short, it'd take 36 ops to split and 32 ops to merge, 68 ops in total. So I guess you do want to split it all the way down if you have negligible overhead.
The total runtime for the zero-overhead parallel merge tree would be 44 + 30 + 18 + 8 + 4*16 = 164 ops. Total sort time with 64-op 8-elem sorts would be 228 ops. Hey, sub-O(n). Quadruple the input size and the core count and it's 8+18+30+44+60+78+6*16 + 64 = 398 ops. Hey, sublinear parallel scaling.
There is another way to do parallel merging as well (cribbed from some lecture slides): for regions A and B, write A[i] to i + B.indexOfLarger(A[i]) and write B[i] to i + A.indexOfLarger(B[i]). N-parallelizable, serial O(N log N) time, parallel O(log N) time. However, it relies on scattered writes, so it doesn't seem very memory-access friendly (ideally each CPU core should be working on its own cache line, otherwise you get cache contention and traffic between the CPUs).
Fun reading: Heapsort, Quicksort, and Entropy.
2010-09-05
2010-09-04
Two's complement integers
Oh hey, I finally get how two's complement works after reading the first couple paragraphs of Why computers represent signed integers using two’s complement.
Simply put, adding a negative number to a positive number overflows if the result is positive, and doesn't overflow if the result is negative.
With 32-bit integers:
5 - 1 = 5 + (-1) = 5 + (2^32 - 1) = 2^32 + 4 = 4
5 - 3 = 5 + (2^32 - 3) = 2^32 + 2 = 2
5 - 7 = 5 + (2^32 - 7) = 2^32 - 2 = -2
This is kinda interesting: we can put the separator between positive and negative numbers where we like:
void print_custom_int (unsigned int n, unsigned int separator)
{
  if (n < separator) {
    printf("%u\n", n);
  } else {
    printf("-%u\n", ~n+1);
  }
}
print_custom_int(0, 200);
// 0
print_custom_int(1, 200);
// 1
print_custom_int(-1, 200);
// -1
print_custom_int(200, 200);
// -4294967096
print_custom_int(4294967097, 200);
// -199
Or maybe you need extra positive range and only 256 negative values:
print_custom_int(4000000000, 0xFFFFFF00);
// 4000000000
print_custom_int(2*2004001006, 0xFFFFFF00);
// 4008002012
print_custom_int(-32-80+15, 0xFFFFFF00);
// -97
// Watch out for overflows!
print_custom_int(-16*17, 0xFFFFFF00);
// 4294967024
And yeah, division doesn't work out of the box.
unsigned int div_custom_int (unsigned int a, unsigned int b, unsigned int separator)
{
  if (a < separator && b < separator)
    return a / b;
  else if (a < separator)
    return ~(a / (~b+1))+1;
  else if (b < separator)
    return ~((~a+1) / b)+1;
  else
    return (~a+1) / (~b+1);
}
unsigned int n20 = -20, n3 = -3, p4 = 4, p2 = 2;
print_custom_int(n20 / p4, 0xFFFFFF00);
// 1073741819
print_custom_int(p4 / n3, 0xFFFFFF00);
// 0
print_custom_int(p4 / p2, 0xFFFFFF00);
// 2
print_custom_int(n20 / n3, 0xFFFFFF00);
// 0
// vs.
print_custom_int(div_custom_int(n20, p4, 0xFFFFFF00), 0xFFFFFF00);
// -5
print_custom_int(div_custom_int(p4, n3, 0xFFFFFF00), 0xFFFFFF00);
// -1
print_custom_int(div_custom_int(p4, p2, 0xFFFFFF00), 0xFFFFFF00);
// 2
print_custom_int(div_custom_int(n20, n3, 0xFFFFFF00), 0xFFFFFF00);
// 6
2010-09-02
2010-08-31
Branching factor generator
A function to generate a stream of tree branching factors to approximate a non-integer branching factor.
branches' x i sum =
  let v = (if sum/i > x then floor x else ceiling x) :: Int in
  v : branches' x (i+1) (sum + fromIntegral v)
branches n = branches' n 0 0

e_branches = branches (exp 1)
e_100000 = take 100000 $ e_branches

average l = sum l / fromIntegral (length l)

average $ map fromIntegral e_100000  -- 2.71828...
You could also write the average function as
average2 l = snd $ foldl (\(i, avg) e -> (i+1, (avg*(i-1) + e)/i)) (1, 0) l
average2 $ map fromIntegral e_100000
even though that is less accurate. You can use a similar trick for the branches' function to use a running average instead of a running sum.
branches2' x i avg =
  let v = (if avg > x then floor x else ceiling x) :: Int in
  v : branches2' x (i+1) ((avg*(i-1) + fromIntegral v)/i)
branches2 n = branches2' n 1 0
average $ map fromIntegral $ take 100000 $ branches2 (exp 1)
To generate a tree of wanted size out of the branching factor list, you'd have a tree generator that eats values off the list and generates tree nodes breadthfirst. It is kind of a pain to write.
And some Tuesday links
Man-Computer Symbiosis by J.C.R. Licklider. Continuing on the HCI vision golden oldies hitlist.
Twin hurricanes.
A Dust Mite.
2010-08-30
Sunday links
Even though it's technically Monday already.
Yellowstone: what lies beneath via Eruptions.
As We May Think by Vannevar Bush.
De architectura by Vitruvius.
The Periplus of the Erythraean Sea.
"My work in the houses of the Gods is abundant, and now I have begun an undertaking: Send much gold!"
Video of asteroid discovery, best viewed in high-def.
Hunza debris flow video.
2010-08-28
Folding, part 3: parallel folds
So, you can do a parallel fold if your folding function is associative.
-- Here i should be the neutral element of f, that is f i x = x.
-- E.g. 0 for + and 1 for *
parFold f i [] = i
parFold f i [x] = f i x
parFold f i xs =
  l `par` r `pseq` f l r
  where l = parFold f i left
        r = parFold f i right
        (left, right) = splitAt (length xs `div` 2) xs
-- See how associative functions work right with it
parFold (+) 0 [1..10] == foldl (+) 0 [1..10]
parFold (*) 1 [1..10] == foldl (*) 1 [1..10]
-- But others have problems
parFold (-) 0 [1..10] /= foldl (-) 0 [1..10]
parFold (/) 1 [1..10] /= foldl (/) 1 [1..10]
-- As - and / are defined as A + -B and A * (1/B),
-- we can work around these instances
parFold (+) 0 (map negate [1..10]) == foldl (-) 0 [1..10]
parFold (*) 1 (map recip [1..10]) == foldl (/) 1 [1..10]
But for the general case, we need something stronger. Roll in the reduceFold! It splits the foldable list into sublists, folds each sublist, and reduces the subresults into a single value.
reduceFold reduce f init [] = init
reduceFold reduce f init [x] = f init x
reduceFold reduce f init xs =
  l `par` r `pseq` reduce l r
  where l = reduceFold reduce f init left
        r = reduceFold reduce f init right
        (left, right) = splitAt (length xs `div` 2) xs
-- Associative operations work as usual
reduceFold (+) (+) 0 [1..10] == foldl (+) 0 [1..10]
reduceFold (*) (*) 1 [1..10] == foldl (*) 1 [1..10]
-- And we can now combine the subresults in a different fashion
reduceFold (+) (-) 0 [1..10] == foldl (-) 0 [1..10]
reduceFold (*) (/) 1 [1..10] == foldl (/) 1 [1..10]
Let's write some parallel list operations in terms of reduceFold.
parMap f lst = reduceFold (++) (\l i -> l ++ [f i]) [] lst
parFilter f lst = reduceFold (++) (\l i -> if f i then l++[i] else l) [] lst
parConcat lsts = reduceFold (++) (++) [] lsts
parConcatMap f lst = reduceFold (++) (\l i -> l++(f i)) [] lst

optLeft (Just x) _ = Just x
optLeft Nothing (Just x) = Just x
optLeft Nothing Nothing = Nothing

optFilter f i = if f i then Just i else Nothing

parFind f lst = reduceFold optLeft (\l i ->
                  optLeft l (optFilter f i)) Nothing lst

optRight x y = optLeft y x

parFindLast f lst = reduceFold optRight (\l i ->
                      optRight l (optFilter f i)) Nothing lst
-- Yeah, they pretty much work
divisibleBy n x = x `mod` n == 0
parMap (+ 10) [1..10] == [11..20]
parFilter (divisibleBy 3) [1..15] == [3,6,9,12,15]
parConcat [[1], [2,3,4], [5,6,7], [8,9], [10..20]] == [1..20]
parConcatMap (\x > [x,x,x]) [1..3] == [1,1,1,2,2,2,3,3,3]
parFind (divisibleBy 127) [90,93..900] == Just 381
parFind (> 10) [1..10] == Nothing
parFindLast (divisibleBy 127) [90,93..900] == Just 762
You might've noticed that both parFold and reduceFold are creating a parallel evaluation spark for each element in the lists. This is not a very good idea as the sparking overhead is often going to dominate the execution time and kill performance.
So we need some way to control the reduce tree branching factor and clump up the leaf-level folds into longer operations. I don't know if I can manage to come up with an automagic or even semi-automagic way to do this. I would like to use yesterday's theoretical math as guidance for the tree shape optimizer. Maybe we'll find out in the next installment.
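One simple manual knob for the clumping is chunking at the leaves. A sketch (chunksOf and chunkedReduceFold are my names, and the chunk size is hand-picked): fold each chunk sequentially, so there is one unit of parallel work per chunk instead of one per element, then reduce the chunk results.

```haskell
-- Split a list into chunks of (at most) k elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = splitAt k xs in h : chunksOf k t

-- Like reduceFold, but each chunk is folded sequentially with foldl.
-- In the real thing each element of subResults would be sparked with
-- `par` like in reduceFold above; this plain version keeps the shape.
-- (init is renamed init' here to avoid shadowing Prelude's init.)
chunkedReduceFold :: Int -> (b -> b -> b) -> (b -> a -> b) -> b -> [a] -> b
chunkedReduceFold chunkSize reduce f init' xs =
  case map (foldl f init') (chunksOf chunkSize xs) of
    []         -> init'
    subResults -> foldl1 reduce subResults
```

The spark count drops from length xs to length xs / chunkSize, which is exactly the branching factor question the math below tries to answer.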
A small hillclimb optimizer
hillClimb f current delta error 0 = current
hillClimb f current delta error steps | delta < error = current
hillClimb f current delta error steps =
  if curVal > prevVal && curVal > nextVal
    then hillClimb f current (delta / 2) error (steps-1)
    else if prevVal > nextVal
      then hillClimb f (current-delta) delta error (steps-1)
      else hillClimb f (current+delta) delta error (steps-1)
  where curVal = f current
        prevVal = f (current-delta)
        nextVal = f (current+delta)
May have bugs in it, only tested on (\x -> -(x**2)).
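As a second sanity check tying the optimizer to the folding math, here's a sketch (optimalBranchingFactor is my name; hillClimb is repeated with its guard and minus signs restored so the snippet compiles on its own): since hillClimb maximizes, minimize the b / log b cost from the parallel reduction post by maximizing its negation. The result should land near e.

```haskell
-- hillClimb repeated from above for a self-contained check.
hillClimb :: (Double -> Double) -> Double -> Double -> Double -> Int -> Double
hillClimb f current delta error 0 = current
hillClimb f current delta error steps | delta < error = current
hillClimb f current delta error steps =
  if curVal > prevVal && curVal > nextVal
    then hillClimb f current (delta / 2) error (steps-1)
    else if prevVal > nextVal
      then hillClimb f (current-delta) delta error (steps-1)
      else hillClimb f (current+delta) delta error (steps-1)
  where curVal = f current
        prevVal = f (current-delta)
        nextVal = f (current+delta)

-- Minimize b / log b by maximizing its negation; expect a value
-- near e (about 2.718), per the proof in the reduction math post.
optimalBranchingFactor :: Double
optimalBranchingFactor = hillClimb (\b -> negate (b / log b)) 2 1 1e-9 1000
```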
2010-08-27
Folding, part 2: parallel reduction math
Parallel folding is a bit of an oxymoron. The fold operation sequentially builds up an accumulator value from the elements of a data structure. So if it's inherently sequential, how can it be parallelized?
If the reduce function given to fold is associative, the fold can be parallelized. If it's not associative, the fold is sequential.
Associativity means that you can turn (a + b) + c into a + (b + c), or, in function call terms, (f (f a b) c) == (f a (f b c)). When you have a long call chain like a + b + c + d + e + f + g + h, it means that you can split it into (a + b) + (c + d) + (e + f) + (g + h), evaluate the parenthesized bits in parallel, and then reduce the results to the final value. The reduction can also be parallelized, as it is the fold (a_b + c_d) + (e_f + g_h). This sort of parallel tree reduction is what MapReduce is all about, by the way: compute a bunch of values in parallel and reduce them to the result value.

Where it all goes by the wayside
Let's do the math to figure out the theoretical runtime for a reduction network. One parallel reduction step takes computeTime time. Moving the result to the next reduce node takes totalResultSize / bandwidth time. The variable totalResultSize is defined as resultSize * branchingFactor, as the branching factor is the amount of nodes sending their reduce results to the reduce node. We can also split it into a more useful form branchingFactor * (resultSize / bandwidth) and write transferTime = resultSize / bandwidth. Finally, the amount of parallel reduction steps required to reduce a tree of size N is log_branchingFactor N (and 1 < branchingFactor <= N).

Putting it all together, we get something like
treeReduceTime n branchingFactor computeTime transferTime =
  stepCount * stepTime
  where stepTime = computeTime + (branchingFactor * transferTime)
        stepCount = if branchingFactor == 1.0
                      then n
                      else log n / log branchingFactor
From the above we can come up with a couple of generalizations:
If computeTime is O(1) with regard to branchingFactor and you have infinite bandwidth or zero result size (both mean that transferTime is zero), you want a branching factor of N. That's easy enough to see:
With infinite bandwidth, the treeReduceTime is
(computeTime + (branchingFactor * 0)) * stepCount =
computeTime * stepCount
To minimize the above, we need to minimize stepCount.
The smallest stepCount we can have is 1.
The stepCount is 1 when branchingFactor is N.
If you have infinitely fast compute nodes (computeTime = 0), you want a branching factor of e. That's a bit tougher to prove[1] (or maybe I'm just lacking some essential bit of knowledge that would make the proof easier.)

Note that it's highly likely that computeTime depends on branchingFactor. If the reduce algorithm is O(n), computeTime = branchingFactor * elementComputeTime, and in that case the optimum branching factor is e (as elementComputeTime and transferTime have the same factor, they can be rolled together and the infinitely fast compute nodes proof applies). For O(n^2) algorithms, the optimal branching factor seems to be sqrt(e).
For O(1) computeTime, finite computational power and nonzero transfer time, the optimal branching factor is somewhere between e and N. I used a simple hillclimb optimizer to find optimal branching factors for different computation/transfer ratios, and they do seem to be independent of N (well, apart from the maximum branching factor). For 1:1 compute time : transfer time, the branching factor is around 3.6. For 1:10 (low bandwidth), the branching factor is a bit over 2.8. For 10:1 (slow compute), the branching factor is about 8.6. Here's some approximate ratios for integer branching factors: 3 -> 1:3.5, 4 -> 1.5:1, 5 -> 3:1, 6 -> 5:1, and for 1000:1 the factor is about 226.

The above approximations can be used for branching factor dependent computation times too. You do that by replacing the computation time with per-reduce-step overhead and combining computeTime into transferTime. The new ratio is overhead : (computeTime+transferTime). For intuition on this: If the overhead dominates, you want more branching as that will decrease the contribution of the overhead. If you have very little overhead, you want a branching factor close to e.
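The quoted factors can be double-checked with a brute-force grid scan instead of a hill climb. This is a sketch (costPerLevel and bestB are my names): each reduction level costs computeTime + b * transferTime, the level count scales as 1 / log b, so scan for the b minimizing their product.

```haskell
-- Cost contributed per tree level, up to the constant log n factor.
costPerLevel :: Double -> Double -> Double -> Double
costPerLevel computeTime transferTime b =
  (computeTime + b * transferTime) / log b

-- Grid-scan for the cost-minimizing branching factor.
bestB :: Double -> Double -> Double
bestB computeTime transferTime =
  snd (minimum [ (costPerLevel computeTime transferTime b, b)
               | b <- [2.0, 2.01 .. 40.0] ])
```

bestB 1 1 comes out near 3.6, bestB 1 10 a bit over 2.8, and bestB 10 1 around 8.6, matching the hill-climb numbers.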
For an example, consider an email tournament that aims to pick the best email out of ten thousand. You receive some fixed amount of emails and your task is to pick the best one, then forward it to your assigned receiver (who does the same, etc., and the last reducer sends the mail to the award ceremony). Let's estimate that the transfer time is 5 seconds, the computing time is a minute per email, and the fixed overhead is 8 hours. This gives us an overhead : processing ratio of 443. If each receiver processes 10 emails per session, running the tournament would take 33 hours. But if each receiver does 118 emails a go, the tournament would be done in 19.5 hours. And if they do 200 emails, it'd take ... 20.2 hours. (Yeah, 118 is the best branching factor here.)
Suppose you do the same tournament but with people glued to their mail apps. Now the overhead might be just one minute. This time, doing the tournament with 118 mails per session would take a snappy 4.1 hours. But as the role of overhead has fallen, we should reduce the branching factor as well. With 4 mails per session, the tournament would take only 35 minutes. (With 2 mails per session, 42 minutes. 4 is optimal.)
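The tournament arithmetic can be checked by plugging the numbers into treeReduceTime (repeated here so the snippet stands alone; times are in minutes, with the 8-hour overhead standing in for computeTime, as described above):

```haskell
-- treeReduceTime from above, repeated for a self-contained check.
treeReduceTime :: Double -> Double -> Double -> Double -> Double
treeReduceTime n branchingFactor computeTime transferTime =
  stepCount * stepTime
  where stepTime = computeTime + (branchingFactor * transferTime)
        stepCount = if branchingFactor == 1.0
                      then n
                      else log n / log branchingFactor

-- 10000 emails, 8 h (480 min) fixed overhead per step, and
-- 1 min of reading + 5 s of forwarding per email; result in hours.
tournamentHours :: Double -> Double
tournamentHours b = treeReduceTime 10000 b 480 (1 + 5/60) / 60
```

This reproduces the figures: about 33 hours for 10 emails per session, about 19.6 for 118, about 20.2 for 200, and with the overhead dropped to one minute, treeReduceTime 10000 4 1 (1 + 5/60) comes out around 35 minutes.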
[1] With infinitely fast compute nodes, the treeReduceTime is
(branchingFactor * transferTime) * stepCount
For a branching factor of 1, that equals
transferTime * n
For other branching factors, the equation is
b * t * log_b(n), where b = branchingFactor and t = transferTime
From which we see that t is a fixed factor,
so the minimum of the above equation is the minimum of the equation
b * log_b(n)
Which can be written as
log_b n = log n / log b, b > 1
b * log_b(n) =
b * (log n / log b) =
(b / log b) * log n
From which we see that log n is a fixed factor,
so the minimum of b * t * log_b(n) is the minimum of
b / log b
To find the minimum, find the derivative
D(b/log b) = (1*log b - b*(1/b)) / (log b)^2
           = (log b - 1) / (log b)^2
And its critical point is at b = e,
(log e - 1) / (log e)^2 = (1 - 1) / 1^2 = 0 / 1 = 0,
we check the values of b / log b on both sides of the
critical point to be larger than the value at the critical point,
e / log e = e =~ 2.718
2 / log 2 =~ 2.885
3 / log 3 =~ 2.731,
thus confirming it as a local minimum.
Checking the discontinuity of b / log b at b = 1,
b / 0 = infinity,
and discovering it to be larger than e / log e,
we find the global minimum of b / log b to lie at b = e.
Plugging that back into the original equation:
minimum of branchingFactor * transferTime * stepCount is
transferTime * min(n, e*ln(n))
and because n >= e*ln(n), the minimum is
transferTime * e * ln(n).
2010-08-26
Folding, part 1
The fold function is a data structure accumulator: it goes through every element in the data structure and builds up an accumulator value from them. An example for folding lists:
foldl f acc [] = acc
foldl f acc (x:xs) = foldl f (f acc x) xs
With a fold you can do maps and filters of different kinds. Here's a few examples:
-- id x = x
reverseMap f lst = foldl (\l i -> (f i):l) [] lst
reverse = reverseMap id
map f lst = reverse (reverseMap f lst)
filter f lst =
  reverse (foldl (\l i ->
    if f i
      then i:l
      else l
    ) [] lst)
concat a b = foldl (\l i -> i:l) b (reverse a)
concatMap f lst = foldl (\l i -> concat l (f i)) [] lst
triplicate lst = concatMap (\i -> [i,i,i]) lst
drop n lst =
  reverse $ snd $ foldl (\(c,l) i ->
    if c >= n
      then (c, i:l)
      else (c+1, l)) (0,[]) lst
You can also do find operations with fold, but they're not as efficient as you'd like, because the fold goes through all the elements in the list. To make find efficient, we need a fold with break.
findInefficient f lst =
  foldl (\e i ->
    case e of
      Nothing -> (if f i then Just i else e)
      Just x -> e) Nothing lst

data Break a = Break a | Continue a

foldlWithBreak f acc [] = acc
foldlWithBreak f acc (x:xs) =
  case f acc x of
    Break a -> a
    Continue a -> foldlWithBreak f a xs

find f lst =
  foldlWithBreak (\e i ->
    if f i
      then Break (Just i)
      else Continue Nothing) Nothing lst
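As an aside, with a lazy right fold the break comes for free, because the folding function can simply ignore the rest of the fold once it has an answer. A sketch (findFoldr is my name for it):

```haskell
-- Lazy-foldr find: once f i holds, Just i is returned and the
-- rest of the fold (the `rest` thunk) is never forced, so this
-- short-circuits and even works on infinite lists.
findFoldr :: (a -> Bool) -> [a] -> Maybe a
findFoldr f = foldr (\i rest -> if f i then Just i else rest) Nothing
```

For example, findFoldr (> 5) [1..] terminates even though the list is infinite.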
take n lst =
  reverse $ snd $ foldlWithBreak (\(c, l) i ->
    if c >= n
      then Break (c,l)
      else Continue (c+1, i:l)) (0, []) lst

any f lst =
  -- maybe False (const True) (find f lst)
  case find f lst of
    Just x -> True
    Nothing -> False

all f lst =
  -- not $ any (not.f) lst
  not (any (\x -> not (f x)) lst)
And that's it for today. Tomorrow, parallel folding.
FormData file uploads
XMLHttpRequest Level 2 FormData is super. Loving it.
var fd = new FormData();
fd.append("key", "value");
var files = dropEvent.dataTransfer.files;
for (var i=0; i<files.length; i++)
  fd.append("file", files[i]);
var xhr = new XMLHttpRequest();
xhr.open("POST", "/upload");
xhr.send(fd);
2010-08-25
sse.h update, HTML5 fiddling, more links
I updated sse.h with license info (MIT) and added software implementations of recip() and rsqrt() to double2 and double4. This Vector Classes library might be more useful for Real Serious Use, though.
Currently fiddling with HTML5 media tags (cursed be audio and video codecs. If it were possible to implement media codecs in JS, that would be a solution.) Also doing drag-and-drop file uploads. The event API is ... interesting. To receive a drop event, you need to preventDefault other drag-related events.
Failures are strengths.
Have some links to go:
Eruptions blog readership.
Activity at Sakurajima Volcano Intensifies.
Regionbased memory management, ML Kit, A Simplified Account of Region Inference, Combining Region Inference and Garbage Collection, A Safe Runtime System in Cyclone.
2010-08-23
A new blog post!
No, wait, sorry, just more link copypasta. My bad.
Join, or Die. Benjamin Franklin, truly a master of Perl. Via Hark, a vagrant.
Ice Island Calves off Petermann Glacier, via Bad Astronomy.
2010-08-22
Yet more random links
Electric blue plankton blooms off Ireland.
Employer Expectations, Peer Effects and Productivity: Evidence from a Series of Field Experiments [PDF]. A study using Amazon Mechanical Turk, one finding was that peers reward effort rather than output quality. Quote from the implications section: "For example, it may be difficult to get workers to substitute easy, correct procedures for difficult, inefficient procedures. Ironically, the difficulty itself might make an outdated procedure harder to replace, as workers who adopt the easier method might be perceived to be shirking. The finding that exposure to low-output work lowers output, combined with the finding that low productivity reduces willingness to punish, suggests the possibility of an organizational vicious cycle: after observing idiosyncratically bad work, workers may lower their own output and punish less in response, in turn reducing other workers' incentives to be highly productive."
Forecasting bird migration for the Finnish Air Force. "Why? Bird is a threat."
Random links for another day
PacketShader: a GPU-Accelerated Software Router - eeh?
Enceladus plumes with anigifs. Cassini just keeps taking these awesome shots of the Saturn system.
Kirkenes harbour (and googling with the name of the ship got me this. Take that for good measure as well.)
Fantoft stave church, Pre-Columbian Mexican architecture
... the deepsea food web is fueled by a rain of dead plants and animals from surface waters. Sediment for the sediment god, let the red clay sink back into the mantle.
Cryosphere Today.
That reminds me: We inhabit a few meter thick layer at the solid-gas interface of a 6500 km radius sphere of rock and iron. Go up or down a few meters and you'll die. Go to the solid-liquid or gas-liquid interface and you'll die. Go to the solid, go to the liquid or go to the gas and you'll die.
Bismuth crystal. I quite like the Mineralogical Record print mag. Lots of good pictures.
About Me
Ilmari Heikkinen
Built art installations, web sites, graphics libraries, web browsers, mobile apps, desktop apps, media player themes, many nutty prototypes