art with code


Visitor stats for September 2010

Browser market share

43% Firefox
31% Chrome
10% Safari
10% Explorer
4% Opera

OS market share

56% Windows
22% Linux
19% OS X
1.6% iOS
0.7% Android

More than 90% of visitors had a screen width of 1280 or higher.

94% had Flash, 1% had a Flash version other than 10. 85% had Java.

Most search engine visitors wanted to know about SSE matrix multiplication. Clearly a hot topic.


Composing shaders

I haven't found a good way to do modular GLSL shaders, so here's some blather on that subject.

Shaders themselves are strings of source code, compiled to a binary by the OpenGL drivers. A shader has a void main() that is called to execute the shader. There are also global definitions in the form of attributes, varyings and uniforms. And a precision qualifier like precision highp float;, and C preprocessor macros.

A shader also has a type, it can either be a vertex shader or a fragment shader (or a geometry shader). The difference between the two is that a vertex shader determines what areas to consider for modification, whereas a fragment shader determines the output value for each element in the modified areas. Imagine you're drawing a filled circle: the vertex shader would output the bounding box of the circle and the fragment shader would test each element in the bbox to see if it lies inside the circle, setting them to the fill color if they do.

But back to the composing part. If you have some common helper functions that you'd like to use in your different shaders (e.g. defaultTransform(), blend functions), how do you reuse them? I tried to link different shader objects together but it didn't work for whatever the reason. So now I'm just concatenating strings.

ShaderSource = Klass({
initialize: function(type, source) {
this.type = type;
this.text = toArray(arguments).slice(1).map(function(s) {
return s.text ? s.text : s;

Maybe you want to further split your shaders into functional segments and turn those segments on and off for different uses. One way is to put some #ifdefs in the shader and prepend a block of #defines to your source.

ShaderSource = Klass({
initialize: function(type, source) {
this.type = type;
this.text = this.origText = toArray(arguments).slice(1).map(function(s) {
return s.text ? s.text : s;

setDefines : function(defines) {
this.defines = defines;
var defs = [];
for (var i in defines) {
defs.push("#define "+i+" "+defines[i]);
this.text = defs.join("\n");

Another way would be to affix an identifier to the source strings and only use a string if it's enabled. This is like writing a feature-poor preprocessor by yourself.

ShaderSource = Klass({
initialize: function(type, source) {
this.type = type;
this.segments = toArray(arguments).slice(1);
this.text = {
return s.text ? s.text : s;

setEnabled : function(enabled) {
this.enabled = enabled;
this.text = {
if (! || enabled[])
return s.text ? s.text : s;
return "";

setDisabled : function(disabled) {
this.disabled = disabled;
this.text = {
if (! || !disabled[])
return s.text ? s.text : s;
return "";

Or some mix of the above. I don't really know what's the super-best way to do this.

Easy inheritance in JavaScript

JavaScript inheritance is cumbersome. You have a constructor function and it has a prototype. And inheritance is done by setting the prototype to a newly created parent object? What? That means that you better not do anything in the constructor, otherwise you'll have a royal pain deriving new prototypes from it.

So I did what I do and wrote my own inheritance system, based on mixins. It still functions pretty much like the JavaScript system, but is more convenient for building the constructor and the prototype from existing classes.

Here's the infamous OO geometry demo:

Rectangle = Klass({
initialize : function(w, h) {
this.w = w;
this.h = h;

getArea : function() {
return this.w * this.h;

Square = Klass(Rectangle, {
initialize : function(s) {, s, s);

Moving = Klass({
mass : 1,

initialize : function(n) {
this.s = new Array(n);
this.a = new Array(n);
this.v = new Array(n);
this.f = new Array(n);
for (var i=0; i<n; i++){
this.s[i] = this.a[i] = this.v[i] = this.f[i] = 0;

move : function(t) {
for (var i=0; i<this.s.length; i++) {
var a = this.a[i], s = this.s[i], v = this.v[i], f = this.f[i];
var newA = f/this.mass;
var newV = v + a*t + (1/2)*(newA-a)*t*t;
var newS = s + v*t + (1/2)*a*t*t + (1/3)*(newA-a)*t*t*t;
this.a[i] = newA;
this.v[i] = newV;
this.s[i] = newS;
this.previousMass = this.mass;

Printable = {
valueString : function() {
var a = [];
for (var i in this)
if (typeof this[i] != 'function') a.push(i);
for (var i=0; i<a.length; i++)
a[i] = a[i] + ": " + this[a[i]];
return a.join(", ");

print : function() {

MovingRectangle = Klass(Rectangle, Moving, Printable, {
density : 0.19,
height : 1,
initialize : function(w,h,d) {,w,h);,2);
if (d)
this.density = d;
this.mass = this.getVolume() * this.density;

getVolume : function() {
return this.getArea() * this.height;

Klass is a function that builds a constructor. The returned constructor calls its prototype's initialize method.

The constructor's prototype is built by merging together all the arguments given in the Klass call. If an argument has a prototype property defined, the prototype is used for merging. Otherwise the argument is merged as itself.

The built prototype is then finally merged into the constructor function, so that you have access to the prototype functions and properties from the constructor object itself. Which is useful for doing calls to other classes, as you can do e.g. instead of

Klass = function() {
var c = function() {
this.initialize.apply(this, arguments);
c.ancestors = toArray(arguments);
c.prototype = {};
for(var i = 0; i<arguments.length; i++) {
var a = arguments[i];
if (a.prototype) {
Object.extend(c.prototype, a.prototype);
} else {
Object.extend(c.prototype, a);
Object.extend(c, c.prototype);
return c;

Object.extend merges the src object's attributes with the dst object, ignoring errors. Ignoring the errors is useful in merging native objects (e.g. styles) as they've got some stuff that can't be overridden.

Object.extend = function(dst, src) {
for (var i in src) {
try{ dst[i] = src[i]; } catch(e) {}
return dst;

toArray creates a new array from an object that has a length property. Useful for dealing with all the silly Array-like objects such as the function arguments object and the lists returned by the DOM API.

toArray = function(obj) {
var a = new Array(obj.length);
for (var i=0; i<obj.length; i++)
a[i] = obj[i];
return a;


WebGL slides translated

I translated my WebGL slides to English and put them on GitHub Pages. Which is great, btw. Upgraded my account to Micro just to give them props for that.

If you would like to buy an app to make fancy WebGL slideshows, send me a mail and I'll get back to you.

If you'd like to subscribe to a web-based zoomable file management & sync service instead, send me a mail and I'll get back to you.

In other news, I'm back on Twitter.


Saturday Night Links

ARTtech 2010: Fairlight's rendering secrets from AssemblyTV. The render graph approach is interesting. And the demo editor.

Yay! SVG in IMG coming in Firefox. Should enable fancy scalable UI images and easy SVG textures in WebGL. (Edit: In 2010-09-18 nightly SVG in IMG works but can't be drawn to canvas or uploaded as WebGL texture.)

Creature workflow in Mudbox 2011.

Auto-retopology in 3D Coat. Take a high-poly model, paint over areas that you want at high resolution, draw lines to suggest edgeloop direction, done!

Maybe one day 3D modeling won't require thinking about polygons. You'd just have a high-resolution form that you deform and then save a low-res version for interactive use. I dunno. The whole "create box, extrude, cut, unwrap, texture"-flow turns me off. It's like you're doing reverse origami: you start with a paper box, then you extrude it, bend it, cut bits off, glue bits back on, cut it open and splay it flat, draw a picture on it, put it back together, stuff a wire model inside, animate. And all the time you're worrying if you got the bends just right so that the joints don't look stupid.


Frontend Finland WebGL slides

I gave a talk at the Frontend Finland meetup about WebGL. Had a great time, thanks to the organizers! My presentation slides are available here in glorious Finnish. Best experienced with a browser that has a working WebGL implementation (fear not, there's a fallback to plain unadorned HTML). Writing the slideshow app (and the slides) took about three days, with four days of preliminary library work (and the app dev fed into the library too).

Here are some links on the topics covered:

The git repo for the utility library and the presentation can be found here.

The SVG presentation by Matti Paksula was really nifty as well. After the show, we were brainstorming about some kind of a super slideshow where you'd write the slides in HTML (with a nice CSS), then they'd get rendered using SVG / Canvas / WebGL depending on the capabilities of your browser. (What my presentation is doing is parsing the DOM into scene graph nodes, then rendering them using WebGL. Doing a similar system that rendered using SVG wouldn't be too difficult.)

Preserving the clickability of links and other such webness would be problematic with WebGL, but a mix of SVG, CSS 3D and WebGL might get you the best of both worlds: overlay SVG elems on top of the WebGL canvas, change the CSS 3D transform of the elems to match the WebGL scene graph. That'd be a lot like CAKE's mixed HTML/Canvas, except in 3D. If you want fancy FX on your SVG elems while preserving interactivity, you'd render them to textures and put the original element at opacity 0.01 on top of the fancy render, maybe? Or do shaders for SVG?



You have things, some things. Call these one-things. There is also a no-thing, call it 0. When you add one-thing to 0, we call that 1. Add a one-thing to 1 and you get 2. Add one-thing to 2 and you get 3. In that way we build a small list of numbers: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

When you add one-thing to 9, we don't have a number for that here. But let's call it a ten-thing and mark it on left of the one-things. We have 1 ten-thing and 0 one-things, written in short as 10. Then we can add a one-thing to the 0 one-things to get 11. When we add a one-thing to 19, we get two ten-things: 20. When we have ten ten-things, we call that a hundred-thing and write it as 100: one hundred-thing, zero ten-things, zero one-things.

In a similar fashion, we could have a shorter list of numbers, say, 0, 1, 2, 3, 4. Now when you add one-thing to 4, you get one five-thing and zero one-things, which we can write as 105. When we have five five-things, we call that a twenty-five-thing and write it as 1005.

There are also negative things. When a negative thing is added to a positive thing, they both disappear. Adding one negative thing to one positive thing makes both disappear, leaving you with zero things. If you add 5 negative things to 8 positive things, the 5 negative things and 5 positive things disappear, leaving 3 positive things. Adding 2 negative things to 3 more negative things gets you 5 negative things. Adding 4 negative things to 2 positive things makes the 2 positive things and 2 negative things disappear, leaving 2 negative things.

When you add 5 negative one-things to 2 positive ten-things, you need to crack open one of the ten-things to get ten positive one-things. The 5 negative things and 5 positive one-things disappear, leaving you with 1 positive ten-thing and 5 positive one-things: 15.

To add 5 negative one-things to 1 positive hundred-thing, you first crack open the hundred-thing to get ten ten-things, then crack open one of the ten-things to get ten one-things. The 5 negative one-things and 5 positive one-things disappear, leaving you with 9 positive ten-things and 5 positive one-things: 95.

Adding a zero to a number results in the number itself. Adding 5 to 0 results in 5. Adding 0 to 5 results in 5.

Negating a number turns a positive number into a negative number and a negative number into a positive number. Positive 2 negated is negative 2. Negative 2 negated is positive 2. Adding a number to its negation is zero. Positive 2 added to negative 2 is zero.

We write positive numbers with a + in front of them and negative numbers with a - in front of them. Positive 2 is +2. Negative 2 is -2. We write addition with a + between the numbers. The + symbol is called plus. Negative 3 plus positive 2 is -3 + +2. We write negate as -(number). Positive 2 negated is -(+2). Negative 2 negated is -(-2).

Two numbers are equal if they are the same. Equality is written with a =. 2 equals 2 is written 2 = 2 and amounts to true. If the two numbers are not the same, the equality operation returns false. If both sides of the equality operation are identical, the equality operation returns true. 2 = 2 is true. 2 = 3 is false. 1 + 1 = 2 is undecided, rewriting it as 2 = 2 or 1 + 1 = 1 + 1 makes it true.

Subtraction is an operation where the first number is added to the negation of the second number. Subtraction is written with a - between the numbers. The - symbol is called minus. -3 minus +2 is written -3 - +2 and means -3 + -(+2).

Multiplication is an operation where each of the things in the first number is replaced by the second number. Multiplying +3 by -2 replaces each of the 3 things with a -2, giving you +(-2) + +(-2) + +(-2) = -6. Multiplying +15 by +3 requires splitting open the ten-things to replace each of the one-things inside them with a +3, giving you +15 +3-things, which amount to +45 when added together. Multiplying -2 by -5 replaces the two things with -5-things, giving you -(-5) + -(-5), which simplifies to +5 + +5 and amounts to +10. Multiplying +0 by -2 gives you zero -2-things, which amounts to zero. Multiplying -2 by +0 replaces the two things with +0, giving you -(+0) + -(+0) = -0 + -0 = -0.

Fractional things are parts of one-things, much like how one-things are parts of ten-things. Fractional things are defined by how many parts are needed to make up a one-thing. When you need two parts for one-thing, the fractional thing is called a half. When you need three parts, the fractional thing is one-third. With five parts needed, it's a one-fifth. Five one-fifths added together amount to one. Similarly three one-thirds amount to one. Fractional numbers are written as 1/parts. One-fifth is written +1/5. One-third is written as +1/3. Negative one-sixth is written -1/6.

Adding 1/6 to 1/6 results in 2/6. Adding 1/6 to 5/6 results in 6/6, which equals 1. Adding 1/6 to 1 is 1/6 + 6/6 = 7/6. Adding 1/5 to 4/5 results in 5/5, which also equals 1. Adding 1/5 to 1 is 1/5 + 5/5 = 6/5.

Multiplication is written with an x between the numbers. The x is called times. 3 times 5 is written 3 x 5 and means 3 multiplied by 5, amounting to 15.

Multiplying a number by one results in the number itself. 5 x 1 is 5. 1 x 5 is 5.

The inverse of a number multiplied with the number itself results in one. The inverse of a number is the fraction with number parts. Zero does not have an inverse. The inverse of +5 is +1/5. The inverse of -3 is -1/3.

Three times the inverse of three is 3 x 1/3 = 1/3 + 1/3 + 1/3 = 3/3 = 1.

Negative two times the inverse of negative two is -2 x 1/-2 = -(-1/2) + -(-1/2) = 1/2 + 1/2 = 2/2 = 1.

Division is an operation where you multiply the first number with the inverse of the second number. 3 divided by 5 is 3 x 1/5 and amounts to 3/5. 8 divided by 2 is 8 x 1/2 and amounts to 8/2, which adds up to 4.

Division is written with a / between the numbers. 5 divided by -3 is written 5 / -3 and means 5 x -1/3, and amounts to -5/3, or -1 + -2/3.

The allowed moves when manipulating an equation with additions are:
  • Apply associativity rule to two additions: a + (b + c) = (a + b) + c
  • Apply commutativity rule to an addition: a + b = b + a
  • Add the same value to both sides of the equality: a = a <=> a + b = a + b

For an equation with multiplications, the above moves apply as well, but there is an additional move when dealing with a mix of additions and multiplications:
  • Apply distributivity rule: a x (b + c) = a x b + a x c

These are the only allowed structural moves for manipulating equations of additions and multiplications.

Additions can use the following equalities for simplifying expressions: a + 0 = a and a + -a = 0. Multiplications can use a x 1 = a and a x 1/a = 1.

We can build more moves from the above, for example:

I'm playing fast and loose with the associativity in these,
a + b + c = (a + b) + c = a + (b + c),
I'm using whichever is the easiest at any given moment.

Also, multiplication binds tighter than addition:
a x b + c x d = (a x b) + (c x d)

x + b = c | a = a <=> a + b = a + b
x + b + -b = c + -b | a + -a = 0
x + 0 = c + -b | a + 0 = a
x = c + -b | a - b = a + -b
x = c - b
x + b = c <=> x = c - b

a x b = c | a = a <=> a x b = a x b
a x b x 1/b = c x 1/b | a x 1/a = 1 when a != 0
a x 1 = c x 1/b | a x 1 = a
a = c x 1/b | a / b = a x 1/b
a = c / b
a x b = c <=> a = c / b when a != 0

(a + b) x c = (a + b) x c | a x b = b x a
(a + b) x c = c x (a + b) | a x (b + c) = a x b + a x c
(a + b) x c = c x a + c x b | a x b = b x a
(a + b) x c = a x c + b x c

a x 1 = a x 1 | a x b = b x a
a x 1 = 1 x a

a x b = a x b | (a + b) x c = c x a + c x b
a x b = (a + -1) x b + (1 x b) | 1 x a = a
a x b = (a + -1) x b + b | a - b = a + -b
a x b = (a - 1) x b + b

-(a) = -(a) | a = b <=> a + b = a + b
a + -(a) = a + -(a) | a + -(a) = 0
a + -(a) = 0 | a = b <=> a + b = a + b
-a + a + -(a) = -a + 0 | a + -a = 0
0 + -(a) = -a + 0 | a + b = b + a and a + 0 = a
-(a) = -a

-1 x a = -1 x a | 0 + a = a
-1 x a = 0 + -1 x a | a - b = a + -b
-1 x a = 0 - 1 x a | 1 x a = a
-1 x a = 0 - a | a - b = a + -b
-1 x a = 0 + -a | 0 + a = a
-1 x a = -a

-1 x a = -a | -a = -(a)
-1 x a = -(a)

-a x b = -a x b | -1 x a = -a
-a x b = -1 x a x b | a x b = b x a
-a x b = a x -1 x b | -1 x a = -a
-a x b = a x -b

-a x b = -1 x a x b | -1 x a = -(a)
-a x b = -(a x b)

a x (b - c) = a x (b - c) | a - b = a + -b
a x (b - c) = a x (b + -c) | a x (b + c) = a x b + a x c
a x (b - c) = (a x b) + (a x -c) | a x b = b x a
a x (b - c) = (a x b) + (-c x a) | -a x b = -(a x b)
a x (b - c) = (a x b) + -(c x a) | a x b = b x a
a x (b - c) = (a x b) + -(a x c) | a - b = a + -b
a x (b - c) = (a x b) - (a x c)

a x 0 = a x 0 | 0 = -1 + 1
a x 0 = a x (-1 + 1) | a x (b + c) = a x b + a x c
a x 0 = a x -1 + a x 1 | a x 1 = a
a x 0 = a x -1 + a | -1 x a = -a and a x b = b x a
a x 0 = -a + a | -a + a = 0
a x 0 = 0

Now go forth and solve the problems of algebra!

Links for Tuesday

A Blender-WebGL exporter got forked.

Rrrola's FX. Mandelboxes create a lot of cool geometry, riding that line between geometric and organic. I wonder if you could build game levels quickly by CSG cutting mandelboxes.

Some motion graphics videos on YouTube.


WebGL color spaces

There's an interesting discussion happening on the public-webgl mailing list about the color spaces used by the textures and the canvas.

As far as I can tell, the problem is:
  • Simple math works correctly only in a linear color space.
  • Images and the rest of web content is 8 bits per channel sRGB.
  • Converting between two 8-bit color spaces is lossy.
  • You can't tell what an image's color space is from JavaScript.
  • You can't easily convert between color spaces in JavaScript.

The proposed solutions are:
  • Pass image data as-is to WebGL, treat WebGL canvas values as sRGB. (I.e. the do-nothing solution. If application writers want correct math, they should use linear textures and do a final linear-to-sRGB pass on the rendered image.)
  • Optionally convert textures to linear color space, treat WebGL canvas as linear. (Problematic as there is severe information loss at the conversions. Math works correctly though.)

Color spaces? But I thought space was black?

Well, actually, space has no color. It's transparent. It just looks black because there's very little light coming through it at most points.

Color spaces, on the other hand, are definitions of what the numerical pixel values in an image mean. A linear RGB color space means something like "given an output device with three component emitters at wavelengths R=580 nm, G=490 nm and B=440 nm, the number of photons emitted by an emitter is Max_Output * V, where V = component_value / (2^component_bits-1)." For sRGB, the component values are transformed by the sRGB curve (roughly pow(V, 2.2)) to have more control over the important parts of the dynamic range. In particular, sRGB sacrifices precision in the light end to have more control in the darks. Converting linear->sRGB loses detail in the light end, whereas sRGB->linear loses detail in the dark end.

To project an image onto your eyeball, the image data goes through the following wringer: Image data in image color space -> Viewer app color space -> Window system color space -> Display device color space -> Hardware blinkenlichts -> Eyeball.

Now, most software goes "huh?" at the mention of that and acts as if the image data is in the window system color space and uses the data directly. Further, the assumption when doing image editing operations is that the aforementioned color space is linear. (Why that might cause problems is that while 0.25 + 0.25 has the same luminance as 0.5 in a linear color space, to add the luminances in gamma 2.2 you have to do (0.25^2.2 + 0.25^2.2)^(1/2.2) = 0.34.)

Software that requires accurate color handling usually converts the 8-bit values to 16 or 32-bit linear color space to get around the conversion errors and to get easy correct math internally. The resulting high-depth image is then converted down to window system color space for viewing.

The real solution is to use 64 bits per channel and specify the channels to mean photon count / second / mm^2. With that, you can crank from 1 photon / second all the way up to 6500x noon sunlight. And linear math. Then use that as the interchange format between everything. Heyo, massive framebuffers~

In the meantime, maybe the browser could have an iconv-like system for converting images from one color space to another. WebGL 1.0 doesn't have float textures, so upconverting to linear float textures has to wait for the float texture extension.

With linear float textures, you could upconvert all images to float textures, then use a linear float FBO for rendering, and finally convert the FBO to sRGB to display on the canvas element.


Ramble on sorting

Merge sort merges already sorted regions together. It also has an operation to turn a small region into a sorted region. The merge sort splits its input into small regions, turns each of those into a sorted region, and finally merges them together.

There are some sorted regions that have a trivial merge operation: if the largest value in region X is smaller or equal to the smallest value in region Y, the merge operation is the concatenation X ++ Y. We can pre-process the input of the merge sort so that the two merged regions have this property. The pre-processing pass uses a pivot value and moves all values smaller than the pivot to the start of the region, leaving the end of the region for values larger than the pivot. Now you don't have to do anything to merge the regions together once they're sorted. This is the idea behind the quicksort algorithm.

Because pivoting can cause the regions to have different sizes, there is a worst-case scenario where the pivot is the only element in the other region. Because of this scenario, quicksort has O(n^2) worst-case time complexity. If you pick the pivot to be the median of the region with an O(n) median finding algorithm and further split the equal-to-pivot values equally between the subregions, you can get an O(n lg n) quicksort (but the O(n) median finding algorithm has a high overhead, so they say the resulting algorithm is slower than quicksort in the average case).

You can do a O(n) pre-processing pass over the input data to find all already sorted regions in the input data. If the number of found regions is small, you're dealing with nearly sorted data. You can directly merge all sorted regions that are larger than the region size for the region sorting operation. The merging should be done so that the smallest regions are merged together first. Merging regions of unequal size causes the worst-time complexity to increase (merging a 2-region and a X-region takes X+2 time, consider N/4 2-regions merged into a N/2-region: it takes N/4 * (N/2 + (2+N) / 2) time, which is in O(N^2).)

The pre-processing pass can also reverse regions that are sorted in the opposite direction. And it is probably possible to topologically sort the sorted region descriptors by their min-max-values to find regions that can be merged by concatenation. Another merge trick would be to use binary search on one region to find the span that the other region overlaps, then you could get away with memcpy(dst,smaller,smaller_sz); merge(dst+smaller_sz,overlap,region2,overlap_sz,region2_sz); memcpy(dst+smaller_sz+overlap_sz+region2_sz, larger, larger_sz).

If you keep track of how equally the quicksort pivot pass splits a region, you can avoid the quicksort worst-case by handing pathological splits over to merge sort. That is a bit nasty though, as the resulting regions no longer have the concatenative property and must be merged the slow way. Another alternative is to do something like introsort and switch over to heapsort once recursion depth exceeds lg n.

Once merge sort or quicksort reaches the maximum unsorted region size (say, 8 elements), implementations usually switch over to a low-overhead O(n^2) sorting algorithm like insertion sort.

The partitioning operation of merge sort is a no-op, but the partitioning operation in quicksort takes O(n) time. In the same fashion, the merge operation of quicksort is a no-op, but the merge operation of merge sort takes O(n) time.

You can do the merge sort merge in parallel by picking a pivot and splitting the sorted regions into two regions: one with values smaller or equal to the pivot and the other with values larger or equal to the pivot. The splitting operation consists of finding the index of the pivot in each region using a binary search. You then merge the small values with the small values and the large values with the large values. The merge between the merged smalls and larges is concatenation, so you can just allocate an array with the beginning reserved for the small values and the end reserved for the large values, and run the small and large merges in parallel. Recurse to increase parallelism. The nice thing about this approach is that you get segregated in-place parallelism, each of the sub-partitions can write to its own part of the root target array. The problem it suffers from is the quicksort partitioning problem: the pivot should be the median of the data set. Because the sorted regions are sorted, we can get the medians of each region easily enough, the worst case with one of those should be a 25:75-split when the regions are of equal size.

If your input data is 256 elements long and you split it into 8-element insertion-sorted regions, executing each in parallel (on a 32-core), what's the merge time? The first merge level you want to split into 2-way parallel merges (16 merges -> 32 merges). The second level is split 4-way (8 merges -> 32 merges), then 8-way and 16-way? The split bsearch overhead is 2 log2 mergeSize per level (sub-splits execute in parallel).

To do the 16-way split (assuming we get a perfectly balanced split), we first do a 2 log2 128 = 14 op split, then 2 log2 64 = 12 op split, then 10 op and 8 op, for total split cost of 44 ops. The merge then takes 16 ops to run (8+8-merge), for total merge cost of 60 ops. If we stop the split one level short, it'd take 36 ops to split and 32 ops to merge, 68 ops in total. So I guess you do want to split it all the way down if you have negligible overhead.

The total runtime for the zero-overhead parallel merge tree would be 44 + 30 + 18 + 8 + 4*16 = 164 ops. Total sort time with 64-op 8-elem sorts would be 228 ops. Hey, sub-O(n). Quadruple the input size and the core count and it's 8+18+30+44+60+78+6*16 + 64 = 398 ops. Hey, sub-linear parallel scaling.

There is an another way to do parallel merging as well (cribbed from some lecture slides): for regions A and B, write A[i] to i + B.indexOfLarger(A[i]) and write B[i] to i + A.indexOfLarger(B[i]). N-parallelizable, serial O(N log N) time, parallel O(log N) time. However, it relies on scattered writes, so it doesn't seem very memory-access friendly (ideally each CPU core should be working on its own cache line, otherwise you get cache contention and traffic between the CPUs).

Fun reading: Heapsort, Quicksort, and Entropy.


Two's complement integers

Oh hey, I finally get how two's complement works after reading the first couple paragraphs of Why computers represent signed integers using two’s complement.

Simply put, adding a negative number to a positive number overflows if the result is positive, and doesn't overflow if the result is negative.

With 32-bit integers

5 - 1 = 5 + (-1) = 5 + (232-1) = 232+4 = 4
5 - 3 = 5 + (232-3) = 232+2 = 2
5 - 7 = 5 + (232-7) = 232-2 = -2

This is kinda interesting, we can put the separator between positive and negative numbers where we like:

void print_custom_int (unsigned int n, unsigned int separator)
if (n < separator) {
printf("%u\n", n);
} else {
printf("-%u\n", ~n+1);

print_custom_int(0, 200);
// 0
print_custom_int(1, 200);
// 1
print_custom_int(-1, 200);
// -1
print_custom_int(200, 200);
// -4294967096
print_custom_int(-4294967097, 200);
// 199

Or maybe you need extra positive range and only 256 negative values:

print_custom_int(4000000000, 0xFFFFFF00);
// 4000000000
print_custom_int(2*2004001006, 0xFFFFFF00);
// 4008002012
print_custom_int(-32-80+15, 0xFFFFFF00);
// -97

// Watch out for overflows!
print_custom_int(-16*17, 0xFFFFFF00);
// 4294967024

And yeah, division doesn't work out of the box.

unsigned int div_custom_int (unsigned int a, unsigned int b, unsigned int separator)
if (a < separator && b < separator)
return a / b;
else if (a < separator)
return ~(a / (~b+1))+1;
else if (b < separator)
return ~((~a+1) / b)+1;
return (~a+1) / (~b+1);

unsigned int n20 = -20, n3 = -3, p4 = 4, p2 = 2;

print_custom_int(n20 / p4, 0xFFFFFF00);
// 1073741819
print_custom_int(p4 / n3, 0xFFFFFF00);
// 0
print_custom_int(p4 / p2, 0xFFFFFF00);
// 2
print_custom_int(n20 / n3, 0xFFFFFF00);
// 0

// vs.

print_custom_int(div_custom_int(n20, p4, 0xFFFFFF00), 0xFFFFFF00);
// -5
print_custom_int(div_custom_int(p4, n3, 0xFFFFFF00), 0xFFFFFF00);
// -1
print_custom_int(div_custom_int(p4, p2, 0xFFFFFF00), 0xFFFFFF00);
// 2
print_custom_int(div_custom_int(n20, n3, 0xFFFFFF00), 0xFFFFFF00);
// 6

Blog Archive

About Me

My photo

Built art installations, web sites, graphics libraries, web browsers, mobile apps, desktop apps, media player themes, many nutty prototypes