art with code

2009-12-17

3D models and parsing binary data with JavaScript

Background


A few days ago, I was fiddling around with loading 3D models for use with WebGL (demos here). I had an OBJ parser and some test models lying around from an earlier project, and the parser was only around two hundred lines, so I ported it over to JavaScript. But the OBJ files were pretty huge: a 40-thousand quad model without normals or texcoords was 2.4 megs. Not really something you want to send over the internet.

The OBJ file format stores coordinates and face indexes as text. While it makes the model files human-readable and easy to parse, it also increases the file size. Uncompressed, a single coord takes 9 bytes, plus an optional sign byte.

My first instinct was to just use gzip compression on server-side. And it did manage to bring the file size down to 880k for the above-mentioned 2.4 meg OBJ. Still. Maybe there would be more gains to be had by storing the coords and face indexes as binary data.

I went for [0,1]-normalized 16-bit fixed-point numbers for the coordinates and 16-bit unsigned ints for the face indexes. I also had to store the vertex and face counts, for which I used 32-bit unsigned ints (which, come to think of it, is a bit silly, as using 16-bit face indexes limits the vert count to 65536). Finally, I added two three-float vectors for coord scaling and translation, in order to inflate the normalized coords back to something close to their original values.

Result: 560k for the uncompressed binary file, 470k gzipped. Nice. Though it could be brought down further by using a triangle strip instead of separate quads and tris. And there are fancy 3D mesh compression papers that talk about compressing down to less than 2 bits per triangle (for the example model: 40k quads -> 80k tris = 160 kbits = 20 kB). But enough of that for now. Next up, how to parse the binary data.

Parsing binary data


To parse a binary string with JavaScript, you use charCodeAt and the usual bit munging operators (&, |, <<, >>). First, you need the byte value at some point in the string. As charCodeAt operates on characters and not bytes, you need to tell the browser to get the data as an unparsed string by calling overrideMimeType("text/plain; charset=x-user-defined") on your XMLHttpRequest. Then mask away the high byte of each char code to get the low byte value. The following works at least on Firefox and WebKit; I don't know about IE and Opera.

var byteValue = str.charCodeAt(index) & 0xFF;
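For example, the request setup might look something like this (a sketch; the URL and parseModel are made up):

var xhr = new XMLHttpRequest();
xhr.open("GET", "model.bin", true);
// ask for the raw bytes as an undecoded string
xhr.overrideMimeType("text/plain; charset=x-user-defined");
xhr.onreadystatechange = function() {
  if (xhr.readyState == 4 && xhr.status == 200)
    parseModel(xhr.responseText); // made-up entry point
};
xhr.send(null);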

Loading unsigned ints is simple enough, just load the bytes for the int and shift them to their places:

// read big-endian (network byte order) unsigned 32-bit int from data, at offset
readUInt32 = function(data, offset) {
  // multiply the top byte instead of << 24: JS bitwise ops yield signed 32-bit ints
  return ((data.charCodeAt(offset) & 0xFF) * 0x1000000) +
         ((data.charCodeAt(offset+1) & 0xFF) << 16) +
         ((data.charCodeAt(offset+2) & 0xFF) << 8) +
         (data.charCodeAt(offset+3) & 0xFF);
}
// read big-endian (network byte order) unsigned 16-bit int from data, at offset
readUInt16 = function(data, offset) {
  return ((data.charCodeAt(offset) & 0xFF) << 8) +
         (data.charCodeAt(offset+1) & 0xFF);
}

Reading in normalized fixed-point numbers as floats isn't much harder either, as they're generated with uint16(normalized_float * 65535):

readNormalizedUFixedPoint16 = function(data, offset) {
  return readUInt16(data, offset) / 65535.0;
}

Floats, on the other hand, are a bit more involved. Writing the float parser was educational; now I have a better understanding of the buggers! The following code doesn't handle the special numbers, likely loses precision, is slow, etc., so if you have a better way of parsing binary floats, please share.

// read big-endian (network byte order) 32-bit float
readFloat32 = function(data, offset) {
  var b1 = data.charCodeAt(offset) & 0xFF,
      b2 = data.charCodeAt(offset+1) & 0xFF,
      b3 = data.charCodeAt(offset+2) & 0xFF,
      b4 = data.charCodeAt(offset+3) & 0xFF;
  var sign = 1 - (2*(b1 >> 7)); // sign = bit 0
  var exp = (((b1 << 1) & 0xff) | (b2 >> 7)) - 127; // exponent = bits 1..8
  var sig = ((b2 & 0x7f) << 16) | (b3 << 8) | b4; // significand = bits 9..31
  if (sig == 0 && exp == -127)
    return 0.0;
  return sign * (1 + sig * Math.pow(2, -23)) * Math.pow(2, exp);
}
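To tie the readers together: parsing the format described above might go roughly like this (a sketch; the field order follows the description and all the names are made up):

parseModel = function(data) {
  var vertCount = readUInt32(data, 0);
  var faceCount = readUInt32(data, 4);
  var offset = 8;
  var scale = [], translate = [];
  for (var i = 0; i < 3; i++, offset += 4) scale.push(readFloat32(data, offset));
  for (var i = 0; i < 3; i++, offset += 4) translate.push(readFloat32(data, offset));
  var verts = new Array(vertCount * 3);
  for (var i = 0; i < verts.length; i++, offset += 2)
    // assuming scale-then-translate to inflate the normalized coords
    verts[i] = readNormalizedUFixedPoint16(data, offset) * scale[i % 3] + translate[i % 3];
  var indexes = new Array(faceCount * 4); // assuming quads only; the real file mixes quads and tris
  for (var i = 0; i < indexes.length; i++, offset += 2)
    indexes[i] = readUInt16(data, offset);
  return {vertices: verts, indexes: indexes};
}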

And there you have it. I haven't gotten around to parsing signed integers, though for two's complement it's enough to subtract 1 << bits from the unsigned value whenever the top bit is set (subtracting 1 << (bits-1) unconditionally would decode offset-binary instead).
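A minimal sketch for 16 bits, building on readUInt16 above:

readInt16 = function(data, offset) {
  var u = readUInt16(data, offset);
  return (u & 0x8000) ? u - 0x10000 : u; // two's complement
}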

2009-11-28

JavaScript drawing apps

Wrote two small drawing apps during the last two days:

WebGL cube paint - draw with fancy 3D cubes.




MiniPaint - a 2D canvas drawing app in 25 lines.



The 25-line app has Wacom support (pressure, device type, tilt). Too bad that the only copy of Firefox with Wacom support is mine. (Yes, created bugs in Bugzilla and sent patches. Wait and see. BTW, if you know how to add Wacom support for Windows and Mac, that'd be awesome.)

2009-11-23

WebGL cubes

Started writing a small scene graph last weekend. Here are some cube shots from the test demo:




I wrote and optimized one use-case for it as well! "Create bouncing cube on canvas (with lighting etc.)"

Ended up with the special-case-rife DWIM snippet: new Scene($('gl_canvas'), new Cube().makeBounce()).

Demos: lots of cubes and a simple bouncing cube.

2009-11-16

Further optimizing a small JavaScript loop

Continuing from Optimizing a small JavaScript loop, there were some suggestions in the comments on how it might be made faster (or cleaner.) I tested them all just now, along with a couple of ideas of my own.

Using var l = imgs.length; while (l--) instead of a for-loop


No measurable effect on the speed of my loop, except that I had to reverse a sort order to account for the reverse traversal. And I don't like reversing sort orders, it's hard enough to keep these things straight when you're not counting backwards.

Not copying function args to local vars


I.e. instead of the redundant var ctop = arg2, use arg2 directly. Made the code cleaner. Didn't have any measurable effect on runtime.

Using object properties directly instead of copying them to local vars


I.e. instead of var etop = e.cachedTop use e.cachedTop directly. Made the code cleaner, no effect on runtime.

Using imgs[i] everywhere instead of var e = imgs[i]


That seems slower to me. And more typing. But, no measurable effect on runtime.

Using a > b ? a : b instead of Math.max(a,b), ditto for Math.min


This was one of my ideas. No measurable effect on runtime.

Inlining isVisible to the loop


Again, one of my ideas. Again, no measurable effect on runtime. Maybe a 5-10% improvement, but my measurement error is 10-20% :|

Moving setting e.trans outside the loop


Sounds like a good idea. Need to refactor my code to do that.

Conclusions


Most of the things people tell you will improve JavaScript performance actually don't. Measure.

Also, I need a better way to measure runtimes. What I have now is a keydown handler that runs the loop a hundred times and prints the elapsed time to Firebug console. The amusing thing is that the exact same code can give 100-run runtimes ranging from 550 ms to 720 ms. And my system load is zero.

I suppose the problem is that this benchmark is run as part of my existing application. So, if any enterprising soul wants to have a go, make the bit you benchmark a separate page.
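A minimal standalone harness might look something like this (a sketch; the names are made up):

// run fn `runs` times and return elapsed milliseconds
function runBenchmark(fn, runs) {
  var start = new Date().getTime();
  for (var i = 0; i < runs; i++) fn();
  return new Date().getTime() - start;
}
// e.g. runBenchmark(checkVisibility, 100), where checkVisibility wraps the loop being measured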

2009-11-14

Optimizing a small JavaScript loop

I recently had to optimize a visibility-checking loop, as it was causing some slowness when dealing with several hundreds of images. Executive summary: profile with Firebug, cache regex results and unchanging DOM properties.

I started with a loop like this:

imgs.forEach(function(e) {
  if (isVisible(e)) {
    if (e.src.match(/\/transparent\.gif$/)) {
      e.src = e.getAttribute('origsrc');
    }
  } else if (!e.src.match(/\/transparent\.gif$/)) {
    e.src = '/transparent.gif';
  }
});

isVisible = function(e) {
  var ctop = window.scrollY;
  var cbottom = ctop + window.innerHeight;
  var etop = e.offsetTop;
  var ebottom = etop + e.offsetHeight;
  var top = Math.max(ctop, etop);
  var bottom = Math.min(cbottom, ebottom);
  return (bottom - top) > -1200;
}

Runtime for 400 images: 35ms. Gah!

Profile says that a lot of time is spent in isVisible. But there's really nothing there that looks expensive. Let's try caching window.scrollY and window.innerHeight anyhow.

var wsy = window.scrollY;
var wih = window.innerHeight;

imgs.forEach(function(e) {
  if (isVisible(e, wsy, wih)) {
    ...

isVisible = function(e, wsy, wih) {
  var ctop = wsy;
  var cbottom = ctop + wih;
  ...

Runtime came down to 23 ms. Whoah, that's a third off the execution time. Maybe property access is super expensive, let's see if caching elem.offsetTop and elem.offsetHeight would help as well.

isVisible = function(e, wsy, wih) {
  ...
  if (e.cachedTop == null || !e.complete) {
    e.cachedTop = e.offsetTop;
    e.cachedBottom = e.cachedTop + e.offsetHeight;
  }
  var etop = e.cachedTop;
  var ebottom = e.cachedBottom;
  ...
}

Runtime down to 15 ms. Excellent.

Profile says that the forEach loop is using a lot of time. Maybe it's the regexes. How about using RegExp.test instead of match? The docs say it should be faster.

if ((/\/transparent\.gif$/).test(e.src)) {
...
} else if (!(/\/transparent\.gif$/).test(e.src)) {


Runtime down to 12 ms. RegExp.test is indeed faster than String.match. Next, we could do away with the regexp altogether as we control which images are transparent and which are not...


imgs.forEach(function(e) {
  if (e.trans == null)
    e.trans = (/\/transparent\.gif$/).test(e.src);
  if (isVisible(e, wsy, wih)) {
    if (e.trans) {
      e.src = e.getAttribute('origsrc');
      e.trans = false;
      e.cachedTop = e.cachedBottom = null;
    }
  } else if (!e.trans) {
    e.src = '/transparent.gif';
    e.trans = true;
    e.cachedTop = e.cachedBottom = null;
  }
});

Okay, now we're down to 5 ms. Finally, let's try replacing the forEach-loop with a plain for-loop.

for (var i=0; i<imgs.length; i++) {
  var e = imgs[i];
  ...
}

Down to 3 ms. Not all that much in absolute terms, but relative to 5 ms it's 40% less. I wonder if CRAZY MICRO-OPTIMIZATIONS would do anything.

for (var i=0, l=imgs.length, e=imgs[0]; i<l; e=imgs[++i]) {
  ...
}

Down to ... 3 ms. No change. Well, it might be 0.3 ms faster but my measurements have a larger error than that. Ditch the micro-optimizations for readability.

And 'lo! The final code that is 10x faster than what we started with:

var wsy = window.scrollY;
var wih = window.innerHeight;

for (var i=0; i<imgs.length; i++) {
  var e = imgs[i];
  if (e.trans == null)
    e.trans = (/\/transparent\.gif$/).test(e.src);
  if (isVisible(e, wsy, wih)) {
    if (e.trans) {
      e.src = e.getAttribute('origsrc');
      e.trans = false;
      e.cachedTop = e.cachedBottom = null;
    }
  } else if (!e.trans) {
    e.src = '/transparent.gif';
    e.trans = true;
    e.cachedTop = e.cachedBottom = null;
  }
}

isVisible = function(e, wsy, wih) {
  var ctop = wsy;
  var cbottom = ctop + wih;
  if (e.cachedTop == null || !e.complete) {
    e.cachedTop = e.offsetTop;
    e.cachedBottom = e.cachedTop + e.offsetHeight;
  }
  var etop = e.cachedTop;
  var ebottom = e.cachedBottom;
  var top = Math.max(ctop, etop);
  var bottom = Math.min(cbottom, ebottom);
  return (bottom - top) > -1200;
}

Does it work right? It seems to, at least. So maybe! Or maybe not! Isn't caching fun?

[Edit] The offsetTop and offsetHeight caching didn't work in my case, so I had to remove it. Final runtime 11 ms, or only 3x faster. Maybe some day I'll figure out why it didn't work...

[Edit] People have suggested using var l = imgs.length; while (l--) { ... }, which not only looks odd and reverses the iteration order, but also generates the same code as a for-loop would (on WebKit at least), according to Oliver Hunt. Well, the codegen test is for var i=0; while (i < imgs.length) i++; but I assume it also applies to for (var l=imgs.length; l--;).

[Edit] It's also somewhat strange to see suggestions that focus on optimizing all these weird things like "use object properties directly instead of local variables", instead of the possibly big things: inline isVisible, replace Math.max and Math.min with a > b ? a : b and a < b ? a : b. Well. I'll try them all and post a follow-up. Stay tuned!

[Edit] Follow-up.

2009-11-11

Canvas-generated hexa background


Wrote that for the new fancy Metatunnel page (while procrastinating on writing an essay...) Here's the JS to boggle over:

function polygon(n) {
  var a = [];
  for (var i=0; i<n; i++)
    a.push({x:Math.cos(2*Math.PI*(i/n)), y:Math.sin(2*Math.PI*(i/n))});
  return a;
}
function id(i){ return i }
function lineTo(ctx){return function (p){ ctx.lineTo(p.x, p.y) }}
function linePath(ctx, path){ path.map(lineTo(ctx)) }
function drawPath(ctx, path, close){
  ctx.beginPath(); linePath(ctx, path); if (close) ctx.closePath() }
function strokePath(ctx, path, close){ drawPath(ctx, path, close); ctx.stroke() }
function fillPath(ctx, path){ drawPath(ctx, path); ctx.fill() }
function scale(x,y,p){return {x:p.x*x, y:p.y*y}}
function scalePoint(x,y){return function(p){ return scale(x,y,p) }}
function translate(x,y,p){return {x:p.x+x, y:p.y+y}}
function translatePoint(x,y){return function(p){ return translate(x,y,p) }}
function scalePath(x,y,path){ return path.map(scalePoint(x,y)) }
function translatePath(x,y,path){ return path.map(translatePoint(x,y)) }
function clonePath(path){ return path.map(id) }

function wavePoint(p) {
  var np = {
    x: (p.x + 1.5*Math.cos((p.x+p.y) / 15)),
    y: (p.y + 1.5*Math.sin((p.x+p.y) / 15))
  }
  var d = Math.sqrt(np.x*np.x + np.y*np.y);
  var s = Math.pow(1/(d+1),0.7);
  return translate(5, 24, scale(s*30, s*30, np));
}
function wavePath(path){ return path.map(wavePoint) }
function drawbg() {
  var canvas = document.createElement('canvas');
  canvas.width = 1400;
  canvas.height = 700;
  var ctx = canvas.getContext('2d');
  ctx.fillStyle = '#333';
  ctx.fillRect(0,0,canvas.width,canvas.height);
  var g = ctx.createRadialGradient(50,50,0, 50,50,1400);
  g.addColorStop(0,'rgba(0,192,255,1)');
  g.addColorStop(1.0,'rgba(255,0,255,1)');
  ctx.fillStyle = g;
  var hex = polygon(6);
  var s = Math.sin(Math.PI/3);
  var c = Math.cos(Math.PI/3);
  for (var i=-2; i<120; i++) {
    for (var j=0; j<8; j++) {
      if (Math.random()+Math.random() < Math.pow((Math.sqrt(i*i+j*j) / 55),2)) continue;
      var x = i*(1.2+c);
      var y = j*(2.4*s) + (i%2 ? 1.2*s : 0);
      fillPath(ctx, scalePath(10, 7, wavePath(translatePath(x,y,hex))));
    }
  }
  document.body.style.background = '#333 url('+canvas.toDataURL()+') no-repeat top left';
}

2009-11-10

Metatunnel WebGL now works on WebKit as well


Made the Metatunnel demo WebGL port work on WebKit as well, tested to work with the latest Chromium Linux build.

2009-11-07

Done with canvas3d-tests WebGL port

As far as I know, the canvas3d-tests are now all ported to WebGL. However, I'm using a mix of GLSL 1.20 and 1.30 in the tests, which probably isn't a good idea. Porting all shaders to 1.20 or whatever GLSL ES uses as its base would be a good next step.

Hmm, I have an apigen in the git repo that generates GL constant -checking C++ wrappers for OpenGL functions. That could be hacked a bit to make it create better autotests.

The biggest issue with the current tests is that they only cover a small portion of the API, as I focused on writing only the evil tests manually; the rest of the API gets banged by the API fuzzer to see if anything crashes. Evil tests are me trying to cause a segfault, read back information I shouldn't be able to, or break general array index validation. I wrote them for things that do array access, read back data from OpenGL, or seem iffy in some other way.

So. TODO: Port shaders to 1.20, better autotests, file bugs.

2009-11-05

Started porting Canvas3D tests to WebGL

I started porting Canvas3D tests over to WebGL this morning (apparently insomnia helps.) I've been intending to do that since the summer, but never really got started. Better late than never, eh.

I've been testing the Firefox implementation. If there's a way to test WebKit's WebGL on Linux, I'd be most interested in hearing that.

The Firefox implementation has changed somewhat from the last time I worked on it, so I found some new bugs too :) BindFramebuffer is checking its argument against wrong constants, Tex[Sub]Image2D is quite broken for non-image arguments and some functions don't bork on bad data. Plus a fuzzer segfault and an on-exit segfault (didn't trace those yet.)

I still have around half the existing tests to port... Something to do this afternoon.

2009-11-04

Visitor stats for October

About 1250 visits.

Browser market share: 50% Firefox, 15% Explorer, 14% Chrome, 12% Safari and 6% Opera.

OS market share: 56% Windows, 22% Linux, 20% Mac, 1% iPhone.

The smallest screen resolution was 170x180. The largest was 3840x1024. Most people had something between 800x480 and 1920x1200.

Visitors by continent: 44% Europe, 33% North America, 12% Asia, 6% South America, 3% Oceania, 1% Africa.

By country: 28% US, 7% France, 7% Germany, 5% UK, 5% Canada, 4% Brazil, 3% Japan, 3% Finland, 3% Spain, 3% India.

Hmm...

2009-10-30

Hook Champ tips


Played way too much Hook Champ last week. It's a racing game disguised as a grappling hook platformer. Made by some friends of mine. So let me tell you how to get decent scores.

The fast way


First off, buy all the equipment upgrades. Then, find all the treasures in the map. A coin is worth two seconds. A jewel is worth ten seconds. If you see a stalactite hanging from the roof, that's a sign that you should go down there. Check out suspicious chimneys with rocket boots.

  1. Don't hit the walls - they kill your horizontal speed. Sometimes you can't avoid it though, but those places are mostly long drops and chimneys.
  2. Use the shotgun to accelerate - good places for that are at the start of the level and after upwards chimneys. It doesn't help as much in downwards chimneys because you can convert the downward motion to forward motion.
  3. Don't hit the floor when swinging, it slows you down.
  4. Use the rocket boots instead of trampolines, that way you don't have to go down to touch the trampoline. In multi-rocket jumps, don't wait for your upwards motion to stop before rocketing again.
  5. Don't hit the ceiling when swinging, it slows you down.
  6. Don't swing too high, it slows you down. In straight corridors, release at around the time the rope points straight down.

When you are breaking through walls, you want to hit between two blocks. If you hit only one block, you'll likely stand up on the block below and that stops your motion.

If you manage to get a coin to follow you, you can use it to see what is fast and what is slow. If the coin catches up to you, you went slow. If you outrun the coin, you're going at a good speed.

Zummm, that is all. Been working on our crazy web app project this week, to the detriment of other projects. Such is life.

2009-10-19

CAKE documentation

I've been putting together the CAKE documentation wiki yesterday and today. My process for documenting a class is to copy-paste the whole class in the wiki editor, delete the code, and reformat the comments a bit. Maybe it'd be useful to leave the code visible in the wiki pages, opinions?

Looking at CAKE, the animation stuff needs to be better and the sound system reworked some. And I want a graphical editor with timelines and everything. And a pony.

And maybe a barebones WebGL renderer for it. It'll still get pwned by Firefox's compositing system, but at least it'll get there faster. Maybe?

2009-10-11

Less whining, more homeworking!

2009-10-05

EU MozCamp 09 sketches


Tristan Nitot preparing for his welcome keynote.


Welcome keynote fox banner.


Drumbeat logo was pretty cool.


Mike Beltzner on Saturday keynote.

2009-10-04

EU MozCamp 2009

Let's try writing something non-trollish for a change, hm? 

So, EU MozCamp 2009. I was there to participate in the HTML5 roundtable (and to show off my new canvas animation demo (~500kB), view source for headache.) It was great visiting Prague and meeting all the web people there, thanks everyone!

Cool stuff: DailyMotion SVG video filters demo, Stratified JavaScript - JavaScript with concurrency primitives, HTML5 location API demos, boat trip on the Vltava. Massive amount of expats present (also massive amount of French.) I think I met people from Bulgaria, Czech, Estonia, France, Germany, India, Ireland, Netherlands, Portugal, Romania, Spain, Sweden, UK and US...

I have also managed to plow through around fifteen pages of my newly-bought sketchbook, mostly at the airport and in the plane. Clouds be fabulous from above. 

Catching my flight home in a couple of hours, feeling pretty tired. I'll add some links to this post on reaching home. 

[edit] Added links.

2009-09-22

Mobile internet speed beats ADSL

Upstream-wise, that is. The Nokia CS-15 HSUPA wireless Internet USB stick has a maximum 10.2 Mbps downstream bandwidth and a 5.76 Mbps upstream bandwidth. You need fiber to beat that upstream, at least here in Finland.

And a Huawei stick doubles the downstream to 21 Mbps. If the latency is low enough and the network infra can take the load, that'd totally kill ADSL.

2009-09-21

Also on the INTERNETS

The final solution to piracy


Books


Ditched the sediment & river books from my reading queue, as the library wanted them back. Replaced them with two writing course -related book-like objects about simulated annealing, in addition to An Introduction to the World's Oceans (hey, 71% of the planetary area can't be wrong, right?) and Climatology - An Atmospheric Science (ooh, a pretty painting on the cover.) Both being introductory textbooks to their subjects.

The oceanography book gets extra points for the oceanography department probably having coated their whole library in tar, supposedly to protect it from the vagaries of the high seas. As a result, there's a smell to the book. There were also a few books from late 1800s and early-to-mid 1900s in the oceanography shelf, which I briefly looked at before putting them back to the shelf, carefully. If something survives a hundred years of students, it must be lucky!

I'm reading these non-CS related textbooks because CS gives me a headache and a neckache and a backache and they're actually interesting. Plus maybe I'll be able to use the knowledge gleaned as a way to slither my way into an outdoors IT job... Oh wait, the plan was to totally spruce up my drawing skills (a couple hours of daily practice actually works, whuda thot) and get an art/graphics job. Well, at least they have lots of pictures to draw.

Maybe a couple hours of daily practice would work for Swedish grammar as well. There's a thought.

Today's quote

"it isn't right for children to steal words"

2009-09-16

Back to programming

I've been hacking a bit on cakejs and canvas recently, making a paper doll animation system. The paper doll -part was actually pretty easy, but now I need tools for making the animations. Which might be less easy. Let's see :)

Also, cakejs is really lacking in the documentation & tutorials -department. I will try and fix that by pasting the code comments to wiki pages. Tutorials might be easy to write by taking the demo page scripts and adding narration.

Still sustaining a drawing routine of 1-2 A4 pages per day, which is awesome. Plein air drawing is a lot of fun and sunshine, and works great as practice. What I should also do is grab a book on machine illustration or some other really constructive ID method of drawing and work through the exercises.

Jogging has plateaued at three one-hour runs per week. I don't know if I should try to push it beyond that as I seem to get knee ache at around the 10 km mark. Maybe it's caused by bad technique? Fix by muscle training to sustain a better posture and lots of intervals to improve running technique?

Added shutdown -h now to root's crontab at one-hour intervals in the evening to boot me off the computer and go to sleep. And... it's working. Two weeks of <9am wakeups already. I had the tendency to latch onto a 25-26h daily rhythm before, which is really not conducive to getting to Swedish class at 9:15am.

2009-09-08

Reading list summer 2009

Started off with some scifi, three novels by Iain M. Banks from his Culture series: Consider Phlebas, Player of Games and Matter, and David Petersen's book of fighting mice, Mouse Guard: Fall 1152. Followed by lending the Banks novels to my roommate in exchange for Alastair Reynolds's Revelation Space, Redemption Ark and Chasm City. Stayed at the family cabin by the lake Saimaa for a month or so, and played the first two stories of Odin Sphere and some hours of Valkyrie Profile 2. On returning home, read Frank Herbert's Dune (courtesy of aforementioned roommate), then raided the university library for a change of pace with World Regions in Global Context. I'm currently reading Coastal Sedimentary Environments and The Practice & Science Of Drawing. After that I have queued up Rivers and Floodplains, and the next three books of Dune.

Now if I could find a monitor that works with the PS2, I could go back to wasting my evenings with games instead of boring myself to death on the computer writing blog posts like this...

2009-07-03

Tomtegebra - a small Haskell puzzle game


Tomtegebra, my project for our summer Haskell lab course, is a small puzzle game that masks abstract algebra with cutesy animals. Your goal is to apply transformations to an equation tree to either make both sides equal or, in some levels, to find a binding for the wanted variable.

The levels are Abelian group axioms (no closure though, I don't know how to do that.) The animals are binary operators, the fruits are variables and the rock is a constant.

Implementation follows: the rendering is done with OpenGL, event handling with GLUT, image loading done with Gdk.Pixbuf, textures created with Cairo and text with Pango. There's a shared AppState record IORef that the GLUT callbacks edit and draw. Draw calls, event handling, image loading and all other icky side-effecty stuff happens in the IO monad, while the actual gameplay core is functional.
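In skeleton form, that shared-state setup might look something like this (a sketch with made-up fields, not the actual game code):

-- a sketch of the shared AppState IORef edited by GLUT callbacks
import Data.IORef
import Graphics.UI.GLUT

data AppState = AppState { frameCount :: Int } -- stand-in fields

main :: IO ()
main = do
  (_progName, _args) <- getArgsAndInitialize
  initialDisplayMode $= [DoubleBuffered]
  _window <- createWindow "Tomtegebra"
  stateRef <- newIORef (AppState 0)
  displayCallback $= display stateRef
  mainLoop

display :: IORef AppState -> IO ()
display stateRef = do
  -- callbacks edit the shared state through the IORef
  modifyIORef stateRef (\s -> s { frameCount = frameCount s + 1 })
  clear [ColorBuffer]
  swapBuffers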

I'm using vertex buffer objects for drawing and a wholly Haskell-side transform matrix system, mostly for the heck of it. My matrix math lib is probably somewhere between pretty slow and really slow, as it uses lists of lists to represent the matrices (oh the humanity.)

The whole thing weighs in at around 1200 lines of source code, which sounds about OK for a small project like this. The algebra module is the largest individual part, followed by game rendering and the game logic. The drawing helper libraries are scattered in smaller files, in total they'd be about the same size as the algebra bit.

Haskell-land


Overall, Haskell is nice. I like ghc --make and typeclasses. I don't like fromIntegral. GHC doesn't error on partial pattern matches at compile time, which makes it error-prone (at least if you don't use -Wall.) I had some mysterious run-time crash bugs, and they didn't show where the error happened, which was flummoxing. Debugging by randomly changing things did conquer in the end, though.

And Haskell has the usual statically typed functional language good bits.

I used Haddock and Cabal for documentation and compiling, respectively. In the beginning (read: until yesterday) I was using a build script which was more or less ghc --make *.hs, but now it's all Cabalized and everything. Yay!

Tales of woe


There are odd pauses in the game every couple of seconds (old-generation GC run?) Gtk2Hs doesn't have -prof on Debian, so I haven't been able to profile the thing. The game uses 5-10% CPU to draw at 60fps on my Core 2, which sounds like quite a lot for such a trivial drawing load.

The Haskell OpenGL library is not really OpenGL. It's more a new graphics library that calls OpenGL internally, so you get to relearn the names and syntax of everything. And yet it uses Ptrs as its preferred data structure and the GL numeric types, so you can't just pass it lists of Doubles or other easy Haskell data structures. It's a bit worst of both worlds in terms of user friendliness.

GLUT's callback procedures are exactly the wrong way to go for a functional language. They basically necessitate having shared mutable state. A recursive event loop, Xlib/SDL-style, is probably a better way. At least there you wouldn't have to fool around with IORefs.

eventLoop state = do
  ev <- nextEvent
  state' <- handleEvent state ev
  if quit state'
    then return ()
    else eventLoop state'


I used only lists and tuples, because all the other Haskell data structures are weird (more like, I don't know them, so better not use them, otherwise I might be forced to learn something *shock, horror*.) And that means that I sometimes get to O(n) the O(1) and O(nm) the O(m). Oh well, maybe I'll go back and try to optimize things with Data.Map and UArray and friends if I ever get the profiler working.

I didn't like Haddock's parser. Writing code examples was a pain because it complained about them not parsing. And it's using ', ", and ` as special characters, so using the less whiny @code example@ requires backslashing like a demented backswordsman. Haddock only shows the type signature for a function, which means that you get documentation like frustumMatrix :: GLfloat -> GLfloat -> GLfloat -> GLfloat -> GLfloat -> GLfloat -> Matrix4x4, which is not very helpful.

Cabal was a bit of a pain to get going, but not all that bad. The documentation didn't really answer my questions and the examples were kinda useless as they focused on a one-file codebase. You get to poke around blind a good bit. Took me around an hour from mkcabal to a working build with data files found at runtime. The problems I had were: 1) Module name and file name must be the same. 2) If you have user-installed libs, you need to do Setup.hs configure --user to add them to the search path. 3) To get the paths for installed data files, you need to import getDataFileName :: FilePath -> IO FilePath from Paths_mypackage.
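For reference, the data files bit in miniature (a sketch; the package and file names are made up):

-- in mypackage.cabal:
--   data-files: models/cube.obj
-- in the program, the Cabal-generated Paths_mypackage module resolves the installed path:
import Paths_mypackage (getDataFileName)

loadModelFile :: IO String
loadModelFile = getDataFileName "models/cube.obj" >>= readFile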

2009-06-14

Haskell OpenGL utilities

In order to get back into the gear for writing this Haskell OpenGL application, I'll write here a small library of OpenGL utilities for roughly OpenGL 3.0 -style code. That is, no fixed-function stuff, each drawable is a struct with a shader program, streams (a.k.a. attributes), textures and uniforms.

Maybe something like the following (only works on Haskell OpenGL 2.2.3 and above.)
The code for Models and Shaders compiles but I haven't tested it. The snippets for loading textures and VBOs [see below] are working code.

First some matrix helpers (this is a snippet from a ~100-line matrix math lib):

module Matrix where
import Data.List
import Graphics.Rendering.OpenGL
import Foreign.Ptr

type Matrix4x4 = [Vec4]
type Vec4 = [GLfloat]

glMatrix :: Matrix4x4 -> IO (GLmatrix GLfloat)
glMatrix m = newMatrix ColumnMajor $ flatten m :: IO (GLmatrix GLfloat)

withMatrix4x4 :: Matrix4x4 -> (MatrixOrder -> Ptr GLfloat -> IO a) -> IO a
withMatrix4x4 matrix m = do
  mat <- glMatrix matrix
  withMatrix mat m

flatten :: [[a]] -> [a]
flatten = foldl1 (++)

Then the drawable definition. Drawables are structs with a program, uniforms, streams and samplers. You could think of them as curried GPU function calls, kinda like Drawable = GPU ().

module Models where
import Graphics.Rendering.OpenGL
import VBO
import Texture
import Foreign.Ptr (Ptr, castPtr)
import Matrix
import Data.Int

data Vbo a = Vbo (NumArrayIndices, BufferObject, (IntegerHandling, VertexArrayDescriptor a))
data Stream a = Stream (AttribLocation, Vbo a)
data Sampler = Sampler (UniformLocation, TextureTarget, TextureObject)

data UniformSetting = UniformSetting (UniformLocation, UniformValue)
data UniformValue =
    UniformMatrix4 (Matrix4x4)
  | UniformVertex4 (Vertex4 GLfloat)
  | UniformVertex3 (Vertex3 GLfloat)
  | UniformVertex2 (Vertex2 GLfloat)
  | UniformFloat (TexCoord1 GLfloat)
  | UniformInt (TexCoord1 GLint)

data Drawable = Drawable {
  program :: Program,
  uniforms :: [UniformSetting],
  streamMode :: PrimitiveMode,
  streams :: [Stream GLfloat],
  indices :: Maybe (Vbo GLushort),
  samplers :: [Sampler]
}

uniformSetting :: UniformSetting -> IO ()
uniformSetting (UniformSetting(location, UniformMatrix4 value)) =
  withMatrix4x4 value (\order ptr -> uniformv location 16 (castPtr ptr :: Ptr (TexCoord1 GLfloat)))
uniformSetting (UniformSetting(location, UniformVertex4 value)) = uniform location $= value
uniformSetting (UniformSetting(location, UniformVertex3 value)) = uniform location $= value
uniformSetting (UniformSetting(location, UniformVertex2 value)) = uniform location $= value
uniformSetting (UniformSetting(location, UniformFloat value)) = uniform location $= value
uniformSetting (UniformSetting(location, UniformInt value)) = uniform location $= value

drawDrawable :: Drawable -> IO ()
drawDrawable d = do
  currentProgram $= Just (program d)
  setUniforms (uniforms d)
  setSamplers (samplers d)
  withStreams (streams d) (do
    case indices d of
      Just (Vbo (num, vbo, (_,VertexArrayDescriptor numcomp datatype stride ptr))) -> do
        bindBuffer ElementArrayBuffer $= Just vbo
        drawElements (streamMode d) num datatype ptr
        bindBuffer ElementArrayBuffer $= Nothing
      Nothing -> drawArrays (streamMode d) 0 (minNum (streams d)))
  currentProgram $= Nothing
  where minNum streams = minimum $ map (\(Stream (_,Vbo(n,_,_))) -> n) streams

withStreams :: [Stream a] -> IO () -> IO ()
withStreams streams m = do
  setStreams streams
  m
  disableStreams streams

setStreams :: [Stream a] -> IO ()
setStreams streams =
  mapM_ (\(Stream (location, Vbo (_, vbo, value))) -> do
    bindBuffer ArrayBuffer $= Just vbo
    vertexAttribArray location $= Enabled
    vertexAttribPointer location $= value) streams

disableStreams :: [Stream a] -> IO ()
disableStreams streams =
  mapM_ (\(Stream (location,_)) -> vertexAttribArray location $= Disabled) streams

setUniforms :: [UniformSetting] -> IO ()
setUniforms uniforms = mapM_ uniformSetting uniforms

setSamplers :: [Sampler] -> IO ()
setSamplers samplers =
  mapM_ (\(i, Sampler (location, texType, tex)) -> do
    activeTexture $= TextureUnit i
    textureBinding texType $= Just tex
    uniform location $= TexCoord1 i) $ zip [0..] samplers

Then we also need loaders for shaders, textures and buffers. Shaders are pretty easy, this loader's copied from the OpenGL binding examples:

module Shaders where
import Control.Monad
import Graphics.Rendering.OpenGL
import Graphics.UI.GLUT

loadProgram :: FilePath -> FilePath -> IO Program
loadProgram vertexShader fragmentShader =
  loadProgramMulti [vertexShader] [fragmentShader]

loadProgramMulti :: [FilePath] -> [FilePath] -> IO Program
loadProgramMulti vertexShaders fragmentShaders = do
  vs <- mapM readAndCompileShader vertexShaders
  fs <- mapM readAndCompileShader fragmentShaders
  createProgram vs fs

-- Make sure that GLSL is supported by the driver, either directly by the core
-- or via an extension.
checkGLSLSupport :: IO ()
checkGLSLSupport = do
  version <- get (majorMinor glVersion)
  unless (version >= (2,0)) $ do
    extensions <- get glExtensions
    unless ("GL_ARB_shading_language_100" `elem` extensions) $
      ioError (userError "No GLSL support found.")

readAndCompileShader :: Shader s => FilePath -> IO s
readAndCompileShader filePath = do
  src <- readFile filePath
  [shader] <- genObjectNames 1
  shaderSource shader $= [src]
  compileShader shader
  reportErrors
  ok <- get (compileStatus shader)
  infoLog <- get (shaderInfoLog shader)
  mapM_ putStrLn ["Shader info log for '" ++ filePath ++ "':", infoLog, ""]
  unless ok $ do
    deleteObjectNames [shader]
    ioError (userError "shader compilation failed")
  return shader

createProgram :: [VertexShader] -> [FragmentShader] -> IO Program
createProgram vs fs = do
  [program] <- genObjectNames 1
  attachedShaders program $= (vs, fs)
  linkProgram program
  reportErrors
  ok <- get (linkStatus program)
  infoLog <- get (programInfoLog program)
  mapM_ putStrLn ["Program info log:", infoLog, ""]
  unless ok $ do
    deleteObjectNames [program]
    ioError (userError "linking failed")
  return program

Buffers aren't much of a problem either:

module VBO where
import Foreign.Storable
import Data.Array.Storable
import Foreign.Ptr
import Graphics.Rendering.OpenGL
import Graphics.UI.GLUT

createVBO :: Storable a => [a] -> IO BufferObject
createVBO elems = do
  [vbo] <- genObjectNames 1
  bindBuffer ArrayBuffer $= Just vbo
  arr <- newListArray (0, length elems - 1) elems -- Data.Array.MArray
  withStorableArray arr (\ptr -> -- Data.Array.Storable
    bufferData ArrayBuffer $= (ptrsize elems, ptr, StaticDraw))
  bindBuffer ArrayBuffer $= Nothing
  reportErrors
  return vbo
  where ptrsize [] = toEnum 0
        ptrsize (x:_) = toEnum $ length elems * sizeOf x

offset x = plusPtr nullPtr x
-- for use with e.g. VertexArrayDescriptor 3 Float 0 $ offset 0

For textures I'm using Cairo and Gdk.Pixbuf. Turning Cairo surfaces into Ptrs edible by texImage2D is a bit of a bother but eh.

module Texture where
import Data.ByteString (ByteString)
import Data.ByteString.Internal (toForeignPtr)
import Directory (doesFileExist)
import Foreign.ForeignPtr (withForeignPtr)
import Foreign.Ptr
import Graphics.Rendering.OpenGL
import Graphics.Rendering.Cairo hiding (rotate, identityMatrix)
import Graphics.UI.Gtk.Gdk.Pixbuf
import Graphics.UI.Gtk.Cairo

loadTextureFromFile :: FilePath -> IO TextureObject
loadTextureFromFile filepath = do
  assertFile filepath
  createTexture Texture2D Enabled
    (withImageSurfaceFromPixbuf filepath $ texImage2DSurface Nothing 0)

withImageSurfaceFromPixbuf :: FilePath -> (Surface -> IO a) -> IO a
withImageSurfaceFromPixbuf filepath m = do
  pixbuf <- pixbufNewFromFile filepath
  w <- pixbufGetWidth pixbuf
  h <- pixbufGetHeight pixbuf
  withImageSurface FormatARGB32 w h (\s -> do
    renderWith s (do setSourcePixbuf pixbuf 0 0
                     setOperator OperatorSource
                     paint)
    m s)

assertFile :: FilePath -> IO ()
assertFile filepath = do
  fex <- doesFileExist filepath
  if not fex
    then fail (filepath ++ " does not exist")
    else return ()

createTexture :: TextureTarget -> Capability -> IO () -> IO TextureObject
createTexture target mipmap m = do
  [tex] <- genObjectNames 1
  textureBinding target $= Just tex
  texture target $= Enabled
  textureFilter target $= ((Linear', Nothing), Linear')
  textureWrapMode target S $= (Repeated, Clamp)
  textureWrapMode target T $= (Repeated, Clamp)
  generateMipmap target $= mipmap
  m
  if mipmap == Enabled
    then textureFilter target $= ((Linear', Just Linear'), Linear')
    else return ()
  textureBinding target $= Nothing
  return tex

texImage2DSurface :: Maybe CubeMapTarget -> Level -> Surface -> IO ()
texImage2DSurface cubemap level imageSurface = do
  pixelData <- imageSurfaceGetData imageSurface
  (w,h) <- renderWith imageSurface $ do
    w <- imageSurfaceGetWidth imageSurface
    h <- imageSurfaceGetHeight imageSurface
    return (fromIntegral w :: GLsizei, fromIntegral h :: GLsizei)
  texImage2DByteString cubemap level RGBA8 w h BGRA UnsignedByte pixelData

texImage2DByteString :: Maybe CubeMapTarget
                     -> Level
                     -> PixelInternalFormat
                     -> GLsizei
                     -> GLsizei
                     -> PixelFormat
                     -> DataType
                     -> ByteString
                     -> IO ()
texImage2DByteString cubemap level iformat w h format ptype bytestring = do
  let (fptr, foffset, flength) = toForeignPtr bytestring
  if (fromIntegral flength) /= w * h * 4
    then fail "imageSurface dimensions don't match data length"
    else return ()
  withForeignPtr fptr $ \ptr -> do
    let optr = plusPtr ptr foffset
    texImage2D cubemap NoProxy
      level iformat (TextureSize2D w h) 0
      (PixelData format ptype optr)

2009-06-13

Sum representation of integer powers

The power x^n where x, n ∈ N can be represented as an nth-order sum where the innermost factor is n!.

First, consider x^1: the difference between x^1 and (x+1)^1 is 1. So we have a first-order sum where the innermost factor is 1! (= 1.)

x^1 = sum(i = 0..x, 1)

Now let's look at x^2. Let's write the sequence of x^2 from -3 to +3 and mark under each two elements the difference between them:

9 4 1 0 1 4 9
-5 -3 -1 1 3 5
2 2 2 2 2

From this we see that we have a second-order sum where the innermost factor is 2! (=2) and the outer factor is -1 (going diagonally left from zero.)

x^2 = -1*sum(i = 0..x, 2*i*sum(j = 0..i, 1))

We can now give a recursive function for the sums that takes a list of factors as its argument (this one only works for positive integers though):

sumMap lst f = sum (map f lst)

sums [] i = 0
sums [x] i = x * i
sums (x:xs) i = x*i + sumMap [0..i] (sums xs)
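For example, with the factors derived for x^2 above:

pow2 = sums [-1, 2]

-- pow2 4
-- 16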

Then, let's look at the left side of x^6:

46656 15625 4096 729 64 1 0 1
-31031 -11529 -3367 -665 -63 -1 1
19502 8162 2702 602 62 2
-11340 -5460 -2100 -540 -60
5880 3360 1560 480
-2520 -1800 -1080
720 720

From which we get the factors [-1, 62, -540, 1560, -1800, 720] and

x^6 = -1*sum(a=0..x, 62*sum(b=0..a, -540*sum(c=0..b, 1560*sum(d=0..c, -1800*sum(e=0..d, 720*sum(f=0..e, 1))))))
or
pow6 = sums [-1, 62, -540, 1560, -1800, 720]

In the general case, the first factor is -1^(n-1), the second factor is -2^n + 2 * -1^n, the second-to-last factor is -(n-1) * n! * 2^-1 and the last factor is n!. I don't know about the rest of the factors.

Fun with cubes


Here's the difference grid for the third power:

-27 -8 -1 0 1 8 27
19 7 1 1 7 19
-12 -6 0 6 12
6 6 6 6

Which gives the factors [1, -6, 6].

One result of the sum representation is that each x cubed minus x is divisible by 6: 6 | x^3 - x. Or put another way, x^3 = x + 6k, k ∈ Z. And the sum of two cubes minus the sum of the bases is also divisible by six: 6 | x^3 + y^3 - (x + y).

We also see that the difference between consecutive integer cubes grows as a second-order function: (n+1)^3 - n^3 = 6(n/2)(n+1) + 1 = 3n^2 + 3n + 1.

cubeDiff n = 3*(n^2 + n) + 1
pow3 n = sumMap [0..n-1] cubeDiff

-- pow3 5
-- 125

There's another amusing difference grid in (x^3 - x) / 6:

0 1 4 10 20 35 56 84
1 3 6 10 15 21 28
2 3 4 5 6 7
1 1 1 1 1

2009-06-10

GDP projections

I took some GDP and population figures from Wikipedia and NationMaster and tried to predict the future with them.

First off, the current situation:

Region            2008 nominal GDP/capita   Population in millions
United States     47,103                    310
European Union    38,390                    500
Japan             38,055                    127
Russia            12,487                    142
China             3,174                     1,345
India             1,078                     1,173


Grabbing last decade's average growth rates for each, we can do a linear extrapolation to 2020:

Region            2020 GDP/capita   2020 population
United States     72,900            341
European Union    73,200            661
Japan             46,000            127
Russia            91,000            137
China             13,381            1,431
India             2,990             1,368


These projections do have some issues though:

Russia posted 6x growth over the last decade; continuing that for a second decade sounds unrealistic, because the starting point, 1998 Russia, was suffering from a big financial crisis. A more realistic 3-4x decade growth would put them at $37,500-$50,000 GDP per capita.

160 million more people in the EU. The EU population growth is mostly driven by enlargement. To grow by 160 million, the EU would need the Balkans (25 million), Turkey (75 million) and Ukraine (46 million.) And the EUR/USD exchange rate is favoring EUR at the moment, while in 2000 the USD was higher. Using an estimate based on 1999 figures, the GDP/capita projection would be around $67,000. Though if the EU population growth misses the mark, the GDP/capita will likely be higher as a result, since the possible member countries have a GDP/capita a third or a fourth of the EU average.

Japan's low growth is from the bubble bursting in 1996, and the following decade of flat growth. Japan's only now reaching mid-90s levels again; the 1994 and 2008 Japan GDP/capita are roughly the same. Assuming that Japan starts posting decade growth rates in the 1.5-1.75x range, they'd be somewhere between the US and EU in 2020.

2040 projections


The linear projections fail even more for 2040. To highlight, consider Russia's GDP/capita. The 2040 projection is 3.7 million dollars. The whole Russian GDP would be 478 trillion dollars, constituting half the global economy. Which is unlikely, to say the least.

My guess for Russia's growth over the next three decades would be something like 4x in 2010-2020, 2x in 2020-2030, 1.5x in 2030-2040. Which'd put Russia at $156,000 GDP/capita. It might seem high today, but consider that the US GDP/capita thirty years ago was $10,000.
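(Spelling out the arithmetic: 4 × 2 × 1.5 = 12x over the three decades, and 12,487 × 12 ≈ $150,000 from the 2008 figure; starting from a somewhat higher 2010 base lands around the quoted $156,000.)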

At the rate the EU and Indian populations were growing in the last decade, the EU's population would overtake India's in 2080, at 3.5 and 3.4 billion people, respectively. The 2040 projection for the EU would be 1.15 billion citizens, and that'd require all of North Africa and the Middle East, plus perhaps parts of Sub-Saharan Africa, and Russia or Pakistan. Which sounds a wee bit far-fetched. Perhaps there will be a union of unions, or some other export mechanism for the EU's rule-of-law-investment-free-trade combo.

The EU GDP/capita in 2040 would be around $153,000. The US GDP/capita projection for 2040 is $165,000, and Japan might be somewhere around the two. The linear projection for Japan would have them at $78,000, but that doesn't sound too likely.

The Chinese GDP/capita would be at $174,262 at today's high growth rates, which I wouldn't trust to hold for three decades. Today China's GDP/capita compared to the US is at around 1970s South Korea, or late-50s Japan and Spain. If China sticks to Japan's path (4x, 4.5x, 2.7x), they'll be at around US levels in 2040, perhaps followed by a bubble crash. With the South Korean way (4x, 2.3x, 1.6x), the projection is 35% of the US GDP/capita, or $58,000. The Spanish path (2x, 5x, 2x) would take them to half the US GDP/capita.

India would be the most populous country in the world with 1.86 billion, propelling it past China's projected 1.6 billion. India's 3x per decade growth would bring their GDP/capita to around $18,800. The Indian growth might pick up though, three decades of Japan-style growth would bring them to $58,000. Perhaps a good estimate would be somewhere in between: $32,000?

And then the computational power disclaimer: by 2040, you can get a human brain worth of processing power for something like $100 / year, which nicely throws a spanner in these GDP projections. If each person has a dozen virtual humans running on the side, the amount of stuff that one person can get done might well be multiplied by that.

A wee bit of algebra

Am currently in the process of writing a game of algebra in Haskell. The gameplay consists of coaxing group axioms to equality or, in the case of the neutral element and the inverse function, a binding. The base act of equation munging is accomplished through one's big bag of previously proven lemmas and given axioms, which one then applies to matching parts of the equation in the feeble hope of achieving enlightenment.

For example, let us step through the process of proving associativity for a o b = (a x b) x 1, given the abelian group x. The | x in the following denotes applying the rule x at the emboldened position in the equation (why yes, it is an AST internally):

a o (b o c) = (a o b) o c | a o b = (a x b) x 1
(a x (b o c)) x 1 = (a o b) o c | a o b = (a x b) x 1
(a x ((b x c) x 1)) x 1 = (a o b) o c | (a x b) x c = a x (b x c)
(a x (b x (c x 1))) x 1 = (a o b) o c | (a x b) x c = a x (b x c)
((a x b) x (c x 1)) x 1 = (a o b) o c | a x b = b x a
((a x b) x (1 x c)) x 1 = (a o b) o c | (a x b) x c = a x (b x c)
(((a x b) x 1) x c) x 1 = (a o b) o c | a o b = (a x b) x 1
((a o b) x c) x 1 = (a o b) o c | a o b = (a x b) x 1
(a o b) o c = (a o b) o c

Oh, have the neutral element as well:

a o e(o) = a | a o b = a x b x 1
(a x e(o)) x 1 = a | (a = b) = ((a x inv(x,1)) = (b x inv(x,1)))
((a x e(o)) x 1) x inv(x,1) = a x inv(x,1) | (a x b) x c = a x (b x c)
(a x e(o)) x (1 x inv(x,1)) = a x inv(x,1) | a x inv(x,a) = e(x)
(a x e(o)) x e(x) = a x inv(x,1) | a x e(x) = a
a x e(o) = a x inv(x,1) | (a = b) = ((inv(x,c) x a) = (inv(x,c) x a))
inv(x,a) x (a x e(o)) = inv(x,a) x (a x inv(x,1)) | (a x b) x c = a x (b x c)
(inv(x,a) x a) x e(o) = inv(x,a) x (a x inv(x,1)) | inv(x,a) x a = e(x)
e(x) x e(o) = inv(x,a) x (a x inv(x,1)) | (a x b) x c = a x (b x c)
e(x) x e(o) = (inv(x,a) x a) x inv(x,1) | inv(x,a) x a = e(x)
e(x) x e(o) = e(x) x inv(x,1) | e(x) x a = a
e(x) x e(o) = inv(x,1) | e(x) x a = a
e(o) = inv(x,1)

As you might imagine, it is not a very exciting game in its current form. One might imagine replacing the typographical symbols with mushrooms and small furry animals, and the addition of fireworks and showers of colourful jewels on the successful completion of one's proof obligations. Maybe even a fountain of money.

Roam the countryside and befriend local wildlife with your awesome powers of axiomatic rigor! Bring down the castle drawbridge with a well-placed induction! Banish ghosts with the radiant aura of reductio ad absurdum!

Reducing the rest of mathematics to gameplay mechanics remains an open question.

Codewise, what I'm doing looks like this (leaving out all the hairy bits):

type Op = String
data Expr = Expr (Op, Expr, Expr)
          | A | B | C
          | Inv (Op, Expr)
          | Neutral Op
          | Literal Int

type Pattern = Expr
type Substitution = Expr

data Rule = Rule (Pattern, Substitution)

type CheckableRule = ((Rule -> Bool), Rule)


isTrue :: Rule -> Bool
isTrue (Rule (a,b)) = a == b

isBinding :: Expr -> Rule -> Bool
isBinding e (Rule (a,b)) = e == a || e == b

opf :: Op -> Expr -> Expr -> Expr
opf op a b = Expr (op, a, b)


{-
Group axioms main stage:

Stability: a o b exists in G for all a, b in G
Magma!

Associativity: a o (b o c) = (a o b) o c
Semigroup!

Neutral element: a o 0 = 0 o a = a
Monoid!

Inverse element: a o a_inv = a_inv o a = 0
Group! Enter Abelian bonus stage!
-}

checkCheckableRule :: CheckableRule -> Bool
checkCheckableRule (p, rule) = p rule

associativity :: Op -> CheckableRule
associativity op = (isTrue, (A `o` (B `o` C)) `eq` ((A `o` B) `o` C))
  where o = opf op

rightNeutral :: Op -> CheckableRule
rightNeutral op = (isBinding (Neutral op), (A `o` Neutral op) `eq` A)
  where o = opf op

leftNeutral :: Op -> CheckableRule
leftNeutral op = (isBinding (Neutral op), (Neutral op `o` A) `eq` A)
  where o = opf op

rightInverse :: Op -> CheckableRule
rightInverse op = (isBinding (Inv (op, A)), (A `o` Inv (op, A)) `eq` Neutral op)
  where o = opf op

leftInverse :: Op -> CheckableRule
leftInverse op = (isBinding (Inv (op, A)), (Inv (op, A) `o` A) `eq` Neutral op)
  where o = opf op

{-
Abelian bonus stage:

Commutativity: a o b = b o a
Abelian group! Enter ring bonus stage!
-}

commutativity :: Op -> CheckableRule
commutativity op = (isTrue, (A `o` B) `eq` (B `o` A))
  where o = opf op

magma op = [] -- FIXME?
semiGroup op = magma op ++ [associativity op]
monoid op = semiGroup op ++ [rightNeutral op, leftNeutral op]
group op = monoid op ++ [rightInverse op, leftInverse op]
abelianGroup op = group op ++ [commutativity op]

{-
Ring bonus stage:

Show bonus function to be a semigroup!

Distributivity of x over o:
a x (b o c) = (a x b) o (a x c)
(a o b) x c = (a x c) o (b x c)
Pseudo-ring!
-}

leftDistributivity :: Op -> Op -> CheckableRule
leftDistributivity opO opX = (isTrue, (A `x` (B `o` C)) `eq` ((A `x` B) `o` (A `x` C)))
  where o = opf opO
        x = opf opX

rightDistributivity :: Op -> Op -> CheckableRule
rightDistributivity opO opX = (isTrue, ((A `o` B) `x` C) `eq` ((A `x` C) `o` (B `x` C)))
  where o = opf opO
        x = opf opX
{-
Neutral element for x: a x 1 = 1 x a = a
Ring!

Commutativity for x: a x b = b x a
Commutative ring! Enter field bonus stage!

Field bonus stage:

Inverse element for x in G: a x a_inv = a_inv x a = 1
Field! Superior! Shower of jewels!
-}

pseudoRing o x = abelianGroup o ++ semiGroup x ++
  [leftDistributivity o x, rightDistributivity o x]
ring o x = pseudoRing o x ++ [rightNeutral x, leftNeutral x]
commutativeRing o x = ring o x ++ [commutativity x]
field o x = commutativeRing o x ++ [rightInverse x, leftInverse x]

2009-05-28

Dangers of glsl testing

Fglrx + a while(true) shader or some other long loop = hung computer.

Made the tests work on ATI drivers, but then the aforementioned calamity struck and now I probably get to curse myself for using XFS on /home.

On my Geforce 7600 the long shaders didn't cause an unrecoverable hang but it's not a fully programmable part. Maybe the drivers help too...

Programmable GPUs sure need a multitasking system.

2009-05-24

GLSL parser work-in-progress git repo

To give meself a swift kick in the arse, here's the github page.

The grammar is very unfinished and there are probably bad bits involved. The grammar structure comes pretty much directly from the GLSL 1.20 spec, which is nice, as it's LR.

You might notice that it doesn't deal with preprocessor directives, I hope there's a cpp lib out there somewhere. And it's in OCaml so no one can use it for anything.

I'm rather not appreciating the looming mountain of imagined work associated with validating and normalizing GLSL. I mean, syntactical validation isn't going to cut it (though it does help turning "float foo[5] = float[5](0.0, 1.0, 2.0, 3.0, 4.0);" into something that doesn't hit driver bugs.)

GLSL doesn't have numeric literal type coercion, so 20 + 2.2 is a type error. But the Nvidia driver (when you omit the #version pragma) mixes Cg stuff into GLSL, so 20 + 2.2 works there. On ATI it doesn't. On Nvidia with a #version pragma it doesn't work either. So I guess I'll make the pretty-printer add #version 120 at the top of the shader?

And then you have stuff like typecasting vectors, mix() with boolean alpha (IIRC needs #version 130 on ATI, works on Nvidia #version 120), operator overloading and swizzle assignment.

Oh well, it's just work. Do a little bit every day until the day you stop.

Revising initial schedule of "by July 1st" to "by 2012". O, the morning clouds are pretty, 4 am best time of day.

2009-05-14

ATI installed for testing

Got an ATI graphics card installed. Took maybe an hour to get it physically installed and dualhead Xinerama going (if Xinerama is off, no extended desktop.) The biggest pain was the ATI download site requiring JavaScript, so Links didn't cut it.

Canvas 3D worked after applying the ATI compatibility patch. I need to rewrite some of my test shaders to add missing precision qualifiers.

O3D shaders fail to compile as the ATI driver (or something) says that MaxInstructions = -1. And O3D would rather prefer it to be a non-negative number, one that's larger than 23. Or whatever the SM 2.0 minimum is.

It's pretty noisy compared to my passively cooled card (big surprise, eh?) Now the question is what does one do with a teraflop of graphics computing power on Linux...

2009-05-07

Started on the GLSL lexer

Started writing the GLSL parser by sketching the lexer for an hour today. It has all the identifiers and might even lex numbers correctly, but is not finished, does not compile and has no tests.

I'm working from the GLSL 1.40 spec on the assumption that it's easier to remove features than add them (though 1.40 removed the fixed-function bits, so it's not GLSL ES -compatible.)

The next step is to get the lexer to compile, then start writing the grammar, parser and tests.

2009-05-05

Priorities for May

Choices, choices


The Canvas 3D API is getting pretty solid. The next step there would be writing a GLSL parser to validate shaders. I guess it'll run to around 2000 lines of code, so would take two months or so.

Regarding canvas performance issues (API call overhead, GC pauses, slow compositing), what I could do is write a draw list system, which'd be perhaps 500 lines of code (or 20 days) and help in getting rid of API call overhead. The other JS performance fixes would be rewriting the GC (which I would rather not do) and minimizing the JS->C call overhead (which I don't really know anything about, I hear quickstubs would help for the overhead), so I think I won't touch them for now.

Rewriting the compositing, while I believe I could do it, is a whole another can of worms. Basically, it's either going to be hellishly difficult or pretty easy. It would be pretty easy if the surfaces can be uploaded to OpenGL textures and drawn in the window clip area (plus upload timestamp to determine whether to re-upload texture.) But if that's too slow (i.e. if there's a zillion tiny surfaces changing all the time) or needs something special, then it's going to be hard.

The other related projects are writing a proof-of-concept OpenAL binding (guessing 5000 lines) and trying to port the GL canvas to some other browser (maybe a thousand lines?) Or maybe a GL canvas plugin (O3D-style)? The OpenAL experiment would teach some important lessons on the whole advanced audio thing.

On the tests side, a GLSL fuzzer would be nice to have, along with unit tests for the rest of the API. My devious plan to recruit other people to write them for me failed, so I guess I have to write them myself.

Priority lists


The entries in parentheses I'm not going to touch unless struck by divine inspiration.

Compatibility


  1. GLSL parser - by July 1st
    1. Spike a quick ocamllex-ocamlyacc implementation to figure out the grammar, GLSL -> AST
    2. Write AST printer to output compatible GLSL
    3. Port over to some C library (first figure out which..)
  2. GLSL subset spec - by July 1st
    1. Write in tandem with the parser
  3. (GLSL compiler)

Tests


  1. GLSL fuzzer - by August 1st
    1. Generate random ASTs and print them out
  2. Unit tests for the whole API - by August 1st
    1. Start with one large test page, then split it as it progresses

Performance


  1. (GC pauses)
  2. (Slow compositing)
  3. (General API overhead)
  4. Draw list test - by June 1st
    1. Capture GL calls from JS, turn into an array of [function_id, args]
    2. Write a C++ DrawList-method that loops through the list and calls the wrapped GL functions
    3. Benchmark with a complex scene to see if it's worth the trouble

Other


  1. OpenAL test, see if it makes sense - by September 1st
    1. Read through the API docs to figure out what it needs (and what it should provide)
    2. Write an OpenAL test app
    3. Strip out the GL-specific stuff from the Canvas 3D extension
    4. Change the API to bind OpenAL instead
    5. audioElement.getContext('moz-openal')?
  2. (Canvas 3D plugin)


The whole non-parenthesized part of the above I estimate to take 4-5 months... see you in September.

Updated 2009-05-06: more detailed plan

2009-04-26

Some O3D vs. GL canvas performance analysis

To continue, my opinion on O3D is that it probably has the best approach for flexible rendering of complex scenes in the browser that I've seen yet. To explain, a little bit of background:

The bottlenecks on the JavaScript side are GC, JS -> C++ API call overhead (timed it at around 1.3 ms / 1000 calls here), and 4x4 matrix multiplication (~20-50x slower than C, and you do it a lot.)
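
Timing these is simple enough with a quick harness, something along these lines (a sketch; gl and mat4mul here are placeholders for whatever you're measuring, and Date-based timing is coarse, so use big iteration counts):

time = function(name, n, f) {
var t0 = new Date().getTime();
for (var i = 0; i < n; i++) f();
var ms = new Date().getTime() - t0;
return name + ': ' + (ms / n * 1000).toFixed(2) + ' us per call';
};
// time('API call', 100000, function() { gl.flush(); });
// time('mat4 mul', 100000, function() { m = mat4mul(a, b); });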

O3D works a bit like editable display lists: you have a transform graph of objects that are separated into draw lists, which are turned into native API calls in the renderer. And all that happens on the C++ side, so you don't have any JS -> C++ overhead for the drawing calls.

Suppose you want to animate a thousand meshes and don't want to push the matrix math to your vertex shader. To draw a single mesh, you need to bind the mesh's VBOs and textures, then set up the shader uniforms (transform, samplers, lights) and call the draw command. That means a dozen API calls per mesh, so a thousand meshes would need 12000 API calls, which'd take 15 ms just in call overhead.
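
For concreteness, the per-mesh sequence is something like this in GL-flavored JavaScript (a sketch with approximate names, not the exact canvas API; the various uniform and attribute locations are assumed to be looked up beforehand; count the calls):

gl.bindBuffer(gl.ARRAY_BUFFER, mesh.positions); // 1
gl.vertexAttribPointer(posLoc, 3, gl.FLOAT, false, 0, 0); // 2
gl.bindBuffer(gl.ARRAY_BUFFER, mesh.normals); // 3
gl.vertexAttribPointer(normalLoc, 3, gl.FLOAT, false, 0, 0); // 4
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, mesh.indices); // 5
gl.activeTexture(gl.TEXTURE0); // 6
gl.bindTexture(gl.TEXTURE_2D, mesh.texture); // 7
gl.uniform1i(samplerLoc, 0); // 8
gl.uniformMatrix4fv(transformLoc, false, mesh.transform); // 9
gl.uniformMatrix4fv(normalMatLoc, false, mesh.normalMatrix); // 10
gl.uniform3fv(lightLoc, lightDir); // 11
gl.drawElements(gl.TRIANGLES, mesh.numIndices, gl.UNSIGNED_SHORT, 0); // 12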

And if you multiply the matrices for each mesh in JavaScript, you end up creating at least a thousand matrices worth of garbage every frame (~= 7.7 MB/s), triggering a Firefox GC run every 3 seconds (IIRC the max alloc until GC is 24 megs.) And as a thousand matrix multiplications takes 4 ms here, you end up with a total 19 ms JS overhead per frame and a framerate glitch every three seconds courtesy of GC.
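
The allocation math: a typical JS 4x4 multiply returns a fresh array, so 1000 multiplies per frame * 16 numbers * 8 bytes * 60 fps gives the ~7.7 MB/s above. A sketch of such a multiply (the mat4mul from the timing sketch earlier):

mat4mul = function(a, b) {
var c = new Array(16); // fresh garbage on every call
for (var i = 0; i < 4; i++) {
for (var j = 0; j < 4; j++) {
var s = 0;
for (var k = 0; k < 4; k++) s += a[i*4+k] * b[k*4+j];
c[i*4+j] = s;
}
}
return c;
};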

The matrix math overhead is livable, but the JS -> C++ overhead and the GC pauses (on Firefox and Webkit) sink it. On O3D's embedded V8 JS engine, the GC pauses are less of an issue, as it uses a generational collector and the temporary matrices should be taken care of by the fast young generation collections (take this with a dose of salt, I haven't timed it.) They still get some hurt from having to do the matrix math in JS, but it's not too bad compared to the API call overhead and GC pauses.

What to optimize?


The best solution for the API call overhead would be to minimize JS -> C++ call overhead in the JS engine, maybe by generating direct native calls from the JIT.

Making an immediate-mode API built on editable draw lists that are executed in C++ would get rid of the API call overhead as well. Even a system to draw a mesh with a single API call would drop the number of API calls to a tenth; I imagine it'd be something like drawObject({buffers: [], textures: [], program: shader, uniforms: {foo4f: [1,2,3,4], bar1i: 2} }). But the draw list approach is probably easier to implement and more flexible: record calls into a draw list and run through that in C++.
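
As a sketch of the draw list idea (hypothetical names, not a real API): record the calls on the JS side as [function_name, args] pairs, then hand the whole list over in one call. Here the replay loop is in JS, but the point is that in the real thing it would live in C++:

makeDrawList = function(gl) {
var list = [], recorder = {};
var fns = ['bindBuffer', 'bindTexture', 'vertexAttribPointer', 'uniformMatrix4fv', 'drawElements'];
for (var i = 0; i < fns.length; i++) {
(function(name) {
// recording stub: push the call instead of making it
recorder[name] = function() {
list.push([name, Array.prototype.slice.call(arguments)]);
};
})(fns[i]);
}
recorder.run = function() {
// one JS -> C++ call instead of list.length of them, once this loop is C++
for (var i = 0; i < list.length; i++)
gl[list[i][0]].apply(gl, list[i][1]);
};
return recorder;
};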

GC pauses really need to be fixed in browser JS engines.

The matrix math slowdown isn't too bad but it's still nasty. I've heard some talk of adding native Vector and Matrix types to JS and maybe something like Mono.SIMD.

2009-04-25

O3D on Linux

Building O3D on Linux (svn r36): opt-linux does not build, use dbg-linux instead. The plugin has no input events implemented on Linux. So rendering demos work but interactive ones don't.

Most of the demos have very little JavaScript logic, which I guess is the point. I'll try to make some JavaScript intensive demo and see how that fares. And maybe write some shaders to kick the tires if I can wrap my head around HLSL.


Some more generic gripes:

Having a Direct3D rendering path makes vendors less likely to improve their GL drivers. [Ok, that would be a weaker argument if the given reason behind using D3D wasn't that the GL drivers are worse. Maybe desktop Linux and Mac OS X put enough pressure on the driver vendors to improve their drivers...]

HLSL + Cg is closed-source and bound to x86, so it's pretty much impossible to use in any browser.

2009-04-22

Google's O3D

Executive summary


Google's O3D is a 3D scene graph plugin for Windows and Mac. See Ben Garney's Technical Notes on O3D. Also see the O3D API blog.

I like it.

It doesn't build here though. So I can't test it.

The people behind it are GPU & games industry veterans. Which is good.

It doesn't have anything for playing sounds. Which is par for the course but still a bit of a letdown.

I think that the 3D canvas and O3D should use the same shading language, the same viewport origin (D3D uses y-grows-down, OGL y-grows-up) and the same content semantics (loading HTML elements as textures, buffer types.)

Longer ramble


If you want to download the O3D repo, be warned that it has copies of textures as TGAs and PSDs. And 3DS Max files and mp3s... The repo is 1.6 GB in size. And it doesn't build. So I can't test it. All the following is gleaned from the documentation and guessed from the source code.

Anyhow, I like it. It does shaders, imports data from 3D programs and clearly has a clue. The downsides are that it's quite complex and weighs in at around 60 thousand lines of source code (compared to nine thousand for the OpenGL canvas.) But it's a scene graph, so it's no wonder it's a lot bigger.

O3D has a solution to The Shader Problem: they use Nvidia's Cg for compiling the shaders and have an ANTLR parser to validate them. The shading language is HLSL SM 2.0 / Cg, though. But it probably works the same across hardware / OS / driver combos? I hope so at least.

O3D is a scene graph renderer. Their scene graph consists of transform trees, which contain transform nodes that contain drawable objects, and render trees, which draw transform trees.

Transform nodes are basically utility-wrapped transformation matrices.

Drawables are Shapes, which consist of primitives, each of which has a material to determine the render pass and a bunch of coordinate arrays that interact with the shaders somehow. It's too complicated for me to understand at this time of the day so I'll just paste some "Hello, Cube" example code:

var viewInfo = o3djs.rendergraph.createBasicView(
g_pack,
g_client.root,
g_client.renderGraphRoot);

viewInfo.drawContext.projection = g_math.matrix4.perspective(
g_math.degToRad(30), // 30 degree fov.
g_client.width / g_client.height,
1, // Near plane.
5000); // Far plane.
viewInfo.drawContext.view = g_math.matrix4.lookAt([0, 1, 5], // eye
[0, 0, 0], // target
[0, 1, 0]); // up

var redEffect = g_pack.createObject('Effect');
var shaderString = document.getElementById('effect').value;
redEffect.loadFromFXString(shaderString);

var redMaterial = g_pack.createObject('Material');
redMaterial.drawList = viewInfo.performanceDrawList;
redMaterial.effect = redEffect;

var cubeShape = g_pack.createObject('Shape');
var cubePrimitive = g_pack.createObject('Primitive');
var streamBank = g_pack.createObject('StreamBank');

cubePrimitive.material = redMaterial;
cubePrimitive.owner = cubeShape;
cubePrimitive.streamBank = streamBank;

cubePrimitive.primitiveType = g_o3d.Primitive.TRIANGLELIST;
cubePrimitive.numberPrimitives = 12; // 12 triangles
cubePrimitive.numberVertices = 8; // 8 vertices in total

var positionsBuffer = g_pack.createObject('VertexBuffer');
var positionsField = positionsBuffer.createField('FloatField', 3);
positionsBuffer.set(g_positionArray); // vertex buffer with cube coords

var indexBuffer = g_pack.createObject('IndexBuffer');
indexBuffer.set(g_indicesArray); // indices to vertex buffer
streamBank.setVertexStream(
g_o3d.Stream.POSITION, // semantic: This stream stores vertex positions
0, // semantic index: First (and only) position stream
positionsField, // field: the field this stream uses.
0); // start_index: How many elements to skip in the
// field.
cubePrimitive.indexBuffer = indexBuffer;

g_cubeTransform = g_pack.createObject('Transform');
g_cubeTransform.addShape(cubeShape);

g_cubeTransform.parent = g_client.root;

cubeShape.createDrawElements(g_pack, null);

2009-04-17

Another 3D canvas demo


Adapted from vlad's port of FRequency's 1k demo. I made it a bit lighter and fiddled with the lights and colors.

That demo is pretty funky. My understanding is that it uses the fragment shader to raytrace a sine wave function with two lights: for each pixel, search for the nearest depth where the function is below zero, then find the tangent of the function at that point and dot it with the light direction to do diffuse shading, and finally add fog color depending on the depth of the point.

Why it uses the tangent instead of the normal is still fuzzy to me.
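
Here's a CPU-side sketch of that reading (all names and constants made up, one light and grayscale instead of the demo's two lights): march the ray until the field goes negative, then shade with a finite-difference gradient, i.e. the normal, which is what I'd expect; the tangent trick is the fuzzy part.

lx = 0.6; ly = 0.8; lz = 0.0; // light direction (made up)
fogColor = 0.7; // grayscale fog (made up)
field = function(x, y, z, t) {
return y - Math.sin(x + t) * Math.sin(z + t); // some sine wave surface
};
shadePixel = function(ox, oy, oz, dx, dy, dz, t) {
for (var d = 0.0; d < 40.0; d += 0.1) {
var x = ox + dx*d, y = oy + dy*d, z = oz + dz*d;
if (field(x, y, z, t) < 0.0) { // hit: below the surface
var e = 0.01;
var nx = field(x+e, y, z, t) - field(x-e, y, z, t);
var ny = field(x, y+e, z, t) - field(x, y-e, z, t);
var nz = field(x, y, z+e, t) - field(x, y, z-e, t);
var len = Math.sqrt(nx*nx + ny*ny + nz*nz);
var diffuse = Math.max(0.0, (nx*lx + ny*ly + nz*lz) / len);
var fog = d / 40.0;
return diffuse * (1.0 - fog) + fogColor * fog;
}
}
return fogColor; // no hit, all fog
};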

2009-04-11

GL canvas feedback

Some of the feedback I've heard on using a thin OpenGL ES 2.0 wrapper as the 3D canvas API [and my thoughts parenthesized]:

  • It would be better to design a new API and translate it to OpenGL / Direct3D / low-level driver calls in the browser backend. [Is it easier to write a new API and translate it to Direct3D and OpenGL than translating OpenGL to Direct3D? Plus, Windows has OpenGL support. Nothing but Windows has Direct3D support. How much work is this going to involve? What will the API be based on? Current hardware? Future hardware? CPU fallback? Shaders? Will it be a pure drawing API, a general purpose data-parallel computing API or something in between? How much complexity does it expose to the web devs? Will it be immediate-mode or a scene graph? How easy will it be to get a teapot on the screen? How easy will it be to get a per-pixel shaded teapot with a normal map, environment map, gloss map, DOF blur, soft shadows and a bloom shader on the screen? Are there any documentation and tutorials for it?]

  • Java3D would be a better approach than OpenGL. [Might be, want to write a spec and an implementation based on it? What shading language will you use? Remember to write a compiler from it to GLSL and HLSL.]

  • It's not going to run on computers older than 2 years. [It runs on a 6-year-old computer. Both on Linux and Windows. On an ATI card. I've personally tested it on Linux 32-bit, 64-bit, Windows XP, with Nvidia, ATI and software rendering.]

  • Having explicit delete methods for resources (textures, FBOs and VBOs) is a bad idea in a GC'd language. [Agreed, they should be freed on GC.]

  • glGetError is bad, errors should be thrown as JavaScript exceptions. [Agreed. The glGetError overhead is <0.2us per call, it's not going to blow your budget.]

  • A GLSL shader that works on one driver may fail on another. Even with the same hardware. [Agreed. We need to specify a single compatible subset of the language. So, I guess it's still going to require writing a GLSL parser? Unless there's some #version that's compatible across drivers.]

  • You can hang the GPU if you have shaders and they have infinite loops in them. [My tests disagree on the infinite-loops part, but it is a problem with heavy shaders (and hypothetical drivers lacking runtime sanity checks.) Getting rid of branches in shaders would probably fix the problem, but is it worth it? And I imagine you could hang the GPU by compositing a few thousand large 2D canvases as well (at least you can make the browser hang and swap a lot.)]

  • X3D would be better than an immediate-mode API. [X3D isn't an immediate-mode API though, it's a document format. Like SVG. SVG became a W3C recommendation 8 years ago, and still no browser supports the whole spec; all have different bits that they don't support (filters, animations and fonts being the big culprits.) 2D canvas took around a year to get around (OS X Dashboard 2005, Firefox 2 2006, Opera 9 2006.) Of course X3D might be easier to implement, but I wouldn't count on it.]

  • Khronos membership and conformance costs are high enough to make it impossible for small operators to have their say on the working group. [Agreed, though none of the browser engine developers are exactly small. Mozilla is probably the smallest at $66M revenue, Opera is $74M, Nokia $66G, Apple $32G, Google $22G and Microsoft $60G. And Nokia, Google and Apple already are members.]



Conclusions? APIs are faster and easier to implement than document formats. Shaders are a pain, but also the best feature in hardware-accelerated drawing. Building a whole new cross-platform accelerated drawing API from scratch is going to be hard.

2009-04-10

Visualization of a GC pause



Frame time histogram of 150 4x4 matrix multiplications per frame at 60fps on Firefox trunk 2009-04-10.

Time advances from left to right, each pixel being 16 ms. Each black bar is a frame. The height of a black bar is the time that frame took, 1 ms / pixel.

The tall bars are what's making smooth animations difficult.

2009-04-09

Code generation and dastardly plans

Back from a week and a half of F1 and video games. Managed to port the Canvas 3D tests to the new API (unearthing a bunch of old bugs and a new one.)

I also have a couple Python scripts that generate C++ code from the GLES 2.0 headers + API modifications file + valid arguments grammar. My plan is to make it spew out better smoke tests for the API. And if I can make some Webkit-based browser build on 64-bit Linux (fat chance), adapt the generator to do the easy parts of a proof-of-concept Webkit port.

Extending the code generator to handle state tracking, array arguments and array return values would make it handle most of the API, except for the couple polymorphic functions. The rest of the non-autogened code would be WGL/CGL/GLX/OSMesa-glue, state tracking and validation, totaling some 5000 lines.

My other plans for the near future include: catching up on four weeks of math homework, writing some demos for Canvas 3D (CFD, image filters, editable video filters), writing this year's two-week game (maybe something Disgaea-like? Combos and colors), learning to use Blender, writing a small painting app with Qt.

Seven half-time projects come together to make one 3.5-time project. Math and testgen over the weekend, demos and Blender next week, two-week game on the last two weeks of the month, painting app 2 hours daily alloc. Let's see which project drops first :P

2009-04-01

And now glReadPixels works on ATI pbuffer

glReadPixels from a pbuffer didn't work on a Radeon 9600 with the Linux fglrx drivers. And it was a very strange bug. Apparently pbuffer contexts don't work on ATI fglrx + R300 if you don't first create an X window context and make it current. Here's the code for the fix in nsGLPbufferGLX::Init:

// workaround for Radeon 9600 pbuffer contexts not working without a
// previous window context
XVisualInfo *visinfo;
int vattrib[] = { GLX_RGBA, None };
// pick a window-capable visual and create a throwaway context with it
visinfo = gGLXWrap.fChooseVisual( sharedDisplay, DefaultScreen(sharedDisplay), vattrib );
workaroundCtx = gGLXWrap.fCreateContext( sharedDisplay, visinfo, NULL, GL_TRUE );
// make the throwaway context current on the root window, after which
// pbuffer contexts start working
gGLXWrap.fMakeContextCurrent(
sharedDisplay,
DefaultRootWindow(sharedDisplay),
DefaultRootWindow(sharedDisplay),
workaroundCtx );
XFree(visinfo);

Now my Canvas 3D demos work on ATI + Linux, yay!

2009-03-31

More ATI Linux debugging

Been debugging the Canvas 3D Linux ATI fglrx crashes when using several contexts and pbuffers. The ATI driver doesn't like glXMakeContextCurrent with different Display*s, and if you have a non-zero pbuffer context bound when you close the Display*, it will crash. Also, swapping between pbuffers in a single context crashes.

Using a shared Display* for all contexts, doing glXMakeContextCurrent(dpy, 0, 0, 0) before destroying a context, and recreating the context on resize (instead of just swapping the pbuffer) got me to the point where it's not crashing on my box. It's still not working right though, as glReadPixels returns random framebuffer data.

Hopefully I'll have it working in the next couple days. Then the only remaining Linux drivers to figure out would be the open source drivers, Intel and the software fallback.

I did test ATI on Windows XP too. Works great after installing the latest drivers. Tested on a Radeon 9600 from 2003 (on a Sempron 2400+) and an X1950 from 2007 (on a Core2.)

The Radeon 9600 really disliked my DOF blur shader, dropping to single-digit FPS when it was enabled. But sans the blur, the "25 spinning per-pixel lit cubes" demo ran smoothly. The X1950 plowed through everything without a hitch. I need heavier demos.
