fhtr: March 2009

2009-03-31

More ATI Linux debugging

Been debugging the Canvas 3D Linux ATI fglrx crashes when using several contexts and pbuffers. The ATI driver doesn't like glXMakeContextCurrent with different Display*s, and if you have a non-zero pbuffer context bound when you close the Display*, it will crash. Also, swapping between pbuffers in a single context crashes.

Using a shared Display* for all contexts, calling glXMakeContextCurrent(dpy, 0, 0, 0) before destroying the context, and recreating the context on resize (instead of just swapping the pbuffer) got me to the point where it no longer crashes on my box. It's still not working right though, as glReadPixels returns random framebuffer data.

Hopefully I'll have it working in the next couple days. Then the only remaining Linux drivers to figure out would be the open source drivers, Intel and the software fallback.

I did test ATI on Windows XP too. Works great after installing the latest drivers. Tested on a Radeon 9600 from 2003 (on a Sempron 2400+) and an X1950 from 2007 (on a Core2.)

The Radeon 9600 really disliked my DOF blur shader, dropping to single-digit FPS when it was enabled. But sans the blur, the "25 spinning per-pixel lit cubes" demo ran smoothly. The X1950 plowed through everything without a hitch. I need heavier demos.

2009-03-26

Why you might like Canvas 3D (or not)

Apparently we now have a highly advanced 3D virtual worlds platform that solves all of the world's woes. "But what's the difference between that and VRML?", I hear you ask. Several of you, in fact.

Well, let me shine some light on that by telling you what Canvas 3D is.

Canvas 3D is OpenGL ES 2.0

Canvas 3D is an OpenGL ES 2.0 binding for JavaScript.

No, really. That's it. End of story. (What? You want more?)

OpenGL ES 2.0 is a hardware-accelerated drawing library

OpenGL ES 2.0 is a way of telling the graphics card to draw something. You give it a bunch of coordinates and images and ask it to process them with your shader. Shaders are small image processing programs that run on the graphics processor. At around a thousand times the speed of JavaScript.

Sure, you can do 3D with shaders. You can also do real-time video filters with shaders. Not to mention wobbly windows and fancy slideshows. All things for which plain JavaScript is just way too slow. And if you're really ambitious, you can even do computational biology with shaders.

The 1000x speed difference mentioned above? That's the difference between doing something thirty times per second vs. doing it once every thirty seconds. The difference between unbearably slow and smoothly animated.

And OpenGL ES 2.0 is an open standard that is relatively well documented, and is actually used by numerous developers the world over (shocking, I know.)

For example, Google's Chrome browser uses OpenGL internally to do its drawing. Mac OS X uses OpenGL to do some of its fancy desktop effects. Adobe Photoshop uses OpenGL to accelerate its image filters. Adobe Acrobat Reader uses OpenGL. Most multiplatform games use OpenGL. Your fancy new cellphone likely uses OpenGL for smoothly zooming and panning between your photos.

OpenGL is not VRML

The difference between OpenGL and VRML: You give both your geometry and textures. OpenGL draws what you tell it to draw, when you tell it to draw. VRML draws what it wants to draw, when it wants to draw.

It's like the difference between an oven and a breadmaker. Sure, you can bake bread with both, and it's generally going to be easier with the breadmaker. But perhaps, just perhaps, you might occasionally want to bake something other than bread.

And it's a heck of a lot harder to build a breadmaker than an oven.

Rain on the parade

The OpenGL canvas is not a panacea though. It's OpenGL ES 2.0 for JavaScript, for better and for worse.

First off, OpenGL ES 2.0 is a very low-level API. You need something like 500 lines of code to get a lit spinning cube going. That includes writing your own matrix math library and a lighting shader. So if you just want to put a dancing teapot on your web page, wait for a convenience library to ease your pains.

Second, the slowness of JavaScript means that many of the things that you might wish to do with the OpenGL canvas are not going to run too well. Complex animations, or anything even remotely resembling an action game, are going to require a fairly extensive retooling of the JavaScript runtime, as it's currently causing a lot of framerate glitches. I hear there's work afoot towards fixing that, but I'll believe it when I see it.

And third, you still need something else to take care of advanced audio (as I kinda doubt the audio-tag is cut out for that.)

Oh, the future is perfect and diamondoid!

But, knowing the obstacles means knowing your goals. Making the JS garbage collector less laggy and writing an OpenAL binding might actually make writing quite impressive action games on the browser a real possibility.

It's still early days for Canvas 3D though. Let's see how it unfolds.

2009-03-21

More Canvas 3D tests, preliminary bandwidth benchmark

Back to the test-writing circuit, today brings tests for texSubImage2D, texSubImage2DHTML, copyTexImage2D and copyTexSubImage2D.

Also, wrote a small bandwidth micro-benchmark to measure texture upload bandwidth and vertex upload bandwidth. And some "drawing" tests that don't draw anything. Which makes them ever-so-slightly useless. I wrote C versions of some of the benchmarks too, to have a baseline to compare to.

Here are the preliminary results (you should take these with a huge dose of salt, it's more than likely that I'm Just Doing It Wrong):

texImage2D bandwidth: C 1.2 GB/s, JS 200 MB/s, JS with floats in the array 20 MB/s

texSubImage2D bandwidth: C 1.2 GB/s, JS 200 MB/s

texImage2DHTML bandwidth: JS 85 MB/s (I kinda hobbled that by removing the single optimization it had.)

readPixels bandwidth: JS 18 MB/s

getImageData bandwidth: JS 22 MB/s

bufferData bandwidth: C 3.5 GB/s, JS 211 MB/s

The vertex "drawing" numbers I got were around 65-100 million vertices per second. The anomalously slow one was drawing in JavaScript with vertex arrays and swapping the array after each draw, as it had to convert the JS array into a C array. And that was taking around 1ms for every fifty thousand floats. So in JS, you really should use VBOs for performance.
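As a rough sanity check on that conversion cost, the figure can be turned into a bandwidth number. This is only back-of-the-envelope arithmetic, and it assumes the JS array ends up as 32-bit C floats (4 bytes each), which the numbers above don't actually state:

```javascript
// Back-of-the-envelope conversion of "1 ms per fifty thousand floats" into MB/s.
// Assumption: each converted value is a 32-bit C float (4 bytes).
const FLOATS_PER_MS = 50000;
const BYTES_PER_FLOAT = 4;

const bytesPerSecond = FLOATS_PER_MS * BYTES_PER_FLOAT * 1000;
const megabytesPerSecond = bytesPerSecond / 1e6; // decimal megabytes

console.log(megabytesPerSecond + " MB/s"); // prints "200 MB/s"
```

That lands in the same ballpark as the ~200 MB/s JS upload numbers above, so the array conversion might well be the bottleneck for those too.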

2009-03-19

Linux + ATI + Canvas 3D = GLX 1.3, please?

fgl_glxgears (using pbuffers) works. The Canvas 3D extension doesn't. Both use the same calls to create the pbuffer and context, and make the context current. Plus I'm seeing stack corruption on the extension. Something is broken, eh?

Been remote-debugging Canvas 3D crashing on the ATI Linux drivers (fglrx) for the past two days. And the short and sweet of it is: no go.

To explain: Canvas 3D uses pbuffers. Pbuffers are a part of GLX 1.3. Mesa only has GLX 1.2. ATI drivers use Mesa.

Is there any other way of doing off-screen rendering for the default framebuffer than pbuffers? All we really need is a way to change GL contexts and do glReadPixels from the current context. The contexts can't share data.

Here's a simple pbuffer test, for what it's worth. It's pretty much what Canvas 3D does to juggle and manage the different GL contexts.

2009-03-17

GLSL parsing

Yesterday's GLSL hole, I think I have a potential fix for it. Wrote a small parser that greps the shader source for problematic GLSL keywords and marks the shader as secure/insecure based on that. And implemented a prototype of the security model, so that a write-only canvas only allows secure shaders to run on it.

The basic axiom of the security model is "If the canvas is write-only, the bound program is secure."

And the rest of the axioms are:

A new shader is secure
A new program is secure
A shader is insecure iff its source has problematic keywords
A program is insecure iff an insecure shader is attached to it

So the statements we need to hook into are: "the canvas is write-only", "the bound program is secure", "an insecure shader is attached", "source has problematic keywords", "new shader" and "new program".

"New shader" and "new program" are handled by making CreateShader and CreateProgram set the created object as secure.

"Source has problematic keywords" is handled by ShaderSource setting the secure status of the given shader.

"An insecure shader is attached" is handled by LinkProgram, setting the program insecure if any of the attached shaders is insecure (programSec &= shaderSec.) If the program is the bound program and it becomes insecure, bind the null program.

"The canvas is write-only" is handled by TexImage2DHTML and TexSubImage2DHTML. After the call, if the canvas is write-only and the bound program is insecure, bind the null program.

"The bound program is secure" needs to be asserted in the above functions (sans CreateProgram/Shader), and in UseProgram. If the program that we're trying to bind is insecure, throw a security error.

Update: Moved the check from AttachShader to LinkProgram, as attach just adds the shader object to the program's list of objects to link (so you can edit the shader after attaching and before linking. Edits after linking don't affect the program, however.)
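The rules above can be modelled in a few lines of JavaScript. This is a toy sketch and not the extension's C++ code: ModelContext, markCanvasWriteOnly and SecurityError are made-up names, and the keyword list is a placeholder, since the list of problematic keywords isn't given here. It also reads "throw a security error" as applying only when the canvas is write-only.

```javascript
// Toy model of the security rules above. Not the extension's C++ code.
// PLACEHOLDER: the actual list of problematic keywords is not given in the post.
const PROBLEM_KEYWORDS = ["if", "for", "while"];

class SecurityError extends Error {}

function sourceIsSecure(source) {
  // Whole-word match, so identifiers like "format" don't trip the "for" check.
  return !PROBLEM_KEYWORDS.some((kw) => new RegExp("\\b" + kw + "\\b").test(source));
}

class ModelContext {
  constructor() {
    this.canvasWriteOnly = false;
    this.boundProgram = null;
  }
  // "A new shader is secure", "A new program is secure".
  createShader() { return { secure: true }; }
  createProgram() { return { secure: true, shaders: [] }; }
  // "Source has problematic keywords".
  shaderSource(shader, source) { shader.secure = sourceIsSecure(source); }
  // Attaching only records the shader; the check happens at link time.
  attachShader(program, shader) { program.shaders.push(shader); }
  // "An insecure shader is attached".
  linkProgram(program) {
    for (const shader of program.shaders) {
      program.secure = program.secure && shader.secure; // programSec &= shaderSec
    }
    this.unbindIfInsecure();
  }
  // Stands in for TexImage2DHTML uploading non-SOP content (hypothetical name).
  markCanvasWriteOnly() {
    this.canvasWriteOnly = true;
    this.unbindIfInsecure();
  }
  useProgram(program) {
    if (this.canvasWriteOnly && program !== null && !program.secure) {
      throw new SecurityError("insecure program on a write-only canvas");
    }
    this.boundProgram = program;
  }
  // Keeps the basic axiom true by binding the null program.
  unbindIfInsecure() {
    if (this.canvasWriteOnly && this.boundProgram !== null && !this.boundProgram.secure) {
      this.boundProgram = null;
    }
  }
}
```

With this model, binding an insecure program works fine until the canvas turns write-only, at which point the program gets unbound and any later useProgram on it throws.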

2009-03-16

Bad juju Sunday

Had a crash in a demo. And found the culprit. It was from having a non-zero FBO bound when doing glXDestroyContext. Which was very WTF.

Allowing images and canvases with content from outside the current domain as textures is a bad idea, as you can read them with some GLSL trickery, even if you can't use getImageData/readPixels on the context. So that needs to be fixed. Which'll make the context quite a bit less useful, but it's either that or moving to a fixed-function pipeline.

One solution would be locking the context to use the GL fixed-function pipeline once you upload a texture outside the SOP. Or have a set of trusted shader programs that you could choose from. Or force a safe subset of the shading language. (Do I smell scope creep?)

And I finally managed to write a shader that takes a long time to run and hangs the GL driver for its duration. I'm just going to call that a driver problem. If there was some easy way to compute the maximum running time of a shader and reject ones that go over the threshold of acceptability, that could be used to reject potentially nasty shaders.

But I'm still going to call that a driver problem.

2009-03-15

JS xpconnect call overhead

Wanted to know how much faster it would be to do the 4x4 matrix math in a C++ library called from JavaScript, so benchmarked an empty method vs. doing JS mmul4x4. The empty method looks like NS_IMETHODIMP nsMyClass::DoNothing() { return NS_OK; }.

JS mmul4x4 a million times took 1.5s with JIT. Calling the empty method a million times took 1.2s. Doing a million mmul4x4s in C took 0.06s.

Which is actually pretty nasty, as it means that a thousand GL calls will have 1.2 milliseconds overhead. And you usually need to do several GL calls per drawn object.

So, assuming an average of 3 GL calls and one mmul4x4 per object, the JavaScript overhead would be around five milliseconds for a thousand objects. Add in the 10 ms it takes for the swapBuffers compositing and hey, doing a thousand objects at 60 fps just became impossible :(
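Spelled out, the frame-budget arithmetic goes roughly like this. The timings are the ones measured above; the constant names are mine, and the call cost is the empty-method figure rather than a real GL call:

```javascript
// Frame-budget arithmetic using the timings measured above.
const US_PER_XPCONNECT_CALL = 1.2; // empty method call from JS
const US_PER_JS_MMUL = 1.5;        // one mmul4x4 in JS with JIT
const GL_CALLS_PER_OBJECT = 3;     // the assumed average
const OBJECTS = 1000;
const SWAP_BUFFERS_MS = 10;        // swapBuffers compositing cost
const FRAME_MS = 1000 / 60;        // 60 fps frame time

const perObjectUs = GL_CALLS_PER_OBJECT * US_PER_XPCONNECT_CALL + US_PER_JS_MMUL;
const jsOverheadMs = (perObjectUs * OBJECTS) / 1000;
const leftoverMs = FRAME_MS - SWAP_BUFFERS_MS - jsOverheadMs;

console.log(jsOverheadMs.toFixed(1)); // prints "5.1"
console.log(leftoverMs.toFixed(1));   // prints "1.6"
```

So about 1.6 ms of the 16.7 ms frame would be left over for the driver, the GPU and the rest of the page.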

Update 2: gl.isBuffer empty call overhead is 1.4 us instead of the 1.2 us.

Update: Measured the overhead for actual GL calls and it's a bit more grim. The full method call for, say, gl.isBuffer takes 3.6 us. Minus the 1.4 us overhead and we're left with 2.2 us of actual work. Of which getting the NativeJSContext takes 0.2 us, glXMakeContextCurrent 1.6 us, glIsBuffer 0.2 us and the trailing glGetError 0.2 us. Ouch.

The reason why the GL canvas context does glXMakeContextCurrent before each GL call is that Firefox is single-threaded. And you can only have one current GL context per thread. So you need to manually flip between the different GL rendering contexts. Which appears to be costly.

But there's a solution! if (glXGetCurrentContext() != myContext) glXMakeContextCurrent(dpy, pbuf, pbuf, myContext); Now gl.isBuffer takes only 2.0 us if there's no contention.

It's still sad that there's a 1.6 us overhead for a 0.4 us call, but at least it's better than a 3.2 us overhead :>
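To show why the current-context check helps, here is a small JavaScript mock of it. Nothing in it talks to GLX: mockGlx and ensureCurrent are hypothetical stand-ins, and the point is just to count how many expensive switches happen when several canvases take turns.

```javascript
// Mock standing in for GLX: one "current context" slot per thread.
const mockGlx = {
  current: null,
  switches: 0,
  makeContextCurrent(ctx) { // models the expensive glXMakeContextCurrent
    this.current = ctx;
    this.switches += 1;
  },
};

// Only switch when some other context is current (the check from above).
function ensureCurrent(ctx) {
  if (mockGlx.current !== ctx) mockGlx.makeContextCurrent(ctx);
}

const canvasA = { name: "A" };
const canvasB = { name: "B" };
for (const ctx of [canvasA, canvasA, canvasA, canvasB, canvasA]) {
  ensureCurrent(ctx); // would precede each GL call
}
console.log(mockGlx.switches); // prints 3
```

Five calls, three switches. With a single GL canvas on the page the switch happens once and every later call skips it, which is the "no contention" case.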

2009-03-14

My fork of the Canvas 3D Mercurial repo

If you want to follow what I'm doing, here's my fork of the Canvas 3D extension repo. Review welcome, but that's too much to ask.

You can compile it by copying canvas3d/ to $my_firefox_source_dir/extensions/canvas3d and adding ac_add_options --enable-extensions=default,canvas3d to your .mozconfig.

The tests are at the usual location. If you do actually get the extension built and running and the tests going, please drop me a note. I've only tested on 64-bit Linux using the proprietary Nvidia drivers, so feedback along the Windows / OS X / 32-bit Linux / ATI / Intel axes would be most helpful.

Fuzz tests, texture methods, GLSL ES pondering

Wrote a fuzzer that calls each method of the GL context with randomly generated arguments. And found a couple of segfaults from me doing stupid things. IIRC they were all related to array indexing (big surprise there, eh?)

I think I will move the fuzzer to unit.js and add in some QuickCheck functions, so that it can be used in a directed fashion, instead of just doing a million calls with random arguments (the current fuzzTheAPI.html does around 600k calls against the API.)
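For the curious, a fuzzer of this kind can be sketched in a few dozen lines. This is not fuzzTheAPI.html: fuzzApi, makeRng and the generator list are names and choices of my own, and the seeded random number generator is only there so that a failing run can be reproduced.

```javascript
// Small seeded LCG so a crashing sequence of calls can be replayed.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// QuickCheck-ish value generators; swap or extend these to direct the fuzzing.
const GENERATORS = [
  (rng) => Math.floor(rng() * 2 ** 32) - 2 ** 31, // ints, including negative ones
  (rng) => (rng() - 0.5) * 1e6,                   // floats
  () => NaN,
  () => null,
  () => undefined,
  (rng) => "x".repeat(Math.floor(rng() * 16)),
  (rng) => Array.from({ length: Math.floor(rng() * 8) }, () => rng()),
  () => ({}),
];

function randomArgs(rng) {
  const count = Math.floor(rng() * 6); // 0 to 5 arguments
  return Array.from({ length: count }, () =>
    GENERATORS[Math.floor(rng() * GENERATORS.length)](rng));
}

// Calls random methods of target with random arguments. Exceptions are fine
// and get counted; what the fuzzer is hunting for is the process crashing.
function fuzzApi(target, iterations, rng) {
  const names = [];
  for (const name in target) {
    if (typeof target[name] === "function") names.push(name);
  }
  let thrown = 0;
  for (let i = 0; i < iterations; i++) {
    const name = names[Math.floor(rng() * names.length)];
    try {
      target[name](...randomArgs(rng));
    } catch (e) {
      thrown += 1;
    }
  }
  return { calls: iterations, thrown };
}
```

Usage would be something like fuzzApi(gl, 600000, makeRng(1)), where gl is the GL context. Note that for...in only sees enumerable properties, so whether it finds every method of a real context object is an assumption.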

I actually had copyTex[Sub]Image2D implemented from a while ago, but didn't have tests for them. So today I wrote texSubImage2D and texSubImage2DHTML, and made texSubImage2D and texImage2D take a bunch of extra GL types (the glTexSubImage2D man page had a longer list of types, including GL_FLOAT, GL_INT, etc., so I made the tex methods not error on those.) It's entirely possible that using a GL_FLOAT texture on an actual GL ES 2.0 implementation will throw an error.

Speaking of GL ES 2.0, the GL context API (GL Web 2.0 or whatever it is called) is quite compatible with it, but GLSL ES is a whole different beast. It's like GLSL 1.20 with precision qualifiers, and without most built-in variables. A half-way house between GLSL 1.20 and 1.30, if you will. I don't know if you can use GLSL ES on desktop graphics card drivers.

VertexAttrib[1234]fv are in, as are UniformMatrix[234]fv. No explicit tests yet for them or the new texture methods. At least the fuzzer doesn't segfault them, so maybe they work! We'll see! Tests tomorrow.

Canvas 3D uniforms, glGetError exception throwing

Ok, implemented uniform[1234][fi]v? with the power of preprocessor macros. And search-and-replaced in if (LogGLError()) return NS_ERROR_INVALID_ARG; before each and every return NS_OK; (and went through the code to delete the false positives.)

I don't know if I want to delete uniformi and uniformf. Might as well, but that'd break what little existing code there is.

Tomorrow's plan is to write texSubImage2D[HTML] and copyTex[Sub]Image2D plus tests for them and the new uniforms. And uniformMatrix[234]fv. And, uh, vertexAttrib[1234]fv? (though I don't have problems with the existing magical version, copy-paste compatibility might be nice to have.)

Tried to add video element support to texImage2DHTML but nsHTMLVideoElement.h and nsHTMLMediaElement.h are not usable as extension headers (I don't know why they're in dist/include/ in the first place if it's impossible to use them.) So if you want to use video as a texture, you need to draw it onto a 2D canvas first and use the canvas as a texture. And cry as you bleed CPU time.

Forgot the list of GLSL tests from the previous post:


GLSL
  + OOB access to uniform
  + OOB access to const array
  + OOB access to attrib
  + infinite loops
  + unused attribs
  + unused uniforms

2009-03-13

Canvas 3D high-priority tests done

Got through my list of high-priority tests. Next up is implementing the missing high-priority methods, writing tests for those, writing tests for getters, and adding int err = glGetError(); if (err != GL_NO_ERROR) return NS_ERROR_GL_JUST_BLEW_UP; to every method's tail.

I don't like gl.uniformf, gl.uniformi and gl.uniformMatrix. Thinking of replacing them with the typed versions. Why? Because uniformf et al have opaque semantics. You need to read the source to figure out how to use them correctly (and even then it's difficult to understand.) Plus renaming oft-used functions makes porting harder.

And, well, the less new code there is, the smaller the maintenance burden. And the less fancy logic there is, the fewer (possibly erroneous) assumptions you need to make. For example, should gl.uniformf(blur_kernel_7, [1,2,3,4,5,6,7]) work? How about gl.uniformf(vec4, [1,2,3,4,5,6,7])?

Using gl.uniform1fv(blur_kernel_7, [1,2,3,4,5,6,7]) and gl.uniform4fv(vec4, [1,2,3,4,5,6,7]) makes it clear what you're trying to do. And hopefully easier to catch the errors and give meaningful error messages.
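As an illustration of why the typed versions are easier to check, here's a sketch of the kind of argument validation they allow. checkUniformArgs is a hypothetical helper, not something in the extension, and rejecting lengths that aren't a multiple of the component count is my assumption about what a sensible check would do.

```javascript
// Validates the array passed to a uniform{N}fv-style call and returns the
// element count that would go to glUniform{N}fv. Hypothetical helper.
function checkUniformArgs(components, values) {
  if (!Number.isInteger(components) || components < 1 || components > 4) {
    throw new RangeError("components must be 1, 2, 3 or 4");
  }
  if (values.length === 0 || values.length % components !== 0) {
    throw new RangeError(
      "uniform" + components + "fv: expected a multiple of " + components +
      " values, got " + values.length);
  }
  return values.length / components;
}

const kernel = [1, 2, 3, 4, 5, 6, 7];
console.log(checkUniformArgs(1, kernel)); // prints 7 (seven floats)
try {
  checkUniformArgs(4, kernel);            // 7 values can't fill whole vec4s
} catch (e) {
  console.log(e.message); // prints "uniform4fv: expected a multiple of 4 values, got 7"
}
```

With the untyped uniformf there's no N to check against, so the same mistake has to be guessed at from the array length alone.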

List of tests I've got now, this should be all the nasty functions (sans the ones with no implementations, i.e. texSubImage2D[HTML] and copyTex[Sub]Image2D.)


== Methods that have tests ==

bufferData
  + with array
  + with bad args

bufferSubData
  + with array
  + with bad args

drawArrays
  + with vertex arrays
  + with buffers
  + with bad args

drawElements
  + with vertex arrays
  + with buffers
  + with bad args (oob count & offset, oob indices)

getImageData
  + normal
  + with GL canvas that has a non-SOP texture uploaded
  + bad args (coords outside viewport, zero size)

readPixels
  + normal
  + with GL canvas that has a non-SOP texture uploaded
  + bad args (coords outside viewport, zero size, bad type)

texImage2D
  + with NULL
  + with array of ints
  + with bad args (bad dims, bad type, bad border)

texImage2DHTML
  + with canvas
  + with img
  + with bad args (non-image element)
  + with non-SOP img
  + with non-SOP canvas (i.e. canvas with non-SOP img)

uniformf
  + with vec
  + with scalar
  + with OOB args
  + with bad uniform id

uniformi
  + with vec
  + with scalar
  + with OOB args
  + with bad uniform id

uniformMatrix
  + with 2x2, 3x3, 4x4 matrices
  + with OOB args
  + with bad uniform id

vertexAttrib
  + normal
  + with oob args
  + with bad attrib id

vertexAttribPointer
  + with array
  + with buffer offset
  + with bad offset
  + with bad attrib id

2009-03-11

Still alive

No, I didn't manage to crash or hang my GPU with the GLSL tests. Which may speak more for my test-writing ability, but there you have it.

Though I did manage to crash the Canvas 3D extension by feeding it a negative vertex attribute id as it was using it to do some cache array indexing (int id, only checked for id >= arrayLength.) Got a hacky copy-pasted-from-Context2D.cpp SOP checking going and passing tests for it. Trying to #include "nsHTMLVideoElement.cpp" duly broke the build, so no video element texImage2DHTML.
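The fix for that kind of bug is to check both ends of the range whenever the index is signed. A minimal sketch of the check, written in JavaScript for consistency with the other examples (the real code is C++, where the unchecked negative index reads memory outside the array instead of politely returning undefined); the function names are hypothetical:

```javascript
// A signed index needs both bounds checked, not just the upper one.
function isValidIndex(id, length) {
  return Number.isInteger(id) && id >= 0 && id < length;
}

// Models the attribute cache lookup with the check in place.
function getCachedAttrib(cache, id) {
  if (!isValidIndex(id, cache.length)) {
    throw new RangeError("attrib id out of range: " + id);
  }
  return cache[id];
}
```

In C++ the other common way out is to make the index unsigned, so that a negative value turns into a huge one and fails the upper-bound check.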

Remaining things in my shortlist of tests to write are: vertexAttribPointer, vertexAttrib, uniform[fi], bufferData, bufferSubData. Then it's time for secondary tests (methods that have custom implementations and especially getters and things that take indices.)

My testing priority heuristic goes something like this: Deals with web data => SOP bugs. Array indexing => segfaults. Segfaults => you're pwned, dood. State => bugs. Allocation => memleaks. New code => new bugs. Complex logic => complex bugs. No code => maybe no bugs, who knows.

So code I like pushes everything through well-trodden paths with the least amount of new code and in the simplest way possible. Preferably without array indexing, heap allocation, or state of any kind.

2009-03-10

4x4 matrix multiplication in JavaScript vs C

The benchmark: do a million 4x4 double matrix multiplications.

Update: Here's the C version and the ASM output. And here's the JavaScript version.

Update #2: I wrote an SSE2 version of the 4x4 float matrix multiplication, check it out. It's roughly four times faster than the scalar version.
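To give an idea of what's being measured, a JavaScript benchmark of this shape might look roughly like the following. It is a sketch and not the linked version: the flat row-major layout, the input values and the function names are all my assumptions.

```javascript
// Multiplies two 4x4 matrices stored as flat 16-element row-major arrays.
// out must not be the same array as a or b.
function mmul4x4(a, b, out) {
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += a[row * 4 + k] * b[k * 4 + col];
      }
      out[row * 4 + col] = sum;
    }
  }
  return out;
}

// Runs the multiplication `iterations` times, returns elapsed milliseconds.
function benchmark(iterations) {
  const a = Array.from({ length: 16 }, (_, i) => i + 1);
  const b = Array.from({ length: 16 }, (_, i) => 16 - i);
  const out = new Array(16).fill(0);
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    mmul4x4(a, b, out);
  }
  return Date.now() - start;
}
```

Calling benchmark(1000000) gives a number comparable in spirit to the ones below, though on a current JS engine it will look nothing like them.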

SpiderMonkey (as x86_64 doesn't get any TraceMonkey love)

7 seconds.

SquirrelFish (extrapolated from someone else's numbers on a faster computer)

1.6 seconds.

C (gcc -O3) with alloca'd double arrays (yes, this is a bad idea)

2.6 seconds.

C (gcc -O3) with malloc'd double arrays

0.100 seconds.

C (gcc -O3) with posix_memalign'd double arrays (get them on a 16-byte boundary)

0.082 seconds.

C using floats (and malloc'd arrays, posix_memalign made the times the same as with doubles... go figure.)

0.052 seconds.

What did I learn

SSE (which GCC outputs for fp math) really doesn't like unaligned data. Or stack data. Or both.
And that fast JavaScript math is 20x slower than normal C.
And that data allocation is a finicky beast.
The speed difference between malloc'd and posix_memalign'd data is just wack.

Edit: On reading the ASM generated for the float arrays: GCC inlines the multiplication using movaps (i.e. 128-bit wide float mov) when you use only malloc, but if you have a posix_memalign call on an array (you don't even need to use the result), it falls back to using movss (scalar float mov.) The throughput for both is the same, but movss moves 4 times less data.

Looking at the ASM generated for the double arrays, it's the same thing but opposite results. The malloc version uses movapd and movhpd (and some mulpd and addpd), while the posix_memalign version uses movsd and does scalar math. But the malloc version is jumpier and longer, so that probably screws its performance.

If you do an average of one mmult per active object per frame, and the rest of your engine overhead is a fixed 12 ms per 60fps frame, you could have 50000 objects with C, 1700 objects with SquirrelFish, and 370 objects with SpiderMonkey. And then you still would have to try and do something about the 150 ms GC pauses that SpiderMonkey foists on you...

2009-03-08

Canvas 3D tests update

On the canvas3d-tests front, added VBO tests for drawArrays (and fixed some drawArrays tests), wrote some GLSL compile tests and tomorrow I get to see if they hang or crash my video card.

Also wrote bounds-checking for drawArrays and vertexAttribPointer with buffer index param. And vlad posted his updates, so I get to try and integrate them into my tree. His drawArrays/drawElements bounds-checking is more elegant than mine, so that's good. And I see that he's now using CGL instead of AGL on Apple, plus blending the drawing area using OpenGL on CGL, which should be nice and fast.

Benchmarked the GLX swapBuffers implementation that does glReadPixels+premultiply. Using a 1000x1000 canvas, swapBuffers takes 8 ms, of which 5.5 ms is glReadPixels and 2.5 ms premultiply (using SSE2, which is ~2x faster than the normal version.) And that doesn't include the time it takes to blend the canvas to the document, which likely adds at least 2 ms on top of the 8 ms. So the overhead for the glReadPixels+premultiply approach is around 10 ms per frame. Which makes 60fps a bit challenging.
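In case the premultiply step is unfamiliar: it scales each color channel by the pixel's alpha, since that's the format the compositing side wants. A scalar sketch of the idea, assuming 8-bit channels with alpha as the fourth byte and round-to-nearest (the actual byte order and rounding in the extension may well differ):

```javascript
// Premultiplies 8-bit pixels in place: each color channel becomes c * a / 255.
// Assumes 4 bytes per pixel with alpha last.
function premultiplyAlpha(pixels) {
  for (let i = 0; i < pixels.length; i += 4) {
    const alpha = pixels[i + 3];
    pixels[i] = Math.round((pixels[i] * alpha) / 255);
    pixels[i + 1] = Math.round((pixels[i + 1] * alpha) / 255);
    pixels[i + 2] = Math.round((pixels[i + 2] * alpha) / 255);
  }
  return pixels;
}

// One half-transparent pixel: alpha 128 roughly halves the color channels.
const pixel = new Uint8Array([255, 128, 0, 128]);
console.log(Array.from(premultiplyAlpha(pixel))); // prints [ 255 * 128 / 255 = 128, 64, 0, 128 ]
```

For a 1000x1000 canvas that is three multiplies and divides for each of a million pixels per frame, which is why it's worth doing with SSE2 or, better, on the GPU.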

Doing the premultiply and blend using OpenGL (by maintaining a cached texture copy of the area below) could bring the overhead down to the 5.5 ms used by glReadPixels. Or if you can do that, you could also maintain a cached texture of the area above as well and use the GL context for doing the compositing and rendering for the canvas area, and get rid of the glReadPixels call altogether (though it does sound like something that would make you lose all your hair.)

2009-03-05

The anatomy of the Canvas 3D extension

Been fiddling around with Firefox's Canvas 3D extension for the last week. Canvas 3D adds an OpenGL ES 2.0 context to the HTML5 Canvas element, giving you access to a few hundred GFLOPS of graphics computing power.

I've been working on adding framebuffer objects, glReadPixels, getImageData, toDataURL and a test suite to the extension. And it's a bit hostile to one's sanity - as OpenGL isn't very good at reporting errors - but what can you do?

It's been educational though. Here's a small overview of the way the extension works:

Organization of the extension code

The code for the extension is split into five major bits, outlined below.

C++ wrapper around OpenGL

src/glwrap.h
src/glwrap.cpp

These implement the GLES20Wrap class, which wraps the OpenGL shared library by loading the OpenGL ES 2.0 symbols from the shared object (e.g. /usr/lib/libGL.so) in much the same way as GLEW.

Platform-specific GLPbuffer implementations

src/nsGLPbuffer.h
src/nsGLPbufferGLX.cpp
src/nsGLPbufferAGL.cpp
src/nsGLPbufferWGL.cpp
src/nsGLPbufferOSMesa.cpp

These set up the rendering context for the canvas, deal with resizing it, and implement a SwapBuffers() that uses glReadPixels() to read the current framebuffer contents into the Thebes surface for the nsGLPbuffer.

The Thebes surface is then used for drawing the canvas element on the page, and also provides image data for getImageData and toDataURL (Thebes is the Firefox graphics layer, essentially a Cairo backend wrapper with heavily extended text capabilities.)

Platform-independent plumbing for dealing with the nsGLPbuffer

src/nsCanvasRenderingContextGL.h
src/nsCanvasRenderingContextGL.cpp

The class nsCanvasRenderingContextGLPrivate (I'll call it "ContextGL" from here on) stands between the browser and the OpenGL wrappers described above. ContextGL implements the <canvas> element side of the GL canvas.

When you create a new GL canvas context, ContextGL creates a nsGLPbuffer and binds it to the canvas context in the SetCanvasElement method.

When you resize the canvas, ContextGL calls the nsGLPbuffer's Resize method.

When the browser redraws the document, it calls ContextGL's Render method to draw the GL framebuffer (the Thebes surface mentioned above) onto the browser window.

The DoSwapBuffers method is called by gl.swapBuffers() and prompts a redraw of the document (by invalidating the canvas element.)

And the GetInputStream method is used by canvas.toDataURL() to encode the canvas contents into e.g. a PNG image.

C++ implementation of the JavaScript OpenGL context interface

src/nsCanvasRenderingContextGLWeb20.cpp

If ContextGL above was the implementation of the canvas element, ContextGLWeb20 is the implementation of the moz-glweb20 drawing context. It wraps the C++ OpenGL wrapper into a JavaScript library, defined in ContextGLWeb20.idl below.

Most of ContextGLWeb20 is pretty straightforward translation (in fact, a large part is defined by one-liner macros such as GL_SAME_METHOD_1(UseProgram, UseProgram, PRUint32)), but anything that deals with arrays, pointers and indices (genTextures etc. gen*, buffers, textures, vertexAttribPointer, uniform*, readPixels, getImageData) needs to cast values between JS and C++, and do bounds-checking (or should, at least.)

There are also a few methods that implement a higher-level interface over the basic OpenGL functions, e.g. gl.uniformf(some_uniform, [1.0, 2.0, 3.0, 4.0]) is turned internally into glUniform4fv(some_uniform, 1, arr).

In terms of API additions, the only truly new method is gl.texImage2DHTML(tex_id, image_or_canvas_element) for using HTML images and canvases as textures.

JavaScript interface definitions

src/nsCanvas3DModule.cpp - the extension module setup
public/nsICanvasRenderingContextGL.idl - GL constants
public/nsICanvasRenderingContextGLWeb20.idl - GL functions

The IDL files work sort of like header files shared between JavaScript and C++, basically saying "Hey, these are the JavaScript methods of the GL context, you better have an implementation for them in your C++ class!"

For example, if you have void useProgram (in PRUint32 program); in the IDL, you need NS_IMETHODIMP nsCanvasRenderingContextGLWeb20::UseProgram(PRUint32 program) {...} in the cpp.

Some performance numbers

Canvas 3D is a bit of an odd beast performance-wise, as it's hobbled by Cairo on one side and JavaScript on the other.

E.g. on my computer, doing a 30 fps animation of a 400x400 canvas uses something like half of a single core. The CPU time breakdown is ~10% for JS matrix math, another 10% for premultiplying the pixels in SwapBuffers, 30% for GL calls, and 50% for Cairo drawing the GL framebuffer on the HTML document.

In case you're interested, the animation draws a spinning per-pixel lit cube with a depth blur done using 6 gaussian blur passes. And a premultiply-unpremultiply-pass to make alpha work ok with blur. (OGG video)

And JavaScript. Well. I did a small benchmark, with a 7x7 gaussian blur kernel over a 256x256 Firefox logo (decomposed into a horizontal blur and a vertical blur.) JavaScript took 0.8 seconds to do a single blur. With GLSL, it took 0.4 seconds to do a thousand blurs.

Yes, that's two thousand times faster. And this on a 3-year-old Geforce 7600 GS that I bought because it was cheap, had two DVI outs and passive cooling.
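For reference, the decomposed blur looks something like this on the JavaScript side. It's a sketch rather than the benchmarked code: it works on a single-channel float image, uses binomial weights as a stand-in for the Gaussian kernel, and clamps at the edges, all of which are my assumptions.

```javascript
// 7-tap kernel; binomial weights approximate a Gaussian and sum to 1.
const KERNEL = [1, 6, 15, 20, 15, 6, 1].map((w) => w / 64);

// One 1D blur pass along (dx, dy) over a single-channel image.
// Samples outside the image are clamped to the nearest edge pixel.
function blur1D(src, width, height, dx, dy) {
  const dst = new Float32Array(src.length);
  const radius = (KERNEL.length - 1) / 2;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      for (let k = -radius; k <= radius; k++) {
        const sx = Math.min(width - 1, Math.max(0, x + k * dx));
        const sy = Math.min(height - 1, Math.max(0, y + k * dy));
        sum += KERNEL[k + radius] * src[sy * width + sx];
      }
      dst[y * width + x] = sum;
    }
  }
  return dst;
}

// The 7x7 blur as a horizontal pass followed by a vertical pass:
// 14 taps per pixel instead of 49.
function gaussianBlur7x7(src, width, height) {
  const horizontal = blur1D(src, width, height, 1, 0);
  return blur1D(horizontal, width, height, 0, 1);
}
```

A GLSL version would do the same seven texture reads per pass in a fragment shader, with the pixel loops handled by the GPU.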

So, if you want good performance, push as much of your number crunching to the shaders as you can, and rewrite Firefox's graphics engine to use OpenGL for compositing.