fhtr: April 2009

2009-04-26

Some O3D vs. GL canvas performance analysis

To continue, my opinion on O3D is that it probably has the best approach for flexible rendering of complex scenes in the browser that I've seen yet. To explain, a little bit of background:

The bottlenecks on the JavaScript side are GC, JS -> C++ API call overhead (timed it at around 1.3 ms / 1000 calls here), and 4x4 matrix multiplication (~20-50x slower than C, and you do it a lot.)
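For reference, a minimal sketch of the kind of 4x4 multiply being timed. The flat column-major layout and the name mat4Multiply are assumptions for illustration, not code from any particular library. Note that it allocates a fresh result on every call, which is where the garbage comes from.

```javascript
// Multiply two 4x4 matrices stored as flat 16-element arrays in
// column-major order (element (row, col) lives at index col * 4 + row).
// Hypothetical helper: returns a NEW array, so each call creates garbage.
function mat4Multiply(a, b) {
  const out = new Array(16);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        // (A * B)[row][col] = sum over k of A[row][k] * B[k][col]
        sum += a[k * 4 + row] * b[col * 4 + k];
      }
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

// Helper for building a translation matrix in the same layout.
function mat4Translation(x, y, z) {
  return [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  x, y, z, 1];
}
```

Composing mat4Translation(1, 2, 3) with mat4Translation(4, 5, 6) through mat4Multiply gives a translation by (5, 7, 9), at a cost of 64 multiplies and one allocation.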

O3D works a bit like editable display lists: you have a transform graph of objects that are separated into draw lists which are turned to native API calls in the renderer. And all that happens on the C++ side, so you don't have any JS -> C++ overhead for the drawing calls.

Suppose you want to animate a thousand meshes and don't want to push the matrix math to your vertex shader. To draw a single mesh, you need to bind the mesh's VBOs and textures, then set up the shader uniforms (transform, samplers, lights) and call the draw command. That means a dozen API calls per mesh, so a thousand meshes would need 12000 API calls, which at 1.3 ms per 1000 calls would take about 15 ms in call overhead alone.
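The call overhead estimate is just multiplication, sketched here so the numbers are easy to vary. The 1.3 ms / 1000 calls figure is the one measured above; the function name is made up.

```javascript
// Measured above: roughly 1.3 ms per 1000 JS -> C++ calls.
const MS_PER_API_CALL = 1.3 / 1000;

// Hypothetical helper: per-frame call overhead in milliseconds.
function callOverheadMs(meshCount, callsPerMesh) {
  return meshCount * callsPerMesh * MS_PER_API_CALL;
}

// A dozen calls per mesh, a thousand meshes: roughly 15.6 ms.
console.log(callOverheadMs(1000, 12).toFixed(1));
// One call per mesh, same scene: roughly 1.3 ms.
console.log(callOverheadMs(1000, 1).toFixed(1));
```

At 60 fps the whole frame budget is about 16.7 ms, so the dozen-calls case spends nearly all of it on crossing the JS -> C++ boundary.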

And if you multiply the matrices for each mesh in JavaScript, you end up creating at least a thousand matrices' worth of garbage every frame (~= 7.7 MB/s), triggering a Firefox GC run every 3 seconds (IIRC the max alloc until GC is 24 megs.) And as a thousand matrix multiplications take 4 ms here, you end up with a total of 19 ms of JS overhead per frame and a framerate glitch every three seconds courtesy of GC.
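The garbage numbers work out like this, assuming 16 doubles at 8 bytes each per matrix (ignoring object headers), 60 frames per second, and the 24 MB allocation limit recalled above. The helper names are hypothetical.

```javascript
// Assumption: a 4x4 matrix is 16 double-precision numbers, 8 bytes each.
const BYTES_PER_MATRIX = 16 * 8;

// Bytes of temporary matrices allocated per second.
function garbageBytesPerSecond(matricesPerFrame, framesPerSecond) {
  return matricesPerFrame * BYTES_PER_MATRIX * framesPerSecond;
}

// Seconds until the allocation limit is hit and a GC run is triggered.
function secondsBetweenGcRuns(allocLimitBytes, bytesPerSecond) {
  return allocLimitBytes / bytesPerSecond;
}

const rate = garbageBytesPerSecond(1000, 60);     // 7,680,000 bytes/s
const interval = secondsBetweenGcRuns(24e6, rate); // a bit over 3 seconds
console.log((rate / 1e6).toFixed(2) + ' MB/s, GC about every ' + interval + ' s');
```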

The matrix math overhead is livable, but the JS -> C++ overhead and the GC pauses (on Firefox and WebKit) sink it. On O3D's embedded V8 JS engine, the GC pauses are less of an issue, as it uses a generational collector and the temporary matrices should be taken care of by the fast young generation collections (take this with a grain of salt, I haven't timed it.) It still takes a hit from having to do the matrix math in JS, but that's not too bad compared to the API call overhead and GC pauses.

What to optimize?

The best solution for the API call overhead would be to minimize JS -> C++ call overhead in the JS engine, maybe by generating direct native calls from the JIT.

Making an immediate-mode API that works on the concept of editable draw lists that are executed in C++ would get rid of the API call overhead as well. Even a system to draw a mesh with a single API call would drop the number of API calls to roughly a tenth. I imagine it'd be something like drawObject({buffers: [], textures: [], program: shader, uniforms: {foo4f: [1,2,3,4], bar1i: 2} }). The draw list approach is probably easier to implement and more flexible, though: record calls into a draw list and run through that in C++.
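A rough sketch of the draw list idea, with everything in JavaScript so it can actually run. DrawList, executeDrawList and the logging backend are all hypothetical names; in a real implementation the execute step would live on the C++ side and the list would cross the JS -> C++ boundary once per frame.

```javascript
// Hypothetical draw list: JS only records commands as plain data.
class DrawList {
  constructor() {
    this.commands = [];
  }
  // Record a call by name instead of making it. Returns this for chaining.
  record(name, ...args) {
    this.commands.push({ name, args });
    return this;
  }
  clear() {
    this.commands.length = 0;
  }
}

// Stand-in for the native executor: one pass over the recorded commands.
// Returns the number of commands executed.
function executeDrawList(list, backend) {
  for (const cmd of list.commands) {
    backend[cmd.name](...cmd.args);
  }
  return list.commands.length;
}

// Fake backend that just logs what it was asked to do.
function makeLoggingBackend() {
  const log = [];
  return {
    log,
    bindBuffer(id) { log.push('bindBuffer ' + id); },
    useProgram(id) { log.push('useProgram ' + id); },
    uniformMatrix(name) { log.push('uniformMatrix ' + name); },
    drawElements(count) { log.push('drawElements ' + count); },
  };
}

const list = new DrawList();
list.record('useProgram', 1)
    .record('bindBuffer', 7)
    .record('uniformMatrix', 'transform')
    .record('drawElements', 36);
const backend = makeLoggingBackend();
executeDrawList(list, backend);
console.log(backend.log.join('\n'));
```

Since the list is plain data, it can be kept between frames and edited in place, so only the commands that changed need touching.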

GC pauses really need to be fixed in browser JS engines.

The matrix math slowdown isn't too bad but it's still nasty. I've heard some talk of adding native Vector and Matrix types to JS and maybe something like Mono.SIMD.

2009-04-25

O3D on Linux

Building O3D on Linux (svn r36): opt-linux does not build, use dbg-linux instead. The plugin has no input events implemented on Linux. So rendering demos work but interactive ones don't.

Most of the demos have very little JavaScript logic, which I guess is the point. I'll try to make some JavaScript-intensive demo and see how that fares. And maybe write some shaders to kick the tires if I can wrap my head around HLSL.

Some more generic gripes:

Having a Direct3D rendering path makes vendors less likely to improve their GL drivers. [Ok, that would be a weaker argument if the given reason behind using D3D wasn't that the GL drivers are worse. Maybe desktop Linux and Mac OS X put enough pressure on the driver vendors to improve their drivers...]

HLSL + Cg is closed-source and bound to x86, so it's pretty much impossible to use in any browser.

2009-04-22

Google's O3D

Executive summary

Google's O3D is a 3D scene graph plugin for Windows and Mac. See Ben Garney's Technical Notes on O3D. Also see the O3D API blog.

I like it.

It doesn't build here though. So I can't test it.

The people behind it are GPU & games industry veterans. Which is good.

It doesn't have anything for playing sounds. Which is par for the course but still a bit of a letdown.

I think that the 3D canvas and O3D should use the same shading language, the same viewport origin (D3D uses y-grows-down, OGL y-grows-up) and the same content semantics (loading HTML elements as textures, buffer types.)

Longer ramble

If you want to download the O3D repo, be warned that it has copies of textures as TGAs and PSDs. And 3DS Max files and mp3s... The repo is 1.6 GB in size. And it doesn't build. So I can't test it. All the following is gleaned from the documentation and guessed from the source code.

Anyhow, I like it. It does shaders, imports data from 3D programs and clearly has a clue. The downsides are that it's quite complex and weighs in at around 60 thousand lines of source code (compared to nine thousand for the OpenGL canvas.) But it's a scene graph, so it's no wonder it's a lot bigger.

O3D has a solution to The Shader Problem, namely they use Nvidia's Cg for compiling the shaders and have an ANTLR parser to validate the shaders. The shading language is HLSL SM 2.0 / Cg rather than GLSL, but it probably works the same across hardware / OS / driver combos? I hope so at least.

O3D is a scene graph renderer. Their scene graph consists of transform trees, which contain transform nodes that contain drawable objects, and render trees, which draw transform trees.

Transform nodes are basically utility-wrapped transformation matrices.

Drawables are Shapes, which consist of primitives, each of which has a material to determine the render pass and a bunch of coordinate arrays that interact with the shaders somehow. It's too complicated for me to understand at this time of the day so I'll just paste some "Hello, Cube" example code:


  var viewInfo = o3djs.rendergraph.createBasicView(
      g_pack,
      g_client.root,
      g_client.renderGraphRoot);

  viewInfo.drawContext.projection = g_math.matrix4.perspective(
      g_math.degToRad(30), // 30 degree fov.
      g_client.width / g_client.height,
      1,                  // Near plane.
      5000);              // Far plane.
  viewInfo.drawContext.view = g_math.matrix4.lookAt([0, 1, 5],  // eye
                                            [0, 0, 0],  // target
                                            [0, 1, 0]); // up

  var redEffect = g_pack.createObject('Effect');
  var shaderString = document.getElementById('effect').value;
  redEffect.loadFromFXString(shaderString);

  var redMaterial = g_pack.createObject('Material');
  redMaterial.drawList = viewInfo.performanceDrawList;
  redMaterial.effect = redEffect;

  var cubeShape = g_pack.createObject('Shape');
  var cubePrimitive = g_pack.createObject('Primitive');
  var streamBank = g_pack.createObject('StreamBank');

  cubePrimitive.material = redMaterial;
  cubePrimitive.owner = cubeShape;
  cubePrimitive.streamBank = streamBank;

  cubePrimitive.primitiveType = g_o3d.Primitive.TRIANGLELIST;
  cubePrimitive.numberPrimitives = 12; // 12 triangles
  cubePrimitive.numberVertices = 8;    // 8 vertices in total

  var positionsBuffer = g_pack.createObject('VertexBuffer');
  var positionsField = positionsBuffer.createField('FloatField', 3);
  positionsBuffer.set(g_positionArray); // vertex buffer with cube coords

  var indexBuffer = g_pack.createObject('IndexBuffer');
  indexBuffer.set(g_indicesArray); // indices to vertex buffer
  streamBank.setVertexStream(
      g_o3d.Stream.POSITION, // semantic: This stream stores vertex positions
      0,                     // semantic index: First (and only) position stream
      positionsField,        // field: the field this stream uses.
      0);                    // start_index: How many elements to skip in the
                             //     field.
  cubePrimitive.indexBuffer = indexBuffer;

  g_cubeTransform = g_pack.createObject('Transform');
  g_cubeTransform.addShape(cubeShape);

  g_cubeTransform.parent = g_client.root;

  cubeShape.createDrawElements(g_pack, null);

2009-04-17

Another 3D canvas demo

Adapted from vlad's port of FRequency's 1k demo. I made it a bit lighter and fiddled with the lights and colors.

That demo is pretty funky. My understanding is that it uses the fragment shader to raytrace a sine wave function with two lights. For each pixel, search for the nearest depth where the function is below zero. Then find the tangent of the function at that point and dot it with the light direction to do diffuse shading. Finally add fog color depending on the depth of the point.

Why it uses the tangent instead of the normal is still fuzzy to me.
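Here's a CPU-side sketch of the same idea for a single ray, in JavaScript rather than GLSL. It is not the demo's code: the field function, the fixed step size, the single light, and the use of the normalized gradient as the shading normal (the textbook choice, not the demo's tangent) are all my assumptions.

```javascript
// Hypothetical field: height above a wavy ground plane.
// Negative means the point is below the surface.
function field(p) {
  return p[1] - 0.5 * Math.sin(p[0]) * Math.sin(p[2]);
}

function normalize(v) {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
}

function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Central-difference estimate of the field gradient, normalized.
function gradient(p, eps) {
  return normalize([
    field([p[0] + eps, p[1], p[2]]) - field([p[0] - eps, p[1], p[2]]),
    field([p[0], p[1] + eps, p[2]]) - field([p[0], p[1] - eps, p[2]]),
    field([p[0], p[1], p[2] + eps]) - field([p[0], p[1], p[2] - eps]),
  ]);
}

// Step along the ray and return the first depth where the field is
// below zero, or null if nothing is hit before maxDepth.
function marchRay(origin, dir, maxDepth, step) {
  for (let i = 1; i * step <= maxDepth; i++) {
    const t = i * step;
    const p = [origin[0] + dir[0] * t, origin[1] + dir[1] * t, origin[2] + dir[2] * t];
    if (field(p) < 0) return { depth: t, point: p };
  }
  return null;
}

// Diffuse term from the gradient and light direction, fog from depth.
function shadeRay(origin, dir, lightDir, maxDepth, step) {
  const hit = marchRay(origin, dir, maxDepth, step);
  if (!hit) return { depth: maxDepth, diffuse: 0, fog: 1 };
  const n = gradient(hit.point, 1e-3);
  return {
    depth: hit.depth,
    diffuse: Math.max(0, dot(n, lightDir)),
    fog: hit.depth / maxDepth,
  };
}

// One ray from 2 units up, looking down and forward, lit from above.
console.log(shadeRay([0, 2, 0], normalize([0, -1, 1]), [0, 1, 0], 10, 0.05));
```

The final pixel color would then be something like mix(baseColor * diffuse, fogColor, fog), done per pixel in the fragment shader.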

2009-04-11

GL canvas feedback

Some of the feedback I've heard on using a thin OpenGL ES 2.0 wrapper as the 3D canvas API [and my thoughts parenthesized]:

It would be better to design a new API and translate it to OpenGL / Direct3D / low-level driver calls in the browser backend. [Is it easier to write a new API and translate it to Direct3D and OpenGL than translating OpenGL to Direct3D? Plus, Windows has OpenGL support. Nothing but Windows has Direct3D support. How much work is this going to involve? What will the API be based on? Current hardware? Future hardware? CPU fallback? Shaders? Will it be a pure drawing API, a general purpose data-parallel computing API or something in between? How much complexity does it expose to the web devs? Will it be immediate-mode or a scene graph? How easy will it be to get a teapot on the screen? How easy will it be to get a per-pixel shaded teapot with a normal map, environment map, gloss map, DOF blur, soft shadows and a bloom shader on the screen? Is there any documentation or are there tutorials for it?]

Java3D would be a better approach than OpenGL. [Might be, want to write a spec and an implementation based on it? What shading language will you use? Remember to write a compiler from it to GLSL and HLSL.]

It's not going to run on computers older than 2 years. [It runs on a 6-year-old computer. Both on Linux and Windows. On an ATI card. I've personally tested it on Linux 32-bit, 64-bit, Windows XP, with Nvidia, ATI and software rendering.]

Having explicit delete methods for resources (textures, FBOs and VBOs) is a bad idea in a GC'd language. [Agreed, they should be freed on GC.]

glGetError is bad, errors should be thrown as JavaScript exceptions. [Agreed. The glGetError overhead is <0.2us per call, it's not going to blow your budget.]

A GLSL shader that works on one driver may fail on another. Even with the same hardware. [Agreed. We need to specify a single compatible subset of the language. So, I guess it's still going to require writing a GLSL parser? Unless there's some #version that's compatible across drivers.]

You can hang the GPU if you have shaders and they have infinite loops in them. [My tests disagree on the infinite loops part, but it is a problem with heavy shaders (and hypothetical drivers lacking runtime sanity checks.) Getting rid of branches in shaders would probably fix the problem, but is it worth it? And I imagine you could hang the GPU by compositing a few thousand large 2D canvases as well (at least you can make the browser hang and swap a lot.)]

X3D would be better than an immediate-mode API. [X3D isn't an immediate-mode API though, it's a document format. Like SVG. Eight years after SVG became a W3C recommendation, still no browser supports the whole spec, and all have different bits that they don't support (filters, animations and fonts being the big culprits.) 2D canvas took around a year to get around (OS X Dashboard 2005, Firefox 2 2006, Opera 9 2006.) Of course X3D might be easier to implement, but I wouldn't count on it.]

Khronos membership and conformance costs are high enough to make it impossible for small operators to have their say on the working group. [Agreed, though none of the browser engine developers are exactly small. Mozilla is probably the smallest at $66M revenue, Opera is $74M, Nokia $66G, Apple $32G, Google $22G and Microsoft $60G. And Nokia, Google and Apple already are members.]

Conclusions? APIs are faster and easier to implement than document formats. Shaders are a pain, but also the best feature in hardware-accelerated drawing. Kicking up a whole new cross-platform accelerated drawing API from dust is going to be hard.
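On the glGetError point above, here's a sketch of what turning GL errors into JavaScript exceptions could look like as a plain wrapper. The gl object here is a mock with only enable and getError, and wrapWithExceptions is a made-up name; a real binding would do this check natively. The constants 0x0500 (GL_INVALID_ENUM) and 0x0B71 (GL_DEPTH_TEST) are the standard GL values.

```javascript
// Wrap every function on a GL-like object so that a pending GL error
// after the call is thrown as a JavaScript exception.
function wrapWithExceptions(gl) {
  const wrapped = {};
  for (const name in gl) {
    const value = gl[name];
    if (typeof value !== 'function') {
      wrapped[name] = value;            // constants are copied as-is
    } else if (name === 'getError') {
      wrapped[name] = value.bind(gl);   // leave getError itself unchecked
    } else {
      wrapped[name] = function (...args) {
        const result = value.apply(gl, args);
        const err = gl.getError();
        if (err !== 0) {                // 0 is GL_NO_ERROR
          throw new Error(name + ' failed with GL error 0x' + err.toString(16));
        }
        return result;
      };
    }
  }
  return wrapped;
}

// Mock context for illustration: only knows one valid capability.
function makeMockGl() {
  let pendingError = 0;
  return {
    DEPTH_TEST: 0x0B71,
    getError() { const e = pendingError; pendingError = 0; return e; },
    enable(cap) { if (cap !== 0x0B71) pendingError = 0x0500; },
  };
}

const gl = wrapWithExceptions(makeMockGl());
gl.enable(gl.DEPTH_TEST);               // fine
try {
  gl.enable(0x1234);                    // not a capability the mock knows
} catch (e) {
  console.log(e.message);
}
```

The cost is one extra getError per call, which at under 0.2us each stays small next to the JS -> C++ call overhead itself.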

2009-04-10

Visualization of a GC pause

Frame time histogram of 150 4x4 matrix multiplications per frame at 60fps on Firefox trunk 2009-04-10.

Time advances from left to right, each pixel being 16 ms. Each black bar is a frame. The height of a black bar is the time that frame took, 1 ms / pixel.

The tall bars are what's making smooth animations difficult.
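A small sketch of how such a histogram could be produced from recorded frame times, as text bars instead of pixels. The function names are hypothetical and the sample numbers below are made up for illustration, not measurements.

```javascript
// One text bar per frame, one '#' per millisecond (rounded).
function frameBars(frameTimesMs) {
  return frameTimesMs.map((ms) => '#'.repeat(Math.round(ms)));
}

// Indices of frames that blew the per-frame budget (the "tall bars").
function slowFrames(frameTimesMs, budgetMs) {
  const slow = [];
  frameTimesMs.forEach((ms, i) => {
    if (ms > budgetMs) slow.push(i);
  });
  return slow;
}

// Made-up sample: mostly quick frames with one long pause in the middle.
const sampleTimes = [3, 3, 4, 3, 45, 3];
console.log(frameBars(sampleTimes).join('\n'));
console.log('slow frames:', slowFrames(sampleTimes, 1000 / 60));
```

In a real page the times would come from timestamps taken at the start of each animation callback.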

2009-04-09

Code generation and dastardly plans

Back from a week and a half of F1 and video games. Managed to port the Canvas 3D tests to the new API (unearthing a bunch of old bugs and a new one.)

I also have a couple Python scripts that generate C++ code from the GLES 2.0 headers + API modifications file + valid arguments grammar. My plan is to make it spew out better smoke tests for the API. And if I can make some Webkit-based browser build on 64-bit Linux (fat chance), adapt the generator to do the easy parts of a proof-of-concept Webkit port.

Extending the code generator to handle state tracking, array arguments and array return values would make it handle most of the API, except for the couple polymorphic functions. The rest of the non-autogened code would be WGL/CGL/GLX/OSMesa-glue, state tracking and validation, totaling some 5000 lines.

My other plans for the near future include: catching up on four weeks of math homework, writing some demos for Canvas 3D (CFD, image filters, editable video filters), writing this year's two-week game (maybe something Disgaea-like? Combos and colors), learning to use Blender, writing a small painting app with Qt.

Seven half-time projects come together to make one 3.5-time project. Math and testgen over the weekend, demos and Blender next week, two-week game on the last two weeks of the month, painting app 2 hours daily alloc. Let's see which project drops first :P

2009-04-01

And now glReadPixels works on ATI pbuffer

glReadPixels from a pbuffer didn't work on a Radeon 9600 with the Linux fglrx drivers. And it was a very strange bug. Apparently pbuffer contexts don't work on ATI fglrx + R300 if you don't first create an X window context and make it current. Here's the code for the fix in nsGLPbufferGLX::Init:


// workaround for Radeon 9600 pbuffers contexts not working without a 
// previous window context
XVisualInfo *visinfo;
int vattrib[] = { GLX_RGBA, None };
visinfo = gGLXWrap.fChooseVisual( sharedDisplay, DefaultScreen(sharedDisplay), vattrib );
workaroundCtx = gGLXWrap.fCreateContext( sharedDisplay, visinfo, NULL, GL_TRUE );
gGLXWrap.fMakeContextCurrent( 
                        sharedDisplay, 
                        DefaultRootWindow(sharedDisplay), 
                        DefaultRootWindow(sharedDisplay), 
                        workaroundCtx );
XFree(visinfo);

Now my Canvas 3D demos work on ATI + Linux, yay!
