art with code

2010-05-21

Parsing tarballs with JavaScript

Update: Check out this augmented version that streams gzipped tarballs.

Here's a small piece of JavaScript to parse tarballs and my custom JSON packfiles. There are four demos as well: loading files from a tar, streaming images from a tar, loading files from a JSON packfile and streaming files from a JSON packfile.

The part that converts images to data URLs is a bit slower than it could be, as it has to strip the high byte off each character. The upcoming JS File and Blob APIs for binary data handling should help there. Though if you have less than a hundred kB of images, I don't think you'll even notice the delay. Even half a meg of stuff unpacks in a fraction of a second on my slow laptop (Pentium M 1.7GHz). If you do need speed, you can convert the images to data URIs beforehand.
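To make the high-byte stripping concrete, here is a minimal sketch of the conversion. It is not the script's actual code. It assumes the binary string came from an XHR whose MIME type was overridden with "text/plain; charset=x-user-defined", and the function name toDataURL is made up for illustration.

```javascript
// Hypothetical helper: turn an x-user-defined "binary string" into a data URL.
// With that charset, bytes 0x80-0xFF arrive as char codes 0xF780-0xF7FF,
// so only the low 8 bits of each char code carry the original byte.
function toDataURL(binary, mimeType) {
  let latin1 = '';
  for (let i = 0; i < binary.length; i++) {
    // Keep only the low byte; btoa rejects char codes above 255.
    latin1 += String.fromCharCode(binary.charCodeAt(i) & 0xff);
  }
  return 'data:' + mimeType + ';base64,' + btoa(latin1);
}
```

The per-character loop is the slow part mentioned above, since every byte passes through String.fromCharCode before btoa sees it.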

Quickly estimating, it'd take something like fifteen seconds to load up a hundred megs of models and textures on my laptop, maybe around 5 s on a decent computer. Doing the initial archive parsing pass would take maybe a second for a hundred meg archive. If that's too slow for you, I want your internet connection. If the hundred meg tarball is split 1:4 geometry:textures, where the geometry takes 20 bytes per tri and the textures are 10x compressed JPEGs, it'd have 1 Mtri geometry and 240 Mpx textures.
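The split can be worked through in a few lines. This is only a back-of-the-envelope sketch: the function name estimateAssets is made up, and the bytes-per-pixel value for uncompressed textures is an assumption that the estimate above does not spell out.

```javascript
// Hypothetical helper reproducing the rough 1:4 geometry:textures estimate.
function estimateAssets(archiveBytes, bytesPerPixel) {
  const geometryBytes = archiveBytes / 5;            // 1 part geometry
  const textureBytes = archiveBytes - geometryBytes; // 4 parts textures
  return {
    triangles: geometryBytes / 20,                   // 20 bytes per triangle
    pixels: (textureBytes * 10) / bytesPerPixel      // JPEGs assumed 10x compressed
  };
}
```

For a 100 MB archive this gives 1 Mtri of geometry. The texture count depends on the assumed pixel size: about 267 Mpx at 3 bytes per pixel and 200 Mpx at 4 bytes per pixel, which is the same ballpark as the figure quoted above.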

The script doesn't handle gzip or any other compression; use gzip-encoding on the server for that. The tar file format is pretty simple: it's based on 512-byte blocks and each file begins with a 512-byte header, followed by the file data padded up to a multiple of 512 bytes. The numbers are represented as octal ASCII (though there is a GNU tar extension that uses binary ints for handling files bigger than 8 GB, which my script doesn't support).
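The block layout described above can be sketched roughly like this. It is an illustration written against a Uint8Array, not the script from this post. The helper names are made up, it reads only the name field (100 bytes at offset 0) and the size field (12 bytes at offset 124) of each header, and it treats a header with an empty name as the all-zero end-of-archive block.

```javascript
// Read a NUL-terminated ASCII field out of a header block.
function readString(bytes, start, length) {
  let s = '';
  for (let i = start; i < start + length && bytes[i] !== 0; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

// Numeric fields are octal ASCII, padded with spaces and/or NULs.
function readOctal(bytes, start, length) {
  if (bytes[start] & 0x80) {
    // High bit set means the GNU binary size extension, not handled here.
    throw new Error('binary (base-256) size fields are not supported');
  }
  const text = readString(bytes, start, length).trim();
  return text === '' ? 0 : parseInt(text, 8);
}

// Walk the archive header by header and collect name, size and data view.
function listTarEntries(bytes) {
  const entries = [];
  let offset = 0;
  while (offset + 512 <= bytes.length) {
    const name = readString(bytes, offset, 100);
    if (name === '') break; // empty name: treat as end-of-archive block
    const size = readOctal(bytes, offset + 124, 12);
    const dataStart = offset + 512;
    entries.push({
      name: name,
      size: size,
      data: bytes.subarray(dataStart, dataStart + size)
    });
    // Skip the header plus the data rounded up to whole 512-byte blocks.
    offset = dataStart + Math.ceil(size / 512) * 512;
  }
  return entries;
}
```

Using subarray keeps each entry as a view into the original buffer, so listing a large archive does not copy the file contents.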

My JSON packfile format consists of a one-line JSON header array of {filename : string, offset : bytes, length : bytes} followed by a newline and the concatenated file contents. Easy to create and parse.
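As a rough sketch of that layout, here is one way to build and parse such a packfile with typed arrays. The function names are made up, and one detail is an assumption on my part: offsets are counted from the first byte after the header's newline.

```javascript
// Hypothetical builder: files is an array of {filename, data: Uint8Array}.
function buildPackfile(files) {
  let offset = 0;
  const header = files.map(function (f) {
    const entry = { filename: f.filename, offset: offset, length: f.data.length };
    offset += f.data.length;
    return entry;
  });
  // JSON.stringify never emits a raw newline, so the header stays on one line.
  const headerBytes = new TextEncoder().encode(JSON.stringify(header) + '\n');
  const out = new Uint8Array(headerBytes.length + offset);
  out.set(headerBytes, 0);
  let at = headerBytes.length;
  for (const f of files) {
    out.set(f.data, at);
    at += f.data.length;
  }
  return out;
}

// Hypothetical parser: returns [{filename, data: Uint8Array}].
function parsePackfile(bytes) {
  const newline = bytes.indexOf(10); // first "\n" ends the header line
  if (newline === -1) throw new Error('no header line found');
  const header = JSON.parse(new TextDecoder().decode(bytes.subarray(0, newline)));
  const base = newline + 1; // assumption: offsets are relative to the data start
  return header.map(function (entry) {
    const start = base + entry.offset;
    return { filename: entry.filename, data: bytes.subarray(start, start + entry.length) };
  });
}
```

Because the header comes first, a reader knows every file's position as soon as the first line has arrived, which is what makes the format easy to stream.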

Edit: Added streaming using xhr.readyState == 3 checks. It might cause some stuttering on the page when dataURLing the images, though it should be quite efficient otherwise. Optimizations welcome :)
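The readyState == 3 approach boils down to re-reading responseText each time more data arrives and parsing only the part not seen before. Here is a minimal sketch of that pattern, not the actual code: streamResponse, blockConsumer and the consume callback are hypothetical names, and the example consumer just hands out complete 512-character blocks.

```javascript
// consume(text, from, done) is a hypothetical callback: it parses whatever
// complete records it finds in `text` starting at index `from` and returns
// the index where parsing should resume on the next call.
function streamResponse(xhr, consume) {
  let parsedUpTo = 0;
  xhr.onreadystatechange = function () {
    // readyState 3 (LOADING) can fire repeatedly while data arrives;
    // readyState 4 (DONE) fires once at the end.
    if (xhr.readyState === 3 || xhr.readyState === 4) {
      parsedUpTo = consume(xhr.responseText, parsedUpTo, xhr.readyState === 4);
    }
  };
}

// Example consumer: pass each complete 512-character block to onBlock once.
function blockConsumer(onBlock) {
  return function (text, from) {
    let at = from;
    while (at + 512 <= text.length) {
      onBlock(text.slice(at, at + 512));
      at += 512;
    }
    return at; // a partial trailing block waits for the next event
  };
}

// Browser usage sketch ('archive.tar' is a placeholder URL):
//   const xhr = new XMLHttpRequest();
//   xhr.open('GET', 'archive.tar');
//   xhr.overrideMimeType('text/plain; charset=x-user-defined');
//   streamResponse(xhr, blockConsumer(function (block) { /* parse block */ }));
//   xhr.send();
```

Remembering the resume index means each event only does work for the new data, so the cost stays proportional to the download size.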
