art with code

2008-09-06

Constant-space parallel combinators in OCaml

Updated prelude.ml with a couple new parallel iterators: pfoldl, pfoldlSeqN and piterSeqN. The pfoldl function is a two-phase foldl, it splits the list into process_count pieces, foldls each and foldls the results. To convey that, it takes two functions : the normal fold function and the reduce function.

The *SeqN functions are partially sequential, they synchronize after every N elements. To use the Mandelbrot example, if you're rendering a 64000 x 64000 image, it would take more than 4 gigs of RAM to do it with pbinit. For a sequential renderer that is no problem, since you can output each byte after it's computed. A parallel renderer can't really use that strategy, since the pixels are computed in parallel and outputting them as they come would lead to an unsorted image.


Sequential output:
# iter (fun i -> sleep (3-i); puts (showInt i)) (0--3);;
0
1
2
3
- : unit = ()

Parallel output:
# piter ~process_count:4 (fun i -> sleep (3-i); puts (showInt i)) (0--3);;
3
2
1
0
- : unit = ()


What we can do is turn the problem into a sequential iteration of internally parallel blocks. E.g. render a 1MB block of pixels in parallel, print it out before doing the next block. This way memory use is limited to block size and we still get a decent parallelization speedup.


Partially sequential iteration:
# iter (fun l -> iter puts @@ pmap (fun i -> sleep (3-i); showInt i) l) (groupsOf 2 (0--3));;
0
1
2
3

Or using piterSeqN:
# piterSeqN 2 puts (fun i -> sleep (3-i); showInt i) (0--3);;
0
1
2
3


Here's a larger example, using the already familiar Mandelbrot renderer. Starting with a sequential version:

open Prelude

let niter = 64
let limit = 4.0

let mandelbrot z x y =
let cr = 2.0 *. float x /. z -. 1.5 in
let ci = 2.0 *. float y /. z -. 1.0 in
let rec aux i zr zi = if i >= niter then 0 else
let nzr = zr *. zr -. zi *. zi +. cr
and nzi = 2.0 *. zr *. zi +. ci in
if nzr *. nzr +. nzi *. nzi > limit
then int (ceil (255. *. (float (niter-i) /. float niter)))
else aux (i+1) nzr nzi in
aux 0 0.0 0.0


(*vvv Interesting part starts here vvv*)

let draw_fractal z w h =
for y = 0 to h-1 do
for x = 0 to w-1 do
output_byte stdout (mandelbrot z x y)
done
done

let print_fractal w h =
puts (sprintf "P5 %i %i 255" w h);
draw_fractal (float (min w h)) w h

let () = print_fractal (parseInt Sys.argv.(1)) (parseInt Sys.argv.(1))


To make it parallel, we make draw_fractal create all scanlines and print them out:


let draw_scanline z w y =
binit (fun x -> mandelbrot z x y) w

let draw_fractal z w h =
let scanlines = pmap (fun y -> draw_scanline z w y) (0--(h-1)) in
iter putStr scanlines


And to make it constant-space parallel (well, O(width*256)), let's process in blocks of 256 scanlines:


let draw_fractal z w h =
piterSeqN 256 putStr (fun y -> draw_scanline z w y) (0--(h-1))


Done! Let's put it all together one more time:


open Prelude

let niter = 64
let limit = 4.0

let mandelbrot z x y =
let cr = 2.0 *. float x /. z -. 1.5 in
let ci = 2.0 *. float y /. z -. 1.0 in
let rec aux i zr zi = if i >= niter then 0 else
let nzr = zr *. zr -. zi *. zi +. cr
and nzi = 2.0 *. zr *. zi +. ci in
if nzr *. nzr +. nzi *. nzi > limit
then int (ceil (255. *. (float (niter-i) /. float niter)))
else aux (i+1) nzr nzi in
aux 0 0.0 0.0

let draw_scanline z w y =
binit (fun x -> mandelbrot z x y) w

let draw_fractal z w h =
piterSeqN 256 putStr (fun y -> draw_scanline z w y) (0--(h-1))

let print_fractal w h =
puts (sprintf "P5 %i %i 255" w h);
draw_fractal (float (min w h)) w h

let () = print_fractal (parseInt Sys.argv.(1)) (parseInt Sys.argv.(1))


We replaced five lines of sequential code with three lines of parallel code and got a version that scales nearly linearly with the number of cores. Not too bad, eh?

No comments:

Blog Archive