art with code

2010-08-28

Folding, part 3: parallel folds

So, you can do a parallel fold if your folding function is associative.

-- Here i should be the neutral element of f, that is f i x = x.
-- E.g. 0 for + and 1 for *
parFold f i [] = i
parFold f i [x] = f i x
parFold f i xs =
l `par` r `pseq` f l r
where l = parFold f i left
r = parFold f i right
left, right = splitAt (length xs `div` 2) xs

-- See how associative functions work right with it
parFold (+) 0 [1..10] == foldl (+) 0 [1..10]
parFold (*) 1 [1..10] == foldl (*) 1 [1..10]

-- But others have problems
parFold (-) 0 [1..10] /= foldl (-) 0 [1..10]
parFold (/) 1 [1..10] /= foldl (/) 0 [1..10]

-- As - and / are defined as A + -B and A * 1/B,
-- we can work around these instances
parFold (+) 0 (map negate [1..10]) == foldl (-) 0 [1..10]
parFold (*) 1 (map recip [1..10]) == foldl (/) 1 [1..10]

But for the general case, we need something stronger. Roll in the reduceFold! It splits the foldable list into sublists, folds each sublist, and reduces the subresults into a single value.

reduceFold reduce f init [] = init
reduceFold reduce f init [x] = f init x
reduceFold reduce f init xs =
l `par` r `pseq` reduce l r
where l = reduceFold reduce f init left
r = reduceFold reduce f init right
(left, right) = splitAt (length xs `div` 2) xs

-- Associative operations work as usual
reduceFold (+) (+) 0 [1..10] == foldl (+) 0 [1..10]
reduceFold (*) (*) 1 [1..10] == foldl (*) 1 [1..10]

-- And we can now combine the subresults in a different fashion
reduceFold (+) (-) 0 [1..10] == foldl (-) 0 [1..10]
reduceFold (*) (/) 1 [1..10] == foldl (/) 1 [1..10]

Let's write some parallel list operations in terms of reduceFold.

parMap f lst = reduceFold (++) (\l i -> l ++ [f i]) [] lst
parFilter f lst = reduceFold (++) (\l i -> if f i then l++[i] else l) [] lst
parConcat lsts = reduceFold (++) (++) [] lsts
parConcatMap f lst = reduceFold (++) (\l i -> l++(f i)) [] lst

optLeft (Just x) _ = Just x
optLeft Nothing (Just x) = Just x
optLeft Nothing Nothing = Nothing

optFilter f i = if f i then Just i else Nothing

parFind f lst = reduceFold optLeft (\l i ->
optLeft l (optFilter f i)) Nothing lst

optRight x y = optLeft y x

parFindLast f lst = reduceFold optRight (\l i ->
optRight l (optFilter f i)) Nothing lst

-- Yeah, they pretty much work
divisibleBy n x = x `mod` n == 0

parMap (+ 10) [1..10] == [11..20]
parFilter (divisibleBy 3) [1..15] == [3,6,9,12,15]
parConcat [[1], [2,3,4], [5,6,7], [8,9], [10..20]] == [1..20]
parConcatMap (\x -> [x,x,x]) [1..3] == [1,1,1,2,2,2,3,3,3]
parFind (divisibleBy 127) [90,93..900] == Just 381
parFind (> 10) [1..10] == Nothing
parFindLast (divisibleBy 127) [90,93..900] == Just 762

You might've noticed that both parFold and reduceFold are creating a parallel evaluation spark for each element in the lists. This is not a very good idea as the sparking overhead is often going to dominate the execution time and kill performance.

So we need some way to control the reduce tree branching factor and clump up the leaf-level folds into longer operations. I don't know if I can manage to come up with an automagic or even semi-automagic way to do this. I would like to use yesterday's theoretical math as guidance for the tree shape optimizer. Maybe we'll find out in the next installment.

No comments:

Blog Archive