art with code

2010-10-16

Little benchmark

Got my computer put together yesterday, and now's the time to benchmark it! On a Saturday afternoon...

Image correlation algorithm benchmark:

OpenCL:
  240 GBps -- Athlon II X4 640, 3 GHz (12 GHz aggregate), 2 MB L2
  85 GBps -- Core 2 Duo E6400, 2.1 GHz (4.3 GHz aggregate), 2 MB L2
OpenMP+SSE, optimized:
  103 GBps -- Athlon II X4
  45 GBps -- Core 2 Duo
OpenMP+SSE, naive:
  13 GBps -- Athlon II X4
  5 GBps -- Core 2 Duo

In OpenCL, performance scales pretty much linearly with aggregate clock frequency: the Athlon is 2.8x faster (240/85) and has 2.8x the aggregate clock (12/4.3). Both CPUs have a 3-cycle L1 latency and the algorithm is very much an L1 cache benchmark, so this isn't too surprising. The optimized SSE version only gets a 2.3x speedup (103/45), so it has some bandwidth or load-balancing bottleneck going on. The naive version is pretty much a pure memory bandwidth benchmark.
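To check the "linear scaling" claim, compare each variant's Athlon/Core 2 throughput ratio with the ratio of aggregate clocks. The sketch below just redoes that arithmetic on the numbers listed above. The names (`RESULTS`, `scaling_report`) are made up for illustration, and treating cores x clock as the "aggregate" clock is the same simplification the list uses.

```python
# Throughput figures copied from the benchmark list, in GBps:
# variant -> (Athlon II X4 640, Core 2 Duo E6400)
RESULTS = {
    "OpenCL": (240.0, 85.0),
    "OpenMP+SSE optimized": (103.0, 45.0),
    "OpenMP+SSE naive": (13.0, 5.0),
}

# Aggregate clocks as given in the list (cores x per-core GHz).
ATHLON_AGG_GHZ = 12.0
CORE2_AGG_GHZ = 4.3


def scaling_report():
    """Return {variant: (speedup, speedup / clock_ratio)} for Athlon vs Core 2.

    A value of 1.0 in the second slot means the variant scales exactly
    with aggregate clock; lower values mean something else is limiting it.
    """
    clock_ratio = ATHLON_AGG_GHZ / CORE2_AGG_GHZ
    report = {}
    for variant, (athlon, core2) in RESULTS.items():
        speedup = athlon / core2
        report[variant] = (speedup, speedup / clock_ratio)
    return report


if __name__ == "__main__":
    print(f"aggregate clock ratio: {ATHLON_AGG_GHZ / CORE2_AGG_GHZ:.2f}")
    for variant, (speedup, fraction) in scaling_report().items():
        print(f"{variant:22s} speedup {speedup:.2f}x, {fraction:.0%} of clock ratio")
```

With these numbers the clock ratio comes out at about 2.79, OpenCL reaches roughly 101% of it, the naive SSE version about 93%, and the optimized SSE version only about 82%.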
