art with code

2008-08-31

prelude.ml: some parallel combinators

Thanks to mfp pasting me Jon Harrop's(?) invoke function, prelude.ml now has pmap.
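
For the curious, the invoke part can be written with Unix.fork and Marshal along these lines (a sketch of the general technique, not necessarily prelude.ml's exact code; the real pmap also takes a ~process_count argument and splits the work into that many chunks instead of forking once per list element):

(* Run f x in a forked child, return a thunk that reads the result back
   from a pipe. Exceptions are marshalled and re-raised in the parent. *)
let invoke (f : 'a -> 'b) x : unit -> 'b =
  let input, output = Unix.pipe () in
  match Unix.fork () with
  | -1 -> let v = f x in (fun () -> v)  (* defensive: compute serially *)
  | 0 ->                                (* child: compute, write, exit *)
    Unix.close input;
    let oc = Unix.out_channel_of_descr output in
    Marshal.to_channel oc (try `Res (f x) with e -> `Exn e) [];
    close_out oc;
    exit 0
  | pid ->                              (* parent: return a join thunk *)
    Unix.close output;
    let ic = Unix.in_channel_of_descr input in
    fun () ->
      let v = Marshal.from_channel ic in
      ignore (Unix.waitpid [] pid);
      close_in ic;
      match v with `Res r -> r | `Exn e -> raise e

(* A naive pmap on top of invoke: fork one child per element, then join. *)
let naive_pmap f lst = List.map (fun join -> join ()) (List.map (invoke f) lst)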

How about drawing some fractals with it?

mandelbrot.ml:

open Prelude

let niter = 255
let limit = 4.0

let iter_z cr ci =
  let rec aux i zr zi =
    if i >= niter then 0 else
    let nzr = zr *. zr -. zi *. zi +. cr
    and nzi = 2.0 *. zr *. zi +. ci in
    if nzr *. nzr +. nzi *. nzi > limit then 255 - i
    else aux (i+1) nzr nzi in
  aux 0 0.0 0.0

let draw_row w h y =
  sinit (fun x ->
    let cr = 2.0 *. float x /. float w -. 1.5 in
    let ci = 2.0 *. float y /. float h -. 1.0 in
    chr (iter_z cr ci)
  ) w

let square_fractal process_count d =
  let header = sprintf "P5 %i %i 255\n" d d in
  header ^ (join "" @@ pmap ~process_count (draw_row d d) (1--d))

let () =
  let img = square_fractal (parseInt Sys.argv.(1)) (parseInt Sys.argv.(2)) in
  writeFile Sys.argv.(3) img


Compile with ocamlfind opt -package unix,pcre -linkpkg -o mandelbrot prelude.ml mandelbrot.ml

And marvel at the speedup:

$ time ./mandelbrot 1 1000 test.pgm

real 0m1.617s
user 0m1.596s
sys 0m0.024s

$ time ./mandelbrot 2 1000 test2.pgm

real 0m0.843s
user 0m1.536s
sys 0m0.012s

$ diff test.pgm test2.pgm && echo No difference
No difference


[edit]
Stepping up the abstraction ladder: par_sinit does parallel string init (string init = allocate string, fill it by mapping over its indices), so we can write square_fractal in terms of string creation. If you're familiar with GLSL fragment shaders, this should ring some bells.
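
For reference, a serial sinit is roughly the following (using the mutable strings of the day); the parallel version would split the index range across processes and concatenate the resulting pieces:

let sinit f n =
  let s = String.create n in
  for i = 0 to n - 1 do s.[i] <- f i done;
  s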

let draw_fractal ?process_count w h =
  par_sinit ?process_count (fun i ->
    let y, x = quot_rem i w in
    let cr = 2.0 *. float x /. float w -. 1.5 in
    let ci = 2.0 *. float y /. float h -. 1.0 in
    chr (iter_z cr ci)
  ) (w*h)

let square_fractal process_count d =
  let header = sprintf "P5 %i %i 255\n" d d in
  header ^ (draw_fractal ~process_count d d)


The GLSL version of draw_fractal's kernel would look something like this (dropped niter to 50 to make it run at 50fps instead of 10fps):

#version 120

uniform float w;
uniform float h;

vec4 iter_z(float cr, float ci) {
  float nzr, nzi, zr = 0.0, zi = 0.0;
  vec4 color = vec4(0.0);
  for (int i=0; i<50; i++) {
    nzr = zr * zr - zi * zi + cr;
    nzi = 2.0 * zr * zi + ci;
    if (nzr * nzr + nzi * nzi > 4.0) {
      color = vec4(1.0 - (i / 50.0));
      break;
    }
    zr = nzr;
    zi = nzi;
  }
  color.a = 1.0;
  return color;
}

void main(void) {
  float x = gl_TexCoord[0].s;
  float y = gl_TexCoord[0].t;
  float cr = 2.0 * x / w - 1.5;
  float ci = 2.0 * y / h - 1.0;
  gl_FragColor = iter_z(cr, ci);
}

2008-08-30

Compiling upwards

This is a bit dense, sorry about that. If something needs clarification, please tell.

A compiler translates a programming language into machine language; a programmer translates ideas into a programming language. Just as a compiler may first compile into an intermediate language, so may a programmer: a programmer takes an idea, finds the mathematics of it, derives algorithms from the mathematics, and translates the algorithms into statements in the programming language.

For want of a more concrete explanation, consider the following:
"I want to know which of my pages are popular" (The Need)
-> "Show the top ten visited pages" (UI Design - how to fulfill the need)
-> "Page is URL, visited is count of page instances in log, top ten visited is the ten first pages when they are sorted by visited, show is print." (Mathematics - theoretical basis)
-> "Scan log line by line, insert/increment each page into a min-heap, pop ten when done, print each." (Algorithm - theoretical implementation)
-> "recurseLines (fun h l -> Heap.incr h (getURL l)) (Heap.empty min) log |> Heap.take 10 |> iter (puts . showInt)" (Programming - concrete implementation)

The closer a programming language is to the algorithmic language of a programmer, the easier it is to write programs in it. The closer the programming language is to the machine language, the easier it is to compile. Note that these are independent of each other: it is possible to have a programming language that is close to both the algorithmic language and the machine language - consider a programmer whose algorithmic language is the machine language: now the two are the same. Usually this is not the case, however, and a language divides its attention between the two goals.

For measuring the productivity of a programming language, a[n overly] simple quantitative measure is to count the number of tokens and characters required to write a program. By optimizing the character count of each token (by e.g. Huffman coding), the amount of mechanical typing can be minimized. But this is a form of micro-optimization, and should not be the only programming optimization: there are larger gains to be had in optimizing the structure of programming (and thus the token count).

Token count optimization works by abstracting out common patterns in the program; it is like looking for madlibs templates in the source code: recursion, iteration, splits, joins, scans. A hypothetical algorithm for doing token count optimization would find common partial trees in the AST and replace them with higher-order functions.
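
As a hand-made illustration of the kind of rewrite such an algorithm would do (plain OCaml, function names made up for the example):

(* The same "initialize, loop over indices, combine" pattern written out twice. *)
let sum_array a =
  let acc = ref 0 in
  for i = 0 to Array.length a - 1 do acc := !acc + a.(i) done;
  !acc

let max_array a =
  let acc = ref a.(0) in
  for i = 0 to Array.length a - 1 do acc := max !acc a.(i) done;
  !acc

(* The shared partial tree factored into a higher-order function
   (the standard library's Array.fold_left is this same abstraction): *)
let array_fold f init a =
  let acc = ref init in
  for i = 0 to Array.length a - 1 do acc := f !acc a.(i) done;
  !acc

(* The call sites shrink to the parts that differ. *)
let sum_array a = array_fold (+) 0 a
let max_array a = array_fold max a.(0) a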

A keen-minded reader may have noticed the similarities between optimizing towards the algorithmic language and optimizing towards the machine language: both aim to minimize the work required. Optimizing towards the machine language seeks to execute the program in the least amount of time, optimizing towards the algorithmic language seeks to write the program in the least amount of time.

OCaml function usage stats

I.e. how many times a function name was written in all the .ml files on my system.
What you can use it for is optimizing the interface: make much-used names short, try to replace often-used code patterns with combinators.

E.g. the high scores for Array.unsafe_get and Array.length might mean that the idiom "for i = 0 to Array.length a - 1 do something (Array.unsafe_get a i); done" is common, while the relatively low scores for Array.map and Array.unsafe_set would suggest that the above iteration isn't really used as a map. So one could micro-optimize by making Array.unsafe_get shorter to write, or try to solve the root problem and find a better abstraction for iterating over arrays.
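
Such an abstraction could be as simple as an iteration combinator that keeps the index arithmetic and the unsafe access in one audited place (a hypothetical example, not from any existing library):

(* Bounds-free iteration over an array; safe because i never leaves
   0 .. Array.length a - 1. *)
let uiter f a =
  for i = 0 to Array.length a - 1 do
    f (Array.unsafe_get a i)
  done

(* The common idiom then becomes: uiter something a *)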


.() 2411
.[] 878
.() <- 599
.[] <- 232

Array.length 911
Array.unsafe_get 235
Array.create 213
Array.init 204
Array.make 161
Array.of_list 137
Array.map 136
Array.to_list 117
Array.unsafe_set 82
Array.sub 73

String.length 1240
String.sub 490
String.create 294
String.concat 287
String.make 168
String.blit 135
String.capitalize 52
String.unsafe_set 51
String.lowercase 51
String.unsafe_get 47

Filename.concat 171
Filename.check_suffix 89
Filename.basename 78
Filename.quote 56
Filename.dirname 54
Filename.chop_suffix 26
Filename.temp_file 22
Filename.current_dir_name 16
Filename.chop_extension 15
Filename.is_relative 14

List.iter 1538
List.map 1253
List.fold_left 436
List.rev 420
List.length 371
List.mem 248
List.fold_right 227
List.assoc 152
List.exists 131
List.filter 121

Str.regexp 92
Str.split 27
Str.global_replace 24
Str.string_match 19
Str.search_forward 12
Str.global_substitute 9
Str.matched_string 8
Str.matched_group 7
Str.regexp_string 6
Str.match_end 5

Sys.argv 170
Sys.time 75
Sys.file_exists 68
Sys.word_size 64
Sys.remove 51
Sys.getenv 42
Sys.max_string_length 41
Sys.os_type 30
Sys.max_array_length 27
Sys.set_signal 22

Unix.close 103
Unix.gettimeofday 97
Unix.time 69
Unix.openfile 64
Unix.file_descr 47
Unix.stat 31
Unix.write 21
Unix.read 20
Unix.getcwd 18
Unix.closedir 18


How-to:

def mk_hash; Hash.new{|h,k|h[k]=0}; end
hashes = %W(Sys Filename List Array String Str Unix).map{|n| [n, mk_hash] }

index_access = mk_hash

ml_files = `locate '*.ml'`.split("\n").grep(/ml$/)
ml_files.each{|f|
  d = File.read(f)
  hashes.each{|name, hash|
    d.scan(/\b#{name}[\.:]+[a-z0-9_]+/).each{|k|
      hash[k.split(/[\.:]+/).join(".")] += 1
    }
  }
  s1 = d.scan(/\.\([^\)]+\)\s*<-/).size
  s2 = d.scan(/\.\[[^\]]+\]\s*<-/).size
  index_access[".()"] += d.scan(/\.\([^\)]+\)/).size - s1
  index_access[".[]"] += d.scan(/\.\[[^\]]+\]/).size - s2
  index_access[".() <-"] += s1
  index_access[".[] <-"] += s2
}
(hashes + [["index_access", index_access]]).each do |n, h|
  File.open(n+".txt", "w"){|f|
    f.puts h.to_a.sort_by{|k,v| -v}.map{|k,v| k.ljust(25) + " " + v.to_s }
  }
end

2008-08-27

Most-used file methods in Ruby (plus some stats on arrays, strings and hashes)

I.e. how many times a method was written in all the Ruby files on my system.

Results:

File.join 4038
File.dirname 2850
File.basename 2399
File.open 2385
File.exist 2020
File.expand_path 1899
File.read 1039
File.dir 1024
File.file 786
File.unlink 574

FileUtils.mkdir_p 368
FileUtils.rm_rf 185
FileUtils.mv 85
FileUtils.rm_r 77
FileUtils.touch 57
FileUtils.cp 56
FileUtils.ln_s 49
FileUtils.cp_r 47
FileUtils.mkdir 44
FileUtils.verbose_flag 40

f.read 1231
f.puts 707
f.write 689
f.flock 302
f.seek 293
f.name 264
f.pos 162
f.gets 134
f.stat 117
f.close 102


How about s[trings], a[rrays] and h[ashtables]?

s.to_s 349
s.scan 279
s.split 222
s.empty 170
s.rb 152
s.last 124
s.gsub 120
s.nil 120
s.size 118
s.to_f 91

a.size 186
a.to_s 175
a.include 151
a.slice 130
a.each 125
a.inspect 108
a.token_stream 97
a.class 92
a.b 89
a.kind_of 85

h.delete 382
h.clone 127
h.keys 118
h.each 109
h.to_f 99
h.merge 93
h.empty 47
h.values 45
h.to_json 42
h.call 41


How-to:

def mk_hash; Hash.new{|h,k|h[k]=0}; end
hashes = %W(File FileUtils f s a h).map{|n| [n, mk_hash] }

ruby_files = `locate '*.rb'`.split("\n")
ruby_files.each{|f|
  d = File.read(f)
  hashes.each{|name, hash|
    d.scan(/\b#{name}[\.:]+[a-z0-9_]+/).each{|k|
      hash[k.split(/[\.:]+/).join(".")] += 1
    }
  }
}
hashes.each do |n, h|
  File.open(n+".txt", "w"){|f|
    f.puts h.to_a.sort_by{|k,v| -v}.map{|k,v| k.ljust(25) + " " + v.to_s }
  }
end

2008-08-24

Prelude.ml

Prelude.ml - my OCaml standard library replacement - started off as a partial port of Haskell's Prelude (which you should read btw, it's very nice) and evolved into something more extensive. It doesn't have the lazy-list focus of Haskell's, but does things the usual strict way instead.

The things I tried to make convenient with prelude.ml are functions, lists, strings, type conversions, simple I/O and shell commands.

Here are some examples:


(* Quick vocabulary:
let ($) x f = f x, i.e. it's the reverse of Haskell's $ and the same as |>
[1; 2; 3] $ map (add 3) = [4; 5; 6]

let (@@) f x = f x = Haskell's ($)
map (add 3) @@ [1; 2; 3] = [4; 5; 6]

let (@.) f g x = f (g x) = Haskell's (.)
(add 5 @. multiply 2) 10 = 25

let ($.) f g x = g (f x), the suffix version of (@.)
(add 5 $. multiply 2) 10 = 30
*)

(* let fupler f x = (x, f x) *)
# unfoldr (lte 3) (fupler succ) 1;;
- : int list = [3; 2; 1]

# generateR (lte 3) succ 1;;
- : int list = [3; 2; 1]

# generate (greaterThan 5) pred 10;;
- : int list = [10; 9; 8; 7; 6]

# (10--6);;
- : int list = [10; 9; 8; 7; 6]

(* average file size *)
# readCmd ["ls"; "-l"] $ lines $ tail $ map (words $. nth 4 $. parseInt) $ average;;
- : int = 2920660

(* grepping with scan *)
# readRawCmd "ls /etc/apache2/mods-enabled" $ scan_nth "\\S*python\\S*" 0;;
- : string list = ["mod_python.load"]

(* grepping with filter *)
# ls "/etc/apache2/mods-enabled" $ filter (xmatch "deflate|alias");;
- : string list = ["alias.conf"; "alias.load"; "deflate.conf"; "deflate.load"]

(* most recently modified file *)
# ls "." $ filter isFile $ maximumBy mtime;;
- : string = ".xsession-errors"

(* and with the mtime *)
# ls "." $ filter isFile $ maximumByWith mtime;;
- : string * float = (".xsession-errors", 1220098201.)

(* writing to a file *)
# writeFile "thousand.txt" ( (1--1000) $ map showInt $ join "\n" );;
- : unit = ()

(* file size *)
# fileSize "thousand.txt";;
- : int = 3892

(* reading from a file *)
# readFile "thousand.txt" $ lines $ map parseInt = (1--1000);;
- : bool = true

(* line-wise read *)
# mapLines parseInt "thousand.txt" $ sum;;
- : int = 500500

(* or as an unfold *)
# withFile "thousand.txt" @@ unfoldrOpt (optEOF (fun ic -> readInt ic, ic)) $ sum;;
- : int = 500500

(* let fuple f g x = (f x, g x) *)
# withFile "thousand.txt" @@ unfoldrOpt (optEOF (fuple readInt id)) $ sum;;
- : int = 500500

(* let fuplel f x = (f x, x) *)
# withFile "thousand.txt" @@ unfoldrOpt (optEOF (fuplel readInt)) $ sum;;
- : int = 500500

(* or with tokenizeFile *)
# tokenizeFile readInt "thousand.txt" $ sum;;
- : int = 500500

(* and the idiomatic OCaml way *)
let file = open_in "thousand.txt" in
let rec aux ic sum =
  let i = try Some (int_of_string (input_line ic)) with End_of_file -> None in
  match i with None -> sum | Some x -> aux ic (x + sum) in
try
  let sum = aux file 0 in
  (try close_in file with _ -> ());
  sum
with e ->
  (try close_in file with _ -> ());
  raise e
;;
- : int = 500500

(* take 10 . lines =<< readFile "thousand.txt" *)
# tokenizeFileN readLine 10 "thousand.txt";;
- : string list = ["1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9"; "10"]

(* ten-number group averages for the first hundred numbers *)
# tokenizeFileN readFloat 100 "thousand.txt" $ groupsOf 10 $ map averagef;;
- : float list = [5.5; 15.5; 25.5; 35.5; 45.5; 55.5; 65.5; 75.5; 85.5; 95.5]

(* sliding ten-number averages for the first twenty numbers *)
# tokenizeFileN readFloat 20 "thousand.txt" $ mapWindow averagef 10;;
- : float list = [5.5; 6.5; 7.5; 8.5; 9.5; 10.5; 11.5; 12.5; 13.5; 14.5; 15.5]

(* interaction *)
# interact (uppercase @. join "-" @. words);;
hi there
HI-THERE
my name is robby
MY-NAME-IS-ROBBY
i am a robot
I-AM-A-ROBOT
^D
- : unit = ()

What's lacking?
  • I haven't measured performance.
  • It would be nice to have simple socket IO to do network programming.
  • Stream-like things aren't handled as streams: it's a pain to do something like mapWindow over the output of e.g. `tail -f`.
  • Lists suck. Arrays are better from a hardware POV, but harder to reason about. A list-like interface to arrays, maybe? Or ropes of some kind?
  • The OCaml compiler doesn't do beta reduction, so combinators are slow. Could be worked around by turning them into macros.
