art with code

2009-08-03

EU court rules 11-word snippets can violate copyright


Eleven words, eleven words... yes.
Let's see.

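/usr/share/dict/words has 98569 words. Raised to the eleventh power, that gives 8533827430813090265537515496031670135739632890386559769 different 11-word sentences. A word index needs 17 bits (2^17 = 131072 is the smallest power of two above 98569), so each 11-word sentence takes 17 * 11 = 187 bits.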

Which gives us a keyspace size of 2*10^44 terabytes. A bit too much.
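
To double-check that number, here is a rough Python sketch of the same arithmetic. It assumes the 98569-word count quoted above, fixed-width word indices, and decimal terabytes (10^12 bytes); the function and constant names are just made up for illustration.

```python
VOCAB_SIZE = 98569     # word count quoted above for /usr/share/dict/words
SENTENCE_WORDS = 11    # length of the snippet in words


def full_keyspace(vocab_size=VOCAB_SIZE, words=SENTENCE_WORDS):
    """Return (sentence count, bits per sentence, size in decimal terabytes)."""
    sentences = vocab_size ** words
    # Bits needed for a fixed-width index into the vocabulary.
    bits_per_word = (vocab_size - 1).bit_length()
    bits_per_sentence = bits_per_word * words
    total_bytes = sentences * bits_per_sentence // 8
    terabytes = total_bytes / 10**12
    return sentences, bits_per_sentence, terabytes


sentences, bits_per_sentence, terabytes = full_keyspace()
print(sentences)            # a 55-digit integer
print(bits_per_sentence)    # 187
print(f"{terabytes:.1e} TB")
```

Python integers are arbitrary precision, so the 55-digit sentence count is computed exactly and only the final terabyte figure is a float.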

But if we manage to reduce the degrees of freedom in the sentence through grammatical modeling and some other magic of data mining, while reducing our vocabulary to common words... Let's say 5 free words from a reduced vocabulary of 1024.

1024^5 = 1125899906842624 different sentences. At 10 bits per word, each 11-word sentence now weighs 10 * 11 = 110 bits.

So, 15.5 petabytes for the keyspace size.

Assuming that we can generate this stuff as fast as the I/O system allows, we can imagine buying some machine time off the Amazon cloud to brute-force it. If one machine can manage 50 MB/s write, and we use a cluster of 1000 machines, the total combined write speed will be 50 GB/s. That makes for a total time of about 310000 seconds, or 86 hours, at a cost of 1000 * 0.1 euros per hour.

Giving us the cost for plausible total copyright control over the English language:
8611 euros for computing time and 1.25 million euros for the 15,500 1TB hard disks to store the resulting document.
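
The whole chain from reduced vocabulary to the bill can be sketched in a few lines of Python. This is only a restatement of the estimate above, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and the 0.1 euros per machine-hour price from the post; all names are hypothetical. It does not round the runtime up to 310000 seconds first, so the compute cost lands a few euros under 8611, and the disk count comes out slightly below the rounded 15,500.

```python
REDUCED_VOCAB = 1024           # common words kept in the vocabulary
FREE_WORDS = 5                 # words left free after grammatical modeling
SENTENCE_WORDS = 11            # every sentence is still stored as 11 words
BITS_PER_WORD = 10             # 2**10 == 1024
WRITE_SPEED = 50 * 10**6       # bytes per second per machine (50 MB/s)
MACHINES = 1000
PRICE_PER_MACHINE_HOUR = 0.1   # euros
DISK_BYTES = 10**12            # one 1 TB disk


def estimate():
    """Return (total bytes, seconds, hours, compute cost in euros, disks)."""
    sentences = REDUCED_VOCAB ** FREE_WORDS
    total_bytes = sentences * BITS_PER_WORD * SENTENCE_WORDS // 8
    seconds = total_bytes / (WRITE_SPEED * MACHINES)
    hours = seconds / 3600
    compute_cost = hours * MACHINES * PRICE_PER_MACHINE_HOUR
    disks = -(-total_bytes // DISK_BYTES)   # integer ceiling division
    return total_bytes, seconds, hours, compute_cost, disks


total_bytes, seconds, hours, compute_cost, disks = estimate()
print(f"{total_bytes / 10**15:.1f} PB")
print(f"{seconds:.0f} s, {hours:.1f} h")
print(f"{compute_cost:.0f} euros of machine time, {disks} disks")
```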

(Well, you can compress the document to less than a kilobyte by writing a program that generates every 50-bit number, but what is the legal power of that?)
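
For the curious, such a generator might look something like the Python sketch below. It treats each 50-bit number as five 10-bit indices into the 1024-word vocabulary. The VOCAB list is a hypothetical placeholder (w0 to w1023), and since the grammar model that would fill in the other six words is left to the imagination above, the sketch only emits the five free words.

```python
from itertools import islice

VOCAB_BITS = 10   # 2**10 == 1024 words
FREE_WORDS = 5    # 5 * 10 bits == 50 bits per sentence

# Hypothetical placeholder vocabulary; a real one would hold 1024 common words.
VOCAB = [f"w{i}" for i in range(2 ** VOCAB_BITS)]


def decode(n):
    """Split a 50-bit integer into five words, most significant index first."""
    mask = (1 << VOCAB_BITS) - 1
    indices = [(n >> (VOCAB_BITS * k)) & mask
               for k in reversed(range(FREE_WORDS))]
    return [VOCAB[i] for i in indices]


def all_sentences():
    """Lazily yield the free words for every 50-bit number, in order."""
    for n in range(2 ** (VOCAB_BITS * FREE_WORDS)):
        yield " ".join(decode(n))


# Only peek at the start; iterating all 2**50 entries is the 86-hour job.
first_three = list(islice(all_sentences(), 3))
print(first_three)
```

Because all_sentences() is a generator, nothing is stored until someone asks for it, which is the whole point of the kilobyte-sized "compressed" version.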

And now, with our very own copy of bruteforced English, we can sue the pants off every stinking anglo and anglo-wannabe out there. COPYRIGHT BANZAI!