art with code


Three thoughts


We're living on a grain of sand, five meters away from a light bulb. Of all the solar energy falling onto our tiny grain of sand, we are currently able to utilize 0.01%. The total energy falling onto our tiny grain of sand is roughly 0.00000005% of the total output of the light bulb.


Gravity sorting. In a gravity well, denser particles end up below lighter particles given time. Here's a list of densities of common elements. It matches the distribution of elements on Earth: air above water, water above silicon, silicon above iron. Note that the rare elements tend to be at the bottom of the list. There are three reasons for that. 1) Heavy elements are rare in the Universe. You need supernovas to produce them. 2) Heavy elements get gravity sorted below iron and mostly reside in the middle of the planet. 3) Some elements - like gold - don't oxidize easily, so they don't bubble up as lighter oxides. For a counter-example, tungsten has a similar density to gold (19.3 g/cm3), but oxidizes as wolframite (Fe,Mn)WO4, which has a density of 7.5 g/cm3 (close to elemental iron). As a result, the annual global tungsten production is 75,000 tons, whereas the annual gold production is 3,100 tons.
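As a quick sketch of the layering idea, sorting a few common substances by density reproduces the top-to-bottom order described above (the density values are approximate room-temperature figures):

```python
# Approximate densities in g/cm^3. Sorting by density predicts the
# vertical layering in a gravity well: least dense on top.
densities = {
    "air": 0.0012,
    "water": 1.0,
    "silicon": 2.33,
    "iron": 7.87,
    "tungsten": 19.25,
    "gold": 19.3,
}

# Top-to-bottom ordering: air above water, water above silicon, and so on.
layering = sorted(densities, key=densities.get)
print(layering)  # ['air', 'water', 'silicon', 'iron', 'tungsten', 'gold']
```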

Elemental abundances from Wikipedia

The Earth formed from dust in the protoplanetary disk, similarly to other planets and objects in the Solar System. As a result, asteroids should have a similar overall composition to the Earth.

Gravity sorting acts slower on less massive objects, so you should see relatively more heavy elements at the surface of less massive objects. The force of gravity is also weaker, making the grains of matter less densely packed. This should make mining asteroids less energy-intensive compared to mining on Earth. Asteroid ores should also be more abundant in metals compared to Earth ores. At the extreme end, you'd have iridium, which is 500x more common on asteroids compared to the Earth's crust. Couple that with - let's say - a factor-of-two energy reduction compared to Earth mining, and an asteroid mine could generate 1000x the iridium per watt of an Earth-based operation.
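The back-of-the-envelope behind that last figure, with the abundance ratio from the text and the factor-of-two energy cost as an assumption:

```python
# Numbers from the text: iridium is ~500x more abundant in asteroids
# than in Earth's crust; asteroid mining is assumed (hypothetically)
# to take half the energy per ton of ore processed.
abundance_ratio = 500   # iridium per ton of ore, asteroid vs. crust
energy_ratio = 2        # Earth energy cost / asteroid energy cost

iridium_per_watt_ratio = abundance_ratio * energy_ratio
print(iridium_per_watt_ratio)  # 1000
```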

Suppose you want to have reusable space launch vehicles to minimize launch costs. Rockets that land. Your customers pay you to launch their stuff to space, then you recover your rocket to save on costs. You don't want to recover the stage that gets to LEO because it'd be too expensive. But what if the returning LEO vehicle brought back a ton of gold? That'd be worth $40 million at today's prices. And the amount of gold brought back this way would be small enough (say, 25 tons per year) that it wouldn't put much of a dent in gold demand or affect the price of gold. If anything, you could sell it as Star Gold for jewelry at 10x the spot price. Even at spot price, it'd go a long way towards covering the cost of the entire launch.
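Checking the $40 million figure, assuming a gold price of roughly $1,250 per troy ounce (about the spot price when this was written):

```python
# Value of one metric ton of gold at an assumed ~$1,250/troy oz.
GRAMS_PER_TROY_OZ = 31.1035
price_per_oz = 1250.0                          # USD, assumption
troy_oz_per_ton = 1_000_000 / GRAMS_PER_TROY_OZ
value = troy_oz_per_ton * price_per_oz
print(f"${value / 1e6:.1f} million")           # ~$40 million
```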


In a few decades, it looks likely that we'll be able to simulate humans in real-time using a computer the size of a match box. Match box humans. If it takes half a square meter of solar panel to power one match box human, you could sustain a population of 2 trillion in the US alone. If you were to use fusion to convert all the cosmic dust that falls on Earth (let's say, 100 tons per day) to energy at 1% mass-to-energy ratio, it would generate around 1 PW of energy, which could sustain 10 trillion match box humans running at 100 watts per person.
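The dust-to-energy arithmetic checks out, assuming roughly 100 tons of infalling cosmic dust per day:

```python
# E = mc^2 at 1% conversion efficiency, spread over one day.
c = 3.0e8                      # speed of light, m/s
dust_kg_per_day = 100_000      # ~100 tons/day of cosmic dust (assumption)
efficiency = 0.01              # 1% mass-to-energy

energy_per_day = dust_kg_per_day * efficiency * c**2   # joules
power = energy_per_day / 86_400                        # watts
print(f"{power:.2e} W")        # ~1e15 W, about a petawatt

people = power / 100           # at 100 W per match box human
print(f"{people:.1e} people")  # ~1e13, ten trillion
```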


The carbon capture gambit

To control the climate of the planet, stopping the loss of carbon to air is not enough. To gain control of the temperature of the planet and the carbon economy, we need to mine the carbon that's been lost into the air.

There are proposed air-mining mechanisms for carbon that run at a cost of 3-7% of GDP. And that's before significant optimizations. To gain control of the climate and amass vast carbon reserves, we would only need a sum of money equivalent to a few years' worth of economic growth. If we invest now, we'll get a massive head start on future mining operations by other countries, and will end up in an immensely powerful geopolitical position. Not only will we have control over the climate of the planet, we also end up in the possession of all the carbon lost into the air.

Control over carbon will give us control over the carbon-based fuel economy. Most of the accessible carbon on the planet is right there in the air, ready for the taking.

Every year, more than 7 gigatons of carbon is lost into the atmosphere in the form of CO2. We can estimate the market value of this lost carbon using the price of coal. Coal is around 50% carbon, with the rest composed of impurities. The price of coal is around $40 per ton. Pure carbon could be twice as valuable as coal. At $80 per ton, 7 gigatons of carbon would be worth $560 billion. More than half a trillion dollars. Every. Single. Year.
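The valuation in one line, using the coal-price assumption from above:

```python
# 7 Gt of atmospheric carbon a year, valued at twice the price of coal.
carbon_tons = 7e9
coal_price = 40.0                 # USD per ton, assumption from the text
carbon_price = 2 * coal_price     # pure carbon at twice coal's price

value = carbon_tons * carbon_price
print(f"${value / 1e9:.0f} billion per year")  # $560 billion per year
```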


Crank, the SUBSET-SUM edition

SUBSET-SUM: 2^n possible answers for an n-element input. The input is composed of n k-bit numbers; find a subset that sums up to a target number. Total input length is n*k bits. If n is log(k), it's solvable in time polynomial in n*k (2^n ~= k ~= n*k). If k is log2(n), it's also solvable in polynomial time (2^k unique numbers ~= n => you've got all the numbers, and a greedy solution always works). The interesting case is therefore k = p(n).
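To make the 2^n search space concrete, here's the naive brute-force solver (a sketch, not an efficient algorithm):

```python
from itertools import combinations

def subset_sum(nums, target):
    """Return a subset of nums summing to target, or None.
    Tries all 2^n subsets in the worst case."""
    for r in range(1, len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return combo
    return None

print(subset_sum([3, 34, 4, 12, 5, 2], 9))  # (4, 5)
```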

If you pre-sort the numbers, you get a partially sorted binary enumeration. E.g. If you have 4 numbers in the input, you can state your answers as binary numbers from 0001 to 1111. Because the numbers are sorted, you know that 0100 > 0010, regardless of the numbers involved.
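One way to sketch that comparison in code: with the input sorted ascending, mask a is guaranteed to sum to at least as much as mask b (for any sorted input) when every suffix of a, counted from the top bit down, has at least as many set bits as the same suffix of b, with equal total popcounts. The function name and the suffix-count test are my formulation, not from the original text:

```python
def dominates(a, b, nbits):
    """True if mask a sums >= mask b for ANY ascending-sorted input:
    equal popcounts, and every top-down suffix of a has at least as
    many set bits as the same suffix of b."""
    if bin(a).count("1") != bin(b).count("1"):
        return False
    ca = cb = 0
    for i in range(nbits - 1, -1, -1):
        ca += (a >> i) & 1
        cb += (b >> i) & 1
        if ca < cb:
            return False
    return True

# 0100 > 0010 regardless of the (sorted) numbers involved:
print(dominates(0b0100, 0b0010, 4))  # True
# 0110 vs 1001 is ambiguous -- neither dominates the other:
print(dominates(0b0110, 0b1001, 4), dominates(0b1001, 0b0110, 4))  # False False
```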

In addition to ordered bit patterns, the output space is composed of ambiguous bit patterns. Think of 0110 and 1001. You can't tell which one is larger without evaluating the sum in some way.

The kicker is that for any given bit pattern, you can generate an ordered pattern that is smaller or larger than it. And the number of ambiguous bit patterns between the two ordered patterns grows exponentially with n. Indeed, the number of bit patterns ambiguous to a given bit pattern grows exponentially with n as well.

Think of this 24-bit pattern: 000000111111111111000000. It's got the middle 12 bits set, surrounded by 6 zero bits on either side. Because of the ordered pattern properties (move a set bit up and the sum grows, move it down and the sum shrinks), any pattern with 12 bits set that all lie below the 18th bit is equal to or smaller than the mid-bits-set pattern. Likewise, any 12-bit-set pattern with no bits below the 6th bit is equal to or larger than it. The bit patterns ambiguous to the mid-bits-set pattern are the ones with bits moved both above and below the middle run. That would constitute around 2^12 patterns, or 2^(n/2).

You might think that this is pretty nice to use for solution search, but note that the ordered pattern comparison only decreases the size of the output space under consideration by 2^(-n/4). With 24 bits, this comparison would reduce your output space to 63/64 of the original. With 28 bits, 127/128. With 32 bits, 255/256. And so on.

Even if you manage to use some fancy combination of comparisons and pattern recognition to rule out that portion of the search space with each comparison, you're still running at 2^(n/4) complexity.

Digression: The SUBSET-SUM search graph is composed of n layers. If you can solve one layer in p(n) time, you can solve the problem in p(n) time (because n*p(n) ~ p(n)). And of course, if you can't solve a layer in p(n), you can't solve the full problem either. This doesn't really help much, since the number of sums in the middle layer grows exponentially with n, and the number of unique sums grows exponentially with k. (If you think of the middle layer as a line with a little tick mark for every number generated on it, adding one to n at most doubles the number of tick marks on the line. Adding one to k doubles the length of the line. Adding one to n increases the size of the input by k bits. Adding one to k increases the size of the input by n bits.)

Digression 2: You can easily generate a number whose distance from the target is at most max(input): just sum numbers up until you reach a number that's higher than the target. Of course, max(input) is around 2^k. I believe you could also generate a number whose distance is around the difference between two numbers in the input (greedily swap numbers to move the generated number towards the target). The difference between two adjacent input numbers is around 2^k/n.
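The first claim of the digression as code, a greedy pass that lands within max(input) of the target (this variant stays at or below the target, and the guarantee assumes the target is at most sum(input)):

```python
def greedy_near(nums, target):
    """Greedily add numbers while staying at or below target.
    If target <= sum(nums), the result lands within max(nums)
    of the target."""
    total = 0
    picked = []
    for x in sorted(nums, reverse=True):
        if total + x <= target:
            total += x
            picked.append(x)
    return total, picked

total, picked = greedy_near([11, 7, 5, 3, 2], 17)
print(total, picked)  # 16 [11, 5] -- within max(nums)=11 of the target
```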

Digression 3: Even if you could get to p(n) distance from the target, you'd have to then iterate towards the target. You can generate an ordered pattern in the right direction, but that'll jump you an exponential distance. And after you've jumped to the closest ordered pattern, you need to find the next largest number for it. And finding the next largest number for a given bit pattern is tough. You need to resolve the ambiguous patterns for it. Which are exponential in count. And where every evaluation of a pattern yields you a portion of the solution that shrinks exponentially with n. If so, finding the next largest number to a pattern sounds like it's harder than NP (how would you verify the adjacency in polynomial time?)

Digression 4: Let me propose a series of simplified problems for SUBSET-SUM. First, let's limit ourselves to problems where a solution exists, and where exactly half of the numbers are used for the generated number. The first simplified problem is finding an adjacent number for such a generated number. The second is finding a number that's p(n) distant from a generated number. The third is finding the actual generated number. I don't think any of these can be solved in p(n)...


Even more crank on P != NP

Size limits for the input and the output

If the input is exponentially smaller than the output, the problem is not in NP. If the input is exponentially larger than the output (that is, |output| = log2(|input|)), the problem is in P.

In the first case, the output is exponentially longer than the input and therefore takes exponentially longer to print out, even on a non-deterministic machine. Even if you could print it out instantly, you need to check the output with a deterministic verifier that needs to run in polynomial time to the original input, and reading the exponentially larger output into the verifier would take exponential time to the original input. So exponentially longer outputs are not in NP.

In the second case, if you have a polynomial time deterministic verification algorithm, you can create a solver by adding an output space enumerator in front of it. If the output is exponentially smaller than the input, hey, you've got a polynomial time solver. If |output| = log2(|input|), enumerating the outputs takes 2^|output| = 2^log2(|input|) = |input| steps. From NP, you have the verifier that runs in p(|input|), so a solver that enumerates the output space and tries the verifier on each possible solution would run in 2^|output| * p(|input|), which becomes |input| * p(|input|) if the output is exponentially smaller than the input like above. (So any problems in NP where the output is a single bit are also in P.)
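The enumerator-plus-verifier construction, sketched with a toy problem (the function names and the toy verifier here are made up for illustration):

```python
from itertools import product

def solve_by_enumeration(input_bits, output_len, verify):
    """Try every possible output: 2^output_len candidates, each checked
    by the verifier. If output_len == log2(len(input_bits)), that's only
    len(input_bits) candidates -- polynomial in the input size."""
    for candidate in product("01", repeat=output_len):
        candidate = "".join(candidate)
        if verify(input_bits, candidate):
            return candidate
    return None

# Toy problem: "output the index, in binary, of a 1 bit in the input".
def verify(input_bits, candidate):
    return input_bits[int(candidate, 2)] == "1"

# 8-bit input, 3-bit output: 2^3 = 8 candidates to try.
print(solve_by_enumeration("00010110", 3, verify))  # '011'
```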

Due to these size limits, for P ?= NP the size of the output must be polynomial(-ish??) to the size of the input.

Size limits for the solver

How about the size of the deterministic solver relative to the input? If the deterministic solver machine is by necessity exponentially larger than the input (by necessity meaning that for any given part of the machine there exists an input that makes the machine access that part), there also exists an input that causes the machine to move the tape a distance exponential to the size of the input, which takes exponential time relative to the input. So the deterministic solver needs to have a size at most polynomial to the size of the input.

For the lower bound, consider the case where the machine extracts zero information from the input. This machine just enumerates over the output space, checking each answer using the verifier. The runtime of the machine is 2^|output| * p(|input|), the code is fixed in size, and it uses |output| extra tape as memory. At the other end is the answer table machine: it's an array of precomputed solutions where the key is the input. The size of the answer table machine is |output| * 2^|input| (and due to the above tape movement limitations, it will take exponential time to run relative to some input.)

Knowing nothing about the input is out, and knowing everything about the input is out. Every bit in the program encodes information about the input-output-relation, allowing the solution search to proceed faster than brute-force enumeration when the program encounters an input that matches the encoded information. The program can also learn from the input, but it can't learn an exponential amount of information from the input (because that'd take exponential time.) Wild guess: If the amount of information used by the program is exponentially less than the amount of information in the input-output-relation, the program would end up using brute-force search over parts of the output that are polynomial in size to the input, which would take exponential time to the input. If true, there would be no sub-polynomial-space programs for P ?= NP that run in polynomial time.

Given the existence of the deterministic polynomial time verifier, we know that you can extract information from the input-output-pair in polynomial time, even though the amount of information extracted is very small. How small? Well, at least you get |output| / 2^|output| bits of info. After trying all 2^|output| outputs, you would have |output| bits and a correct output. If there are several correct outputs, you'd gain numberOfOutputs * |output| / 2^|output| bits. If the numberOfOutputs grows exponentially with the output size, this would give you a polynomial time solver. But, if this were the case, someone would've noticed it. Another wild guess: the amount of information we get out of the verifier shrinks exponentially in relation to the output size (and therefore the input size, which has a polynomial size relation to the output).

The number of input enumerations is 2^|input|, ditto for the output: 2^p(|input|). We have a deterministic polynomial time verifier that can extract solution information from the input, but the amount of information extracted looks to be a logarithm of the input size. We also have a hypothetical solver program that encodes a polynomially growing amount of information about the input-output-relation, allowing it to avoid brute-force search whenever it encounters an input that matches the encoded information.

If the number of patterns that can be encoded in the input grows exponentially with the input length, then as the problem size grows towards infinity the question becomes: can you discern between an infinite number of input patterns with a zero-length solver program and a verifier that can extract zero bits of information from the input? For P to be NP, the number of patterns in the input needs to grow as a polynomial to the length of the input.


Some more rubbish on P != NP

Feeling crank today, so:

Non-deterministic machines are one-bit oracles. Given a branch in execution, they always follow the correct path. You can use this property to write a universal non-deterministic machine:


NonDeterministic done, correctOutput;

while (!done) {
  output(correctOutput); // at every branch, the machine takes the correct path
}

Its execution time is linear in the size of the output. There are still problems that take non-polynomial time relative to the input. For instance, "given an input of n bits, print out n! bits".

The question in P ?= NP then is: "given a problem where the output size is a polynomial of the input size, is it possible to emulate a non-deterministic machine using a deterministic machine with at most a polynomial slowdown". Since the ND machine takes a number of steps linear to the output size, which is polynomial of the input size, it solves the problem in polynomial time to the input. If the deterministic machine can emulate the ND machine at a polynomial slowdown, its running time is a polynomial of the output size, which is a polynomial of a polynomial of the input size.

Maybe you could also ask if it's possible to prove that there exists a problem for which an optimal program running on a deterministic machine requires a number of steps exponential to the size of its output.

Make a bridge from output size to input size? ...exponential to the size of its output and where the output size grows linearly with input size.


New CPUs

Lots of cores! More cores! AMD bringing 32-core EPYC to the desktop, Intel bringing top-end 28-core Xeons to the desktop.

In terms of compute, it's getting pretty interesting! 32 cores at 4 GHz and SIMD + ispc to SPMD it up. With 8-wide SIMD and 1 cycle FMA, you'd be looking at 2 TFLOPS. If you could push that to 16-wide (or more usefully, double the core count), 4 TFLOPS. That's discrete GPU territory. Not to forget double precision: 1 TFLOPS DP beats all but the uber-expensive compute GPUs.
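The peak-FLOPS arithmetic behind those numbers (counting an FMA as two floating-point operations):

```python
# Peak throughput: cores * clock * SIMD width * 2 (FMA = multiply + add).
cores = 32
clock_hz = 4e9
simd_width_sp = 8   # 8-wide single precision per cycle

sp_flops = cores * clock_hz * simd_width_sp * 2
print(f"{sp_flops / 1e12:.1f} TFLOPS")      # 2.0 TFLOPS

# Double precision at half the SIMD width:
dp_flops = cores * clock_hz * 4 * 2
print(f"{dp_flops / 1e12:.1f} TFLOPS DP")   # 1.0 TFLOPS DP
```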

If you still have that /usr/bin/directx thing, you wouldn't need a GPU at all. Just plug a display to the PCIe bus and send frames to it.

Memory bandwidth is still an issue, a few GB of HBM would help. And it'd be nice to plug in extra CPUs and RAM to PCIe slots.


WAN OpenVPN at 915 Mbps

$ iperf -c remote -P4
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   284 MBytes   238 Mbits/sec
[  3]  0.0-10.0 sec   278 MBytes   233 Mbits/sec
[  6]  0.0-10.0 sec   275 MBytes   230 Mbits/sec
[  5]  0.0-10.1 sec   259 MBytes   216 Mbits/sec
[SUM]  0.0-10.1 sec  1.07 GBytes   915 Mbits/sec

OK, that's working.
