art with code

2018-07-28

Crank, the SUBSET-SUM edition

SUBSET-SUM: 2^n possible answers for an n-element input. The input is composed of n k-bit numbers; find a subset that sums to a target number. Total input length is n*k bits. If n is log(k), it's solvable in time polynomial in n*k (2^n ~= k ~= n*k subsets to try). If k is log2(n), it's also solvable in polynomial time (2^k unique numbers ~= n => you've got all the numbers, and a greedy solution always works). The interesting case is therefore k = p(n).
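For a concrete reference point, the naive search over all 2^n subsets might be sketched like this (a minimal sketch, function name my own):

```python
def subset_sum_bruteforce(nums, target):
    """Try every non-empty subset (2^n - 1 of them); return one that
    hits the target, or None. Bit i of the mask selects nums[i]."""
    n = len(nums)
    for mask in range(1, 1 << n):
        chosen = [nums[i] for i in range(n) if mask & (1 << i)]
        if sum(chosen) == target:
            return chosen
    return None
```

The loop count is what blows up: doubling n squares the number of masks to try.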

If you pre-sort the numbers, you get a partially ordered binary enumeration. E.g., if you have 4 numbers in the input, you can state your answers as binary numbers from 0001 to 1111. Because the numbers are sorted, you know that 0100 > 0010, regardless of the numbers involved.
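That ordering can be made mechanical: with ascending-sorted non-negative inputs, mask a is guaranteed to sum to at least as much as mask b when every top-k slice of a has at least as many set bits as the same slice of b. A sketch of the test (name is my own):

```python
def dominates(a, b, n):
    """True if bitmask a's sum >= bitmask b's sum for EVERY
    ascending-sorted non-negative input of length n.
    Walks from the largest number down, comparing prefix popcounts."""
    ca = cb = 0
    for i in range(n - 1, -1, -1):
        ca += (a >> i) & 1
        cb += (b >> i) & 1
        if ca < cb:
            return False
    return True
```

`dominates(0b0100, 0b0010, 4)` holds, while neither of 0110 and 1001 dominates the other: that pair is ambiguous.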

In addition to ordered bit patterns, the output space is composed of ambiguous bit patterns. Think of 0110 and 1001. You can't tell which one is larger without evaluating the sum in some way.
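The 0110 vs 1001 ambiguity is easy to see with concrete numbers (my own illustrative inputs):

```python
def masked_sum(nums, mask):
    """Sum of the elements selected by the bitmask (bit i -> nums[i])."""
    return sum(x for i, x in enumerate(nums) if (mask >> i) & 1)

a, b = 0b0110, 0b1001  # the ambiguous pair from the text
print(masked_sum([1, 2, 3, 4], a), masked_sum([1, 2, 3, 4], b))      # 5 5
print(masked_sum([1, 2, 3, 100], a), masked_sum([1, 2, 3, 100], b))  # 5 101
```

Same masks, opposite outcomes depending on the actual numbers: no comparison of the bit patterns alone can resolve them.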

The kickers are that for any given bit pattern, you can generate an ordered pattern that is smaller or larger than it, and the number of ambiguous bit patterns between those two ordered patterns grows exponentially with n. Indeed, the number of bit patterns ambiguous to any given pattern grows exponentially with n as well.

Think of this 24-bit pattern: 000000111111111111000000. It's got the middle 12 bits set, surrounded by 6 zero bits on either side. Because of the ordered pattern properties (if you move a set bit up, the sum gets larger; if you move a set bit down, the sum gets smaller), you know that any pattern with 12 bits set and no set bit above the 18th bit sums to less than or equal to the mid-bits-set pattern. Likewise, any 12-bit-set pattern with no set bit below the 7th bit sums to greater than or equal to it. The bit patterns ambiguous to the mid-bits-set pattern are the ones where bits have been moved both above and below it. This constitutes around 2^12 patterns, or 2^(n/2).
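A sanity check on a scaled-down version of the same shape: an 8-bit pattern 00111100 (middle 4 bits set, 2 zeros on either side), using the prefix-popcount dominance test. All names and the small size are my own choices for a quick enumeration:

```python
from itertools import combinations

def dominates(a, b, n):
    """a's sum >= b's sum for every ascending-sorted non-negative input:
    every top-k prefix of a must have at least as many set bits as b's."""
    ca = cb = 0
    for i in range(n - 1, -1, -1):
        ca += (a >> i) & 1
        cb += (b >> i) & 1
        if ca < cb:
            return False
    return True

n, mid = 8, 0b00111100  # scaled-down mid-bits-set pattern
masks = [sum(1 << i for i in c) for c in combinations(range(n), 4)]
ambiguous = [m for m in masks
             if m != mid
             and not dominates(m, mid, n)
             and not dominates(mid, m, n)]
print(len(ambiguous))  # 41
```

Of the 69 other 4-bit-set patterns, 41 are ambiguous to the middle pattern: exactly the ones with a set bit both above and below the middle run, matching the argument above.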

You might think that this is pretty nice to use for solution search, but note that each ordered-pattern comparison only decreases the size of the output space under consideration by 2^(-n/4). With 24 bits, a comparison would reduce your output space to 63/64 of the original. With 28 bits, 127/128. With 32 bits, 255/256. And so on.

Even if you manage to use some fancy combination of comparisons and pattern recognition to rule out that portion of the search space with each comparison, you're still running at 2^(n/4) complexity.

Digression: The SUBSET-SUM search graph is composed of n layers. If you can solve one layer in p(n) time, you can solve the problem in p(n) time (because n*p(n) is still polynomial). And of course, if you can't solve a layer in p(n), you can't solve the full problem either. This doesn't really help much, since the number of sums in the middle layer grows exponentially with n, and the number of unique sums grows exponentially with k. (If you think of the middle layer as a line with a little tick mark for every number generated on it, adding one to n up-to-doubles the number of tick marks on the line. Adding one to k doubles the length of the line. Adding one to n increases the size of the input by k bits. Adding one to k increases the size of the input by n bits.)
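To see the middle-layer growth concretely, here's a sketch that counts distinct size-n/2 subset sums. Powers of two are my choice of a worst-case input: every subset sum is unique, so the layer hits its maximum size C(n, n/2) ~ 2^n / sqrt(n):

```python
from itertools import combinations
from math import comb

def middle_layer(nums):
    """Distinct sums over size-n/2 subsets: the middle layer of the
    n-layer SUBSET-SUM search graph."""
    n = len(nums)
    return {sum(c) for c in combinations(nums, n // 2)}

# Distinct powers of two make every subset sum unique.
for n in (8, 12, 16):
    layer = middle_layer([1 << i for i in range(n)])
    print(n, len(layer), comb(n, n // 2))  # counts match: 70, 924, 12870
```

Each +2 to n here multiplies the middle layer's tick-mark count by roughly 4, in line with the "adding one to n up-to-doubles" remark.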

Digression 2: You can easily generate a number whose distance from the target is at most max(input): just sum the numbers up until you reach a number that's higher than the target. Of course, max(input) is around 2^k. I believe you could also generate a number that has a distance around the difference between two numbers in the input (greedily swap numbers to move the generated number towards the target). The difference between two adjacent input numbers is around 2^k/n.
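The first claim can be sketched as a largest-first greedy pass (names mine). Whenever the target is at most sum(nums), any skipped number x satisfied total + x > target at the moment it was skipped, so the final gap is below max(nums):

```python
def greedy_within_max(nums, target):
    """Pick numbers largest-first while staying <= target.
    If target <= sum(nums), the result lands within max(nums)
    of the target."""
    total, picked = 0, []
    for x in sorted(nums, reverse=True):
        if total + x <= target:
            total += x
            picked.append(x)
    return total, picked

print(greedy_within_max([3, 7, 12, 25], 30))  # (28, [25, 3])
```

A gap of 2 here, against a max(input) of 25: the bound is loose, which is exactly the problem, since max(input) is around 2^k.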

Digression 3: Even if you could get to p(n) distance from the target, you'd have to then iterate towards the target. You can generate an ordered pattern in the right direction, but that'll jump you an exponential distance. And after you've jumped to the closest ordered pattern, you need to find the next largest number for it. And finding the next largest number for a given bit pattern is tough: you need to resolve the ambiguous patterns for it, which are exponential in count, and where every evaluation of a pattern rules out a portion of the search space that shrinks exponentially with n. If so, finding the next largest number to a pattern sounds like it's harder than NP (how would you even verify the adjacency in polynomial time?)

Digression 4: Let me propose a series of simplified problems for SUBSET-SUM. First, let's limit ourselves to problems where a solution exists, and where exactly half of the numbers are used for the generated number. The first simplified problem is finding an adjacent number for such a generated number. The second is finding a number that's p(n) distant from a generated number. The third is finding the actual generated number. I don't think any of these can be solved in p(n)...
