art with code

2018-07-28

Crank, the SUBSET-SUM edition

SUBSET-SUM: 2^n possible answers for an n-element input. The input is composed of n k-bit numbers; find a subset that sums up to a target number. Total input length is n*k bits. If n is log(k), the problem is solvable in time polynomial in n*k (2^n ~= k ~= n*k). If k is log2(n), it's also solvable in polynomial time (2^k unique numbers ~= n => you've got all the numbers, a greedy solution always works). Therefore, for the hard instances, k = p(n).
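To make the 2^n figure concrete, here's a minimal brute-force sketch (the function name and the bitmask encoding are just illustrative, not anything standard):

# Brute-force SUBSET-SUM: try all 2^n subsets, each encoded as an n-bit mask.
# Time is roughly 2^n * n additions of k-bit numbers, which is only polynomial
# in the input length n*k in the special cases above.
def subset_sum_bruteforce(numbers, target):
    n = len(numbers)
    for mask in range(1, 1 << n):
        if sum(numbers[i] for i in range(n) if mask & (1 << i)) == target:
            return mask  # bit pattern of the solution subset
    return None

# Example: with 4 numbers the candidate answers are the patterns 0001..1111.
print(bin(subset_sum_bruteforce([3, 5, 9, 17], 22)))  # 0b1010 -> {5, 17}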

If you pre-sort the numbers, you get a partially sorted binary enumeration. E.g. if you have 4 numbers in the input, you can state your answers as binary numbers from 0001 to 1111. Because the numbers are sorted, you know that 0100 >= 0010, regardless of the numbers involved.

In addition to ordered bit patterns, the output space is composed of ambiguous bit patterns. Think of 0110 and 1001. You can't tell which one is larger without evaluating the sums in some way.

The kicker is that for any given bit pattern, you can generate an ordered pattern that is smaller or larger than it. And the number of ambiguous bit patterns between the two ordered patterns grows exponentially with n. Indeed, the number of ambiguous bit patterns for a given bit pattern grows exponentially with n as well.

Think of this 24-bit pattern: 000000111111111111000000. It's got the middle 12 bits set, surrounded by 6 zero bits on either side. Because of the ordered pattern properties (if you move a set bit up, the sum gets larger, if you move it down, the sum gets smaller), you know that any pattern with 12 bits set whose top 6 bits are all zero is equal to or smaller than the mid-bits-set pattern, and any pattern with 12 bits set whose bottom 6 bits are all zero is equal to or larger than it. The bit patterns ambiguous to the mid-bits-set pattern are the ones where bits have been moved both above and below the middle block. This would constitute around 2^12 patterns, or 2^(n/2).
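To make the ordered-vs-ambiguous distinction concrete, here's a small sketch (an illustrative helper of my own, not part of the argument above). Assuming the numbers are sorted ascending and non-negative, pattern A is guaranteed to sum to at least as much as pattern B if, scanning from the top bit down, A's running count of set bits never falls behind B's:

# Bit i of a mask selects numbers[i]; numbers are sorted ascending, non-negative.
def dominates(a, b, n):
    count_a = count_b = 0
    for i in range(n - 1, -1, -1):
        count_a += (a >> i) & 1
        count_b += (b >> i) & 1
        if count_a < count_b:
            return False   # B has more weight in the high positions so far
    return True            # every bit of B matches an equal-or-higher bit of A

print(dominates(0b0100, 0b0010, 4))                                 # True: ordered pair
print(dominates(0b0110, 0b1001, 4), dominates(0b1001, 0b0110, 4))   # False False: ambiguous pair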

You might think that this is pretty nice to use for solution search, but note that the ordered pattern comparison only decreases the size of the output space under consideration by a factor of about 2^(-n/4). With 24 bits, this comparison would reduce your output space to 63/64 of the original. With 28 bits, 127/128. With 32 bits, 255/256. And so on.

Even if you manage to use some fancy combination of comparisons and pattern recognition to rule out that portion of the search space with each comparison, you're still running at 2^(n/4) complexity.

Digression: The SUBSET-SUM search graph is composed of n layers. If you can solve one layer in p(n) time, you can solve the problem in p(n) time (because n*p(n) ~ p(n)). And of course, if you can't solve a layer in p(n), you can't solve the full problem either. This doesn't really help much, since the number of sums in the middle layer grows exponentially with n, and the number of unique sums grows exponentially with k. (If you think of the middle layer as a line with a little tick mark for every number generated on it, adding one to n at most doubles the number of tick marks on the line. Adding one to k doubles the length of the line. Adding one to n increases the size of the input by k bits. Adding one to k increases the size of the input by n bits.)
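A quick sketch to see the layer sizes (purely illustrative; a "layer" here is just the subsets of a fixed size):

from itertools import combinations
import random

# Count the distinct subset sums in each layer. The middle layer dominates:
# C(n, n/2) subsets, and with k-bit numbers most of those sums stay distinct.
def layer_sizes(numbers):
    n = len(numbers)
    return [len(set(map(sum, combinations(numbers, j)))) for j in range(n + 1)]

numbers = [random.getrandbits(16) for _ in range(12)]   # n = 12, k = 16
print(layer_sizes(numbers))   # peaks near the middle layer, around C(12, 6) = 924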

Digression 2: You can easily generate a sum whose distance from the target number is at most max(input): just sum the numbers up until you reach a number that's higher than the target. Of course, max(input) is around 2^k. I believe you could also generate a number whose distance is around the difference between two numbers in the input (greedily swap numbers to move the generated number towards the target). The difference between two adjacent input numbers is around 2^k / n.
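Digression 2 as a rough sketch (assumes the numbers are non-negative and their full sum is at least the target):

# Sum numbers until the running total reaches the target. The overshoot is
# less than the last number added, so the total lands within max(numbers)
# (around 2^k) of the target.
def within_max_of_target(numbers, target):
    total, mask = 0, 0
    for i, x in enumerate(numbers):
        if total >= target:
            break
        total += x
        mask |= 1 << i
    return total, mask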

Digression 3: Even if you could get to p(n) distance from the target, you'd have to then iterate towards the target. You can generate an ordered pattern in the right direction, but that'll jump you an exponential distance. And after you've jumped to the closest ordered pattern, you need to find the next largest number for it. And finding the next largest number for a given bit pattern is tough. You need to resolve the ambiguous patterns for it, which are exponential in count, and where every evaluation of a pattern only rules out a portion of the search space that shrinks exponentially with n. If so, finding the next largest number to a pattern sounds like it's harder than NP (how would you verify the adjacency in polynomial time?)

Digression 4: Let me propose a series of simplified problems for SUBSET-SUM. First, let's limit ourselves to problems where a solution exists, and where exactly half of the numbers are used for the generated number. The first simplified problem is finding an adjacent number for such a generated number. The second is finding a number that's p(n) distant from a generated number. The third is finding the actual generated number. I don't think any of these can be solved in p(n)...

2018-07-05

Even more crank on P != NP

Size limits for the input and the output

If the input is exponentially smaller than the output, the problem is not in NP. If the input is exponentially larger than the output (i.e. |output| = log2(|input|)), the problem is in P.

In the first case, the output is exponentially longer than the input and therefore takes exponentially longer to print out, even on a non-deterministic machine. Even if you could print it out instantly, you need to check the output with a deterministic verifier that needs to run in polynomial time to the original input, and reading the exponentially larger output into the verifier would take exponential time to the original input. So problems with exponentially longer outputs are not in NP.

In the second case, if you have a polynomial time deterministic verification algorithm, you can create a solver by adding an output space enumerator in front of it. If the output is exponentially smaller than the input, hey, you've got a polynomial time solver. If |output| = log2(|input|), enumerating the outputs takes 2^|output| = 2^log2(|input|) = |input| steps. From NP, you have the verifier that runs in p(|input|), so a solver that enumerates the output space and tries the verifier on each possible solution would run in 2^|output| * p(|input|), which becomes |input| * p(|input|) if the output is exponentially smaller than the input like above. (So any problems in NP where the output is a single bit are also in P.)
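The enumerator-plus-verifier construction, sketched below; verify stands in for whatever polynomial-time verifier the NP problem comes with, and the integer encoding of outputs is just an assumption for illustration:

# Generic solver built from a verifier: enumerate every candidate output and
# verify each one. Runtime is 2^|output| * (cost of verify), so this is
# polynomial in |input| only when |output| is about log2(|input|); a single
# output bit means just two verifier calls.
def solve_by_enumeration(input_bits, output_len, verify):
    for candidate in range(1 << output_len):
        if verify(input_bits, candidate):
            return candidate
    return None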

Due to these size limits, for P ?= NP the size of the output must be polynomial(-ish??) to the size of the input.

Size limits for the solver

How about the size of the deterministic solver relative to the input? If the deterministic solver machine is by necessity exponentially larger than the input (by necessity meaning that for any given part of the machine there exists an input that makes the machine access that part), there also exists an input that causes the machine to move the tape a distance exponential to the size of the input, which takes exponential time relative to the input. So the deterministic solver needs to have a size at most polynomial to the size of the input.

For the lower bound, consider the case where the machine extracts zero information from the input. This machine just enumerates over the output space, checking each answer using the verifier. The runtime of the machine is 2^|output| * p(|input|), the code is fixed in size, and it uses |output| extra tape as memory. At the other end is the answer table machine: it's an array of precomputed solutions where the key is the input. The size of the answer table machine is |output| * 2^|input| (and due to the above tape movement limitations, it will take exponential time to run relative to some input.)

Knowing nothing about the input is out, and knowing everything about the input is out. Every bit in the program encodes information about the input-output-relation, allowing the solution search to proceed faster than brute-force enumeration when the program encounters an input that matches the encoded information. The program can also learn from the input, but it can't learn an exponential amount of information from the input (because that'd take exponential time.) Wild guess: If the amount of information used by the program is exponentially less than the amount of information in the input-output-relation, the program would end up using brute-force search over parts of the output that are polynomial in size to the input, which would take exponential time to the input. If true, there would be no sub-polynomial-space programs for P ?= NP that run in polynomial time.

Given the existence of the deterministic polynomial time verifier, we know that you can extract information from the input-output-pair in polynomial time, even though the amount of information extracted is very small. How small? Well, at least you get |output| / 2^|output| bits of info. After trying every one of the 2^|output| outputs, you would have |output| bits and a correct output. If there are several correct outputs, you'd gain numberOfOutputs * |output| / 2^|output| bits. If numberOfOutputs grows exponentially with the output size, this would give you a polynomial time solver. But, if this were the case, someone would've noticed it. Another wild guess: the amount of information we get out of the verifier shrinks exponentially in relation to the output size (and therefore the input size, which has a polynomial size relation to the output).

The number of input enumerations is 2^|input|, ditto for the output: 2^p(|input|). We have a deterministic polynomial time verifier that can extract solution information from the input, but the amount of information extracted looks to be a logarithm of the input size. We also have a hypothetical solver program that encodes a polynomially growing amount of information about the input-output-relation, allowing it to avoid brute-force search whenever it encounters an input that matches the encoded information.

If the number of patterns that can be encoded in the input grows exponentially with the input length, then as the problem size grows towards infinity the question becomes: can you discern between an infinite number of input patterns with a zero-length solver program and a verifier that can extract zero bits of information from the input? For P to be NP, the number of patterns in the input needs to grow as a polynomial to the length of the input.

2018-07-02

Some more rubbish on P != NP

Feeling crank today, so:

Non-deterministic machines are one-bit oracles. Given a branch in execution, they always follow the correct path. You can use this property to write a universal non-deterministic machine:

readInput();

// Non-deterministic variables always take the value that leads to an
// accepting run (the "one-bit oracle" property).
NonDeterministic done, correctOutput;

while (!done) {
  outputBit(correctOutput);
}

Its execution time is linear to the size of the output. There are still problems that take non-polynomial time compared to the input. For instance, "given an input of n bits, print out n! bits".

The question in P ?= NP then is: "given a problem where the output size is a polynomial of the input size, is it possible to emulate a non-deterministic machine using a deterministic machine with at most a polynomial slowdown". Since the ND machine takes a number of steps linear to the output size, which is polynomial of the input size, it solves the problem in polynomial time to the input. If the deterministic machine can emulate the ND machine at a polynomial slowdown, its running time is a polynomial of the output size, which is a polynomial of a polynomial of the input size.

Maybe you could also ask if it's possible to prove that there exists a problem for which an optimal program running on a deterministic machine requires a number of steps exponential to the size of its output.

Make a bridge from output size to input size? ...exponential to the size of its output and where the output size grows linearly with input size.
