We consider three problems with a common starting point. You are given a set \(S\) of \(10,000\) distinct positive integers, each at most \(100,000\text{,}\) and then asked the following questions.

Is \(83,172\) one of the integers in the set \(S\text{?}\)

Are there three integers in \(S\) whose sum is \(143,297\text{?}\)

Can the set \(S\) be partitioned as \(S=A\cup B\) with \(A\cap B=\emptyset\text{,}\) so that \(\sum_{a\in A}a=\sum_{b\in B}b\text{?}\)

The first of the three problems sounds easy, and it is. You just consider the numbers in the set one by one and test to see if any of them is \(83,172\text{.}\) You can stop if you ever find this number and report that the answer is yes. If you return a no answer, then you will have read every number in the list. Either way, you halt with a correct answer to the question having done at most \(10,000\) tests, and even the most modest netbook can do this in a heartbeat. And if the list is expanded to \(1,000,000\) integers, all at most a billion, you can still do it easily. More generally, if you're given a set \(S\) of \(n\) numbers and an integer \(x\) with the question “Is \(x\) a member of \(S\text{?}\)”, you can answer this question in at most \(n\) steps, with each step an operation of testing a number in \(S\) to see if it is exactly equal to \(x\text{.}\) So the running time of this algorithm is proportional to \(n\text{,}\) with the constant depending on the amount of time it takes a computer to perform the basic operation of asking whether a particular integer is equal to the target value.
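The one-by-one scan just described can be sketched in a few lines of Python. This is only an illustration: the function name `contains` and the decision to return the number of equality tests alongside the answer are assumptions made here, not part of the text.

```python
def contains(numbers, x):
    """Scan numbers one by one; return (found, tests).

    `tests` counts the equality comparisons performed, which is the
    basic operation the running time is measured in.
    """
    tests = 0
    for value in numbers:
        tests += 1
        if value == x:
            # Stop as soon as the target is found.
            return True, tests
    # A "no" answer requires having looked at every number.
    return False, tests
```

On a list of \(n\) numbers the count returned is never more than \(n\text{,}\) and it equals \(n\) whenever the answer is no.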

The second of the three problems is a bit more challenging. Now it seems that we must consider the \(3\)-element subsets of a set of size \(10,000\text{.}\) There are \(C(10,000,3)\) such sets. On the one hand, testing three numbers to see if their sum is \(143,297\) is very easy, but there are lots and lots of sets to test. Note that \(C(10,000,3)=166,616,670,000\text{,}\) and not too many computers will handle this many operations. Moreover, if the list is expanded to a million numbers, then we have more than \(10^{17}\) triples to test, and that's off the table with today's hardware.

Nevertheless, we can consider the general case. We are given a set \(S\) of \(n\) integers and a number \(x\text{.}\) Then we are asked whether there are three integers in \(S\) whose sum is \(x\text{.}\) The algorithm we have described would have running time proportional to \(n^3\text{,}\) where the constant of proportionality depends on the time it takes to test a triple of numbers to see if their sum is \(x\text{.}\) Of course, this depends in turn on just how large the integer \(x\) and the integers in \(S\) can be.
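As a minimal sketch of the test-every-triple approach, the following Python uses the standard library's `itertools.combinations` to enumerate the \(3\)-element subsets and `math.comb` to count them. The function name `three_sum_brute_force` is hypothetical, chosen for this illustration.

```python
from itertools import combinations
from math import comb


def three_sum_brute_force(numbers, x):
    """Return the first 3-element subset of numbers summing to x, or None."""
    for triple in combinations(numbers, 3):
        if sum(triple) == x:
            return triple
    return None


# Number of triples the search may have to examine when n = 10,000.
TRIPLES_FOR_TEN_THOUSAND = comb(10_000, 3)
```

The search is fine on small inputs, but the number of triples grows like \(n^3/6\text{,}\) which is what makes it impractical for the larger lists mentioned above.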

The third of the three problems is different. First, it seems to be much harder. There are \(2^{n-1}\) complementary pairs of subsets of a set of size \(n\text{,}\) and one of these pairs consists of the empty set and the entire set, which cannot have equal sums since the integers are positive. That leaves \(2^{n-1}-1\) pairs to test. Each of these tests is not all that tough. A netbook can easily report whether two subsets have the same sum, even when the two sets form a partition of a set of size \(10,000\text{,}\) but there are approximately \(10^{3000}\) partitions to test and no piece of hardware on the planet will touch that assignment. And if we go up to a set of size \(1,000,000\text{,}\) then the combined computing power of all the machines on earth won't get the job done.

In this setting, we have an algorithm, namely testing all partitions, but it is totally unworkable for \(n\)-element sets when \(n\) is large, since it has running time proportional to \(2^n\text{.}\)
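To make the exhaustive search concrete, here is a small Python sketch that tests all \(2^{n-1}-1\) complementary pairs. It is an illustration under assumptions: the function name `equal_sum_partition` is hypothetical, and the device of always placing the first element in \(A\) (so that each pair is examined once) is one convenient choice among several.

```python
def equal_sum_partition(numbers):
    """Exhaustively search for a partition into two parts with equal sums.

    Returns a pair of lists (A, B), or None if no such partition exists.
    """
    items = list(numbers)
    n = len(items)
    if n < 2:
        return None
    first, rest = items[0], items[1:]
    # Each mask selects which of the remaining n - 1 elements join A.
    # The all-ones mask would leave B empty, so the range stops before it,
    # giving exactly 2**(n - 1) - 1 pairs to test.
    for mask in range(2 ** (n - 1) - 1):
        a = [first] + [v for i, v in enumerate(rest) if (mask >> i) & 1]
        b = [v for i, v in enumerate(rest) if not ((mask >> i) & 1)]
        if sum(a) == sum(b):
            return a, b
    return None
```

Each added element doubles the number of masks, so this is usable only for very small sets.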

Subsection 4.2.2 Certificates

Each of the three problems we have posed is in the form of a “yes/no” question. A “yes” answer to any of the three can be justified by providing a certificate that can be checked efficiently. For example, if you answer the first question with a yes, then you might provide the additional information that you will find \(83,172\) as the integer on line \(584\) in the input file. Of course, you could also provide the source code for the computer program, and let a referee run the entire procedure.

Similarly, if you answer the second question with a yes, then you could specify the three numbers and specify where in the input file they are located. An impartial referee could then verify, if it mattered, that the sum of the three integers was really \(143,297\) and that they were located at the specified places in the input file. Alternatively, you could again provide the source code which would require the referee to test all triples and verify that there is one that works.
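The referee's check of such a certificate might look like the sketch below. The function name `verify_three_sum_certificate` and the use of 0-based list positions to stand in for locations in the input file are assumptions made for this illustration.

```python
def verify_three_sum_certificate(numbers, positions, x):
    """Check a claimed 'yes' certificate for the three-sum question.

    numbers   -- the integers in the order they appear in the input
    positions -- three claimed 0-based positions in that input
    x         -- the target sum
    """
    # The three positions must be different entries of the input.
    if len(set(positions)) != 3:
        return False
    # Every position must actually exist in the input.
    if any(p < 0 or p >= len(numbers) for p in positions):
        return False
    # Finally, the three integers found there must sum to x.
    return sum(numbers[p] for p in positions) == x
```

The check takes a handful of operations regardless of how long the search for the triple was.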

Likewise, a yes for the third question admits a modest-size certificate. You need only specify the elements of the subset \(A\text{.}\) The referee, who is equipped with a computer, can (a) check to see that all numbers in \(A\) belong to \(S\text{;}\) (b) form a list of the subset \(B\) consisting of those integers in \(S\) that do not belong to \(A\text{;}\) and (c) compute the sums of the integers in \(A\) and the integers in \(B\) and verify that the two sums are equal. But in this case, you would not provide source code for the algorithm, as there does not appear to be a reasonable strategy for deciding this problem when the problem size is large. At least, nothing in our discussion thus far provides one.
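Steps (a), (b), and (c) translate almost directly into code. The following Python is a sketch only; the function name `verify_partition_certificate` is hypothetical, and Python sets are used because the integers in \(S\) are distinct.

```python
def verify_partition_certificate(s, a):
    """Check a claimed subset A against S following steps (a), (b), (c)."""
    s_set = set(s)
    a_set = set(a)
    # (a) Every number in A must belong to S.
    if not a_set <= s_set:
        return False
    # (b) B consists of the integers in S that do not belong to A.
    b_set = s_set - a_set
    # (c) The two sums must be equal.
    return sum(a_set) == sum(b_set)
```

Verifying the certificate takes time roughly proportional to the size of \(S\text{,}\) in sharp contrast with the exhaustive search for it.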

Now let's consider the situation with a “no” answer. When the answer to the first question is no, the certificate can again be a computer program that will enable the referee to consider all the elements of \(S\) and be satisfied that the number in question is not present. A similar remark holds for the second question, i.e., the program is the certificate.

But the situation with the third question is again very different. Now we can't say to the referee “We checked all the possibilities and none of them worked.” This could not possibly be a true statement. And we have no computer program that can be run by us or by the referee. The best we could say is that we tried to find a suitable partition and were unable to do so. As a result, we don't know what the correct answer to the question actually is.

Subsection 4.2.4 Input Size

Problems come in various sizes. The three problems we have discussed in this chapter have the same input size. Roughly speaking, this size is \(10,000\) blocks, with each block able to hold an integer of size at most \(100,000\text{.}\) In this text, we will say that the input size of this problem is \(n=10,000\text{,}\) in some sense ignoring the question of the size of the integers in the set. There are obvious limitations to this approach. We could be given a set \(S\) of size \(1\) and a candidate element \(x\) and be asked whether \(x\) belongs to \(S\text{.}\) Now suppose that \(x\) is a bit string the size of a typical compact disc, i.e., some \(700\) megabytes in length. Just reading the single entry in \(S\) to see if it's exactly \(x\) will take some time.
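The gap between the two ways of measuring input can be seen with a short Python sketch. Both function names are hypothetical, and counting bits with `int.bit_length` is just one reasonable way to account for the size of each integer.

```python
def input_size_in_entries(numbers):
    """Input size as used in the text: the number of integers."""
    return len(numbers)


def input_size_in_bits(numbers):
    """A finer measure: total bits needed to write the integers in binary."""
    # bit_length() is 0 for the integer 0, so count at least one bit each.
    return sum(max(1, value.bit_length()) for value in numbers)
```

For example, a set holding the single integer \(2^{100}\) has size \(1\) under the first measure and \(101\) under the second.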

In a similar vein, consider the problem of determining whether a file \(x\) is located anywhere in the directory structure under \(y\) in a Unix file system. If you go on the basis of name only, then this may be relatively easy. But what if you want to be sure that an exact copy of \(x\) is present? Now it is much more challenging, since the contents of candidate files must be read and compared with \(x\text{.}\)
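The difference between the two versions of the file question can be sketched with Python's standard library. The function names `find_by_name` and `find_exact_copies` are hypothetical; `os.walk` traverses the directory tree and `filecmp.cmp` with `shallow=False` compares file contents rather than just file metadata.

```python
import filecmp
import os


def find_by_name(x_path, y_dir):
    """Return paths under y_dir whose file name matches that of x_path."""
    name = os.path.basename(x_path)
    return [
        os.path.join(root, filename)
        for root, _dirs, files in os.walk(y_dir)
        for filename in files
        if filename == name
    ]


def find_exact_copies(x_path, y_dir):
    """Return paths under y_dir whose contents are identical to x_path."""
    matches = []
    for root, _dirs, files in os.walk(y_dir):
        for filename in files:
            candidate = os.path.join(root, filename)
            # shallow=False forces a comparison of the actual contents.
            if os.path.isfile(candidate) and filecmp.cmp(
                x_path, candidate, shallow=False
            ):
                matches.append(candidate)
    return matches
```

The name-based search only reads directory listings, while the content-based search may have to read a candidate file in full, so its cost depends on the size of \(x\) as well as on the number of files.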