Module V
COMPLEXITY THEORY
Motivation:
Computability : Generally problems are categorize into two, solvable and unsolvable.
Unsolvable means no matter how clever you are, no matter how powerful your digital
computer is, it is impossible to get the solution. you can understand the problem clearly, but
difficult to solve it. Some example of unsolvable problem are Halting problem, post
correspondence problem etc. Mathematicians initially raised this question for ages, what can
be computed and what cannot be computed. For example it is possible to bisect an angle
using ruler and compass, but it is impossible to trisect an angle. It is not possible to write
in decimal form, we can write upto 10000 place or 1 lakh place. No one is able to write in
exact decimal form or no one will ever be able to write in decimal form . They have come up
with a beautiful categorization of computability. Using recursive function they explained
what can be computed and what can not. They said recursive function is the ultimate for
computation. Then logicians, they defined first order logic ,second order logic etc and argued
that this is the ultimate for computation. Then linguistics have come with some symbols and
they said if any thing is computable it can be computed using some string manipulation. They
said type 0 grammar is the ultimate for computation. Turing, Even before the computer is
found, he designed a machine for computation. He said corresponding to any computable
function, there is a instruction set in the machine which is able to compute it. All these
computational models happened independently and everyone argued that their model is the
ultimate for computation.
Even though all the theory was developed independently ,Church claimed that all are
equivalent and his is just like many religion identified god in different way and finally agreed
that all are true
In mid fifties it is agreed that Turing computability is the ultimate .if recursive function
capture some functions which is computable, correspondingly we can design a Turing
machine. Therefore a problem is solvable then it is possible to design a Turing machine. If it is
impossible to design a Turing machine for a particular problem that problem is unsolvable
Complexity Theory
Now things changed from what can be computed to how well it can be computed
It is possible to find an efficient solution for certain problems and in certain other problem it is
not. For a travelling salesman problem it is not possible to get an n 2 algorithm .In automata
theory Idea of non determinism is introduced. here we solve the problem by doing multiple
option concurrently
Deterministic and non deterministic algorithms
The algorithm with the property that the result of every operation is uniquely determined is
known as deterministic algorithm.
Ex Algorithm to find root of a quadratic equation, algorithm to sort n numbers using bubble
sort etc
Algorithms whose outcomes are not uniquely defined but are limited to specified set of
possibilities is known as nondeterministic algorithm
Ex consider the following pseudo code
X = 10;
If(x==5)
{
Call function 1
Or
Call function 2
{
Else {
Call function 3
Or
Call function 4
}. This pseudo code is nondeterministic in nature.
Tractable and intractable problems
The problems that can be solved in finite amount of time is called tractable problems
Ex problem to find sum of 100 numbers
Problems solved using exponential algorithms(having the worst case complexity such as
O(2n),O(nn)) are called intractable problems
EX Travelling salesman problem
Decision problems
Any problem for which the answer is either zero or one is called a decision problem. An
algorithm for a decision problem is termed as a decision algorithm.
P class problem
It is the set of all decision problems that can be solved using deterministic algorithms in
polynomial time.
Feature of P class problem
1) P class problems are decision type problems.
2) P class problems are tractable
3) They are solved using deterministic algorithms
Non deterministic Polynomial(NP) problems
NP problems can be defined as the class of decision problems that can be solved in
polynomial time by non deterministic algorithms. Guess and verify mechanism is used to
prove that a problem is in NP. That is if a certificate is given it can be verified in polynomial
2
time. For example in the Hamiltonian cycle problem, the certificate is the list of vertices in the
Hamiltonian cycle. If a graph is Hamiltonian, the Hamiltonian cycle itself offers enough
information to verify this fact. NP is the set of problems where we can verify a Yes answer
quickly if we have the solution in front of us. For example, the circuit satisfiability problem is
in NP. If the answer is yes, then any set of m input values that produces True output is a proof
of this fact; we can check the proof by evaluating the circuit in polynomial time.
Note : Hamiltonian cycle is a simple cycle that visits every vertex exactly once
Prove that searching problem belongs to NP
Consider the searching problem. problem is to search a particular element in a binary tree. Let
the depth of the binary tree be n. Then the number of elements in the tree having depth n is
2n-1. If we use deterministic algorithm the search would take O(2 n) comparison, which is
exponential one.
But here we apply non deterministic machine to solve this problem as follows.
A non deterministic machine can compare the search element with all elements at each level
in single operation (one unit of time) .That is at each level it can have multiple options and it
will be performed in single unit of time. Therefore the search algorithm will have time
complexity O(n) which is polynomial in nature. Hence above problem is NP.
Note:
Since deterministic algorithms are just a special case of non deterministic one we conclude
that P NP. But it is not known whether P = NP or P NP which is a famous unsolved problem
in computer science.
Polynomial reductions
Idea: transform the inputs of A to inputs of B
Defn: Let L1 and L2 be two problems. Problem L1 reduces to L2 iff there is a way to solve L1
by a deterministic polynomial algorithm that solves L2 in polynomial time. This means that if
we have a polynomial algorithm for L2 then we can solve L1 in polynomial time. Or in other
words we write L1 p L2 which means L1 is not more than a polynomial factor harder than L2.
Interpretations of L1 p L2:
3
1) L1 is at most as hard as L2: It is possible to solve L1, given an algorithm for L2
2) L2 is at least as hard as L1: if L1 is known to be hard, then the reduction L1 p L2
yields that L2 is also hard .
Assume that L1 is reducible to L2
Then if:
We know:
L1 is
unsolvable
L2 is unsolvable
L1 is solvable
Nothing about
L2
L2 is solvable
L1 is solvable
L2 is
unsolvable
Nothing about
L1
Explain polynomial time reduction technique to prove NP completeness
Consider a decision problem A which would like to solve in polynomial time. We call the input
to a particular problem an instance of that problem. Suppose that there is another decision
problem say B that we already know how to solve in polynomial time. We transform an
instance of A into some instance of B with the following characteristics
1) The transformation takes polynomial time
2) The answer for is yes iff the answer for is also yes .
Here we use the easiness of B to prove the easiness of A.That is if the problem B can
be solved in polynomial time then A is also can be solved in polynomial time.
Once we have some NP complete problems we can prove a new
problem to be NP complete by reducing some known NP complete to it
using a polynomial time reduction.
How could we use polynomial time reductions to show that no polynomial algorithm
exist for a problem ?
We need to show that there is no polynomial algorithm exist for the problem B.
We assume that there exists a polynomial algorithm for the problem B.
Suppose we have a decision problem A for which we already know that no polynomial
algorithm exist. We transform instance of A to instance of B using polynomial time reduction.
Since B is solvable ,A will also be solvable using polynomial reduction algorithm., which is a
contradiction to our assumption. hence no polynomial time algorithm exists for B.
NP hard problems
A problem L is NP hard if there is a L1
NP which can reduce to L in polynomial time. A
problem L is NP-hard if a polynomial-time algorithm for L would imply a polynomial-time
4
algorithm for every problem in NP. This is like saying that if we could solve one particular NPhard problem quickly, then we could quickly solve any problem whose solution is easy to
understand, using the solution to that one special problem as a subroutine.
Example: satisfiabilty problem, traveling salesman problem
NP Complete
The concept of NP completeness was introduced by S.A Cook to solve the problem P = NP.
A problem L is said to be NP complete if
1) L
NP
2) For every L 1
NP there exist polynomial time reduction of L1 to L ( L is NP hard)
Properties of NP complete problems
1) No polynomial time algorithm has been found for any one of them
2) It is not established that polynomial algorithm for these problems do not exist
3) If a polynomial algorithm is found for one of them,there will be polynomial algorithm for
all of them.
4) If it can be proved that no polynomial time algorithm exists for any one of them then it
will not exist for any one of them
NP Complete Problems
The following lemma is the basis of the for showing that a problem is NP complete
Lemma: If L is a problem such that L p L for some L NPC then L is NP hard. Moreover If L
NP then L NPC.
Pf : Since L is NP complete, for all L NP , we have L p L. By supposition . L p L and thus
by transitivity we have L p L , which sows that L is NP hard . If L NP , we also have L
NPC.
Theorem 1
Circuit satisfiabilty problem is NP complete.
Circuit satisfiability is a good example of a problem that we don't know how to solve in
polynomial time. In this problem, the input is a boolean circuit: a collection of and, or, and not
gates connected by wires. The input to the circuit is a set of m boolean (true/false) values x1,
xm. The output is a single boolean value. The circuit satisfiability problem asks, given a
circuit, whether there is an input that makes the circuit output True, or conversely, whether
the circuit always outputs False. Nobody knows how to solve this problem faster than just
5
trying all 2m possible inputs to the circuit, but this requires exponential time. On the other
hand, nobody has ever proved that this is the best we can do; maybe there's a clever
algorithm that nobody has discovered yet! But using non deterministic algorithm we can solve
this problem in polynomial time.(Proof not necessary)
Satisfiabilty problem(SAT)
SATISFIABILITY, or SAT is a problem of great practical importance, with applications ranging
from chip testing and computer design to image analysis and software engineering. It is also a
canonical hard problem. Here's what an instance of SAT looks like:
(x y z) (x y) (y z) (z x) (x y z):
This is a Boolean formula in conjunctive normal form (CNF). It is a collection of clauses (the
parentheses), each consisting of the disjunction of several literals, where a literal is either a
Boolean variable (such as x) or the negation of one (such as x).A satisfying truth assignment
is an assignment of false or true to each variable so that every clause contains a literal whose
value is true.
Let x1,x2, xn be n variables which assumes 2 values 0 or 1. Let f(x1,x2,xn) be a function in
Boolean variable and these variables are connected by connectives AND,OR,NOT,
IMPLICATION and IFF . The problem is to find an assignment for the variable x1,x2,xn such
that f(x1,x2..xn) = 1.The satisfiability problem asks whether a given Boolean formula is
satisfiable.
As an example the formula =(( x1x2) ((x1x3)x4))x2 has the satisfying
assignment x1 = 0, x2 =0, x3=0, x4 =1), since = (( 00) ((01) 1))0 =
(1 (1 1))1 = (10)1 = 1 and hence the formula belongs to SAT.
Qn Prove that satisfiabilty problem is NP complete
Proof : In order to prove SAT in NP complete we need to show 1) SAT
To prove SAT
NP 2) SAT is NP HARD
NP
X1,x2..xn are n variable which takes values either 0 or 1.We need to find the assignment of
the variable x1,x2xn with values 0 or 1 so that f(x1,x2,..xn) = 1.In order to find truth
assignment for the variable we have to see the all the possibilities. Since there are 2 n different
possibilities, the time complexity is O(2n) if we apply deterministic algorithm.
So we apply non deterministic algorithm to solve this problem. We arrange each assignment
in a tree as follows.
Suppose n=3, then different assignments are (0,0,0), (0,0,1), (1,0,0), (1,1,0), (1,0,1), (0,1,0),
(1,0,1), (0,1,1).
In nondeterministic algorithm we take assignment at each level as a single unit of operation
instead of taking each assignment as a single unit. Here we have only 3 levels to do operation
and hence the time complexity is only O(n) which is a polynomial time. Hence SAT
NP
To prove SAT
Here we prove that CIRCUIT SAT P SAT. In other words any instance of circuit satisfiabilty can
be reduced in polynomial time to an instance of formula satisfiability. For example, we could
transform the given circuit into a formula as follows:
Y8 = y4 y7 y6
=(y1 x2) (y5 y3) x5
=((x1 x4 ) x2) (x2 (y2x3)) x5
= ( x1 x2) (x4 x2) (x2 y2) (x2 x3) x5 which is in CNF.
Now the original circuit is satisfiable if and only if the resulting formula is satisfiable. For each
wire xi in the circuit ,the formula has a variable xi. The proper operation of a gate can now
be expressed a formula involving the variable of its incident wires. If the circuit C has a
satisfying assignment ,each wire of the circuit has a well defined value and output of the
circuit is 1.Therefore the assignment of wire values to variables in makes each clause of
evaluate to 1 and thus conjunction of all evaluates to 1. Conversely if there is an assignment
that causes to evaluate 1,the circuit is satisfiable by an analogous argument. Thus we have
CIRCUIT SAT P SAT.
We have already proved by theorem 1 that CIRCUIT SAT IS NP COMPLETE. Since CIRCUIT SAT
is NP Complete,SAT NP and CIRCUIT SAT P SAT, SAT is also NP Complete
Prove that 3 CNF satisfiabilty( 3 SAT) is NP complete
A literal in a Boolean formula is an occurrence of a variable or its negation. A Boolean formula
is in conjunctive form or CNF if it is expressed as an AND of clauses,each of which is the OR of
one or more literals. A Boolean formula is in 3-conjunctive form or 3 CNF if each clause has
exactly 3 distinct literals. For example the Boolean formula ( x1 x1 x2) ( x3 x2
x4) (x1 x3 x4 ) is in 3 CNF. In 3 CNF-SAT, we are asked whether a given Boolean
formula is satisfiable
To prove 3 CNF NP
The same proof for the SAT NP can be given . Here
literals.
each clause has exactly 3 distinct
To prove 3-CNF NP hard ,we show that SAT P 3-CNF-SAT . The reduction algorithm can be
broken into 3 basic steps
Step 1: first we construct a binary parse tree for the input formula with literals as leaves
and connectives as internal nodes. The parse tree for the formula =(( x1x2)
((x1x3)x4))x2 is
We introduce a variable yi for the output of each internal node. Then we rewrite the
original formula as the AND of the root variable and a conjunction of clause describing the
operation of each node. The resulting expression corresponding to the formula is
= y1 (y1(y2 x2)
(y2 (y3 y4) (y3 (x1 x2) (y4y5)
(y5 (y6 x4)
(y6 (x1 x3)) . Observe that the formula thus obtained is a
conjunction of clauses i, each of which has at most 3 literals. The only additional
requirement is that each clause be an OR of literals
8
Step 2 : we convert each clause I into conjunctive normal form. We construct a truth table
for i by evaluating all possible assignments to its variable. Each row of the truth table
consists of a possible assignment of the variables of the clause ,together with the value of
the clause under that assignment. Using the truth variable entries that evaluate to 0, we build
a formula in disjunctive form, which is equivalent to
i. We then convert this formula into
CNF using DeMorgans law. For example we take 1 = (y1(y2 x2) and the truth table
corresponding to this is
y1
y2
x2
(y1(y2 x2)
0
0
0
1
Therefore the DNF is
1 = ( y1 y2 x2) ( y1 y2 x2) (y1 y2 x2)
( y1 y2 x2) we get CNF by applying DeMorgans law . i.e
1 = ( y1 y2
x2) ( y1 y2 x2) (y1 y2 x2) (y1 y2 x2) . Similarly each clause i of
the formula can be converted to i, which is a CNF with almost 3 literals
Step 3 : Each clause has exactly 3 distinct literals
For each clause Ci of , we include following clauses in .
1) If Ci has 3 distinct literals we simply include Ci as a clause of .
2) If Ci has 2 distinct literals, that is if Ci = (l1 l2) then we include (l1 l2 p) (l1 l2
p) as clause of .
3) If Ci has just 1 distinct literal l then we include (l p q) (l p q) (l p
q) (l p q) as clause of ,
We can see that 3-CNF formula is satisfiable iff is satisfiable. The construction of
from in the first step preserves satisfiability. The second step produces a CNF formula
that is equivalent to . The third step produces a 3 CNF formula that is equivalent to .
Hence SAT P 3-CNF-SAT.We have already proved that SAT IS NP COMPLETE. Since 3- CNF
NP and SAT P 3- CNF SAT, 3 -CNF is also NP Complete.
Clique problem is NP Complete
A clique is another name for a complete graph(every pair of verices is connected by an
edge). The maximum clique size problem, or simply MaxClique, is to compute, given a graph,
the number of nodes in its largest complete subgraph.
9
First we prove that Clique problem NP,
..(1)
For a given graph G = (V,E) we use the set V V of vertices in the clique as a certificate for
G.Checking whether V is a clique can be accomplished in polynomial time by checking
whether for each pair u,v V the edge(u,v) belongs to E.
Secondly we prove that it is NP Hard using a reduction from 3CNF -SAT(2)
For that we ues a reduction algorithm that transforms a 3CNF formula into a graph that has a
clique of a certain size if and only if the formula is satisfiable.
The graph has one node for each instance of each literal in the formula. Two nodes are
connected
by an edge if (1) they correspond to literals in different clauses and (2) those literals do not
contradict each other. In particular, all the nodes that come from the same literal (in different
clauses) are joined by edges. For example, the formula
is transformed into the following graph. (Look for the edges that aren't in the graph.)
Now suppose the original formula had k clauses. Then we claim that the formula is satisfiable
if and only if the graph has a clique of size k.
1. k-clique =) satisfying assignment: If the graph has a clique of k vertices, then each vertex
must come from a different clause. To get the satisfying assignment, we declare that each
literal in the clique is true. Since we only connect non-contradictory literals with edges,this
10
declaration assigns a consistent value to several of the variables. There may be variables that
have no literal in the clique; we can set these to any value we like.
2. satisfying assignment =) k-clique: If we have a satisfying assignment, then we can choose
one literal in each clause that is true. Those literals form a clique in the graph.Thus, the
reduction is correct. Since the reduction from 3CNF formula to graph can be done in
polynomial time, so MaxClique is NP-hard.
From (1) and (2) clique problem is NP complete
Hamiltonian cycle problem is NP Complete ;
refer Algorithms by Thomas coreman
Travelling salesman problem(TSP) is NP Hard
Problem : A salesman wishes to make a tour visiting each city exactly once and finishing the
city he starts from. There is an integer cost c(i,j) to travel from city i to city j and salesman
wishes to make the tour whose total cost is minimum .
To prove TSP is NP hard
Here we show that HAM-CYCLE p TSP. Let G = (V,E) be an instance of HAM CYCLE. We
construct an instance of TSP as follows. We form the complete graph G =(V,E) where E ={ (i
,j)/i,j V and i j } and we define the cost function as
. The instance of TSP is then (G,c, 0) which is easily formed in
polynomial time. We now show that the graph G has a Hamiltonian cycle iff graph G
has a tour of cost at most 0.Suppose that graph G has a Hamiltonian cycle h. Each edge
in h belongs to E and thus has cost 0 in G. Thus h is a tour in G with cost 0.
h of cost at most 0.Since the cost of the edges in E
are 0 and 1,the cost of tour h is exactly 0 and each edge on the tour must have cost
0.Therefore h contains only edges in E. We conclude that h is a Hamiltonian cycle in
graph G.
Prove that halting problem on turing machine is undecidable.
The halting problem is to determine for an arbitrary deterministic algorithm A and an input I
whether algorithm A with input I ever terminates or enters an infinite loop.
It is well known that this problem is undecidable. We can show that satisfiabilty reduces to
halting problem by constructing an algorithm whose input is a propositional formula X. If X has
n variables then A tries out all 2 n possible assignments and verifies whether X is satisfiable. if
it is then A stops, otherwise it enters in an infinite loop. Hence A halts on input X iff X is
satisfiable.
If we had a polynomial time algorithm for the halting problem then we could solve the
satisfiability problem in polynomial time using A and X as input to the algorithm for the halting
problem.
But we already proved that satisfiability problem cannot be solved in polynomial time using
deterministic algorithm. Hence no polynomial algorithm for halting problem. Therefore halting
problem is an NP Hard problem that is not in NP
11
CIRCUITSAT
SAT
3 CNF
SAT
CLIQU
E
SUBSET SUM
VERTEX
COVER
HAM
CYCLE
TSP
Np COMPLETE PROBLEMS
Explain church Thesis
It says that No computational procedure will be considered an algorithm unless it can be
represented as a Turing machine. In other words we can say any thing which is computable
can be designed using T M. This was given by Alonzo Church in 1936. According to him T M is
equivalent to all computing model .This highlights that T M is the Ultimate for computation
POST CORRESPONDANCE PROBLEM (PCP).
The PCP was first formulated by Emit Post in 1940. Given two sequence of n strings on some
alphabets . Say X = { x1,x2,xn } and Y = {y1 , y2, yn }. Now the problem is ; Does
there exist a non empty sequence of integers i1,i2.ik such that k 1 and
xi1xi2xik =
yi1yi2.yik ?
Example : Let
= {a,b,c} and P = (X,Y) where X = {b,c,ab,bca} and Y = (bc, ab, b, a}.
Does PCP P have a solution?
Let x1 = b , x2 = c x3 = ab x4 = bca
and y1 =bc y2 = ab
sequence (1,2,3,1,4) the PCP have the solution as shown below.
b
X1
bc
Y1
ab
x2
ab
y2
x3
b
bca
x1
bc
y3
y1
x4
a
y4
12
y3 =b y4 = a. For the
The problem given an arbitrary instance of PCP, whether that instance has
a solution is undecidable
Refer introduction to algorithms by Thomas coreman for more problems
13