A Tutorial On Proof Theoretic Foundations of Logic Programming
A Tutorial On Proof Theoretic Foundations of Logic Programming
1 Introduction
        The purpose of this tutorial is to give a concise, direct and simple introduc-
tion to logic programming via uniform provability. This is a two phase operation:
firstly, the sequent calculus of classical logic is introduced, then uniform prov-
ability and the related concept of abstract logic programming. The idea is that
anybody familiar with classical logic in its basic, common exposition, and who
also knows Prolog, should be able to find enough information for understanding
how Prolog could be designed, and extended, using this proof theoretic method-
ology. The reader who is already acquainted with sequent calculus can skip the
first section.
                         Axiom                                              Cut
                         A`A                                       Γ ` ∆, A       A, Λ ` Θ
                                                             cut
                                                                         Γ, Λ ` ∆, Θ
                          Γ `∆                                              Γ `∆
                   wL                                               wR
                         A, Γ ` ∆                                         Γ ` ∆, A
                 Γ ` ∆, A     B, Γ ` ∆                                   A, Γ ` ∆, B
            ⊃L                                                     ⊃R
                       A ⊃ B, Γ ` ∆                                      Γ ` ∆, A ⊃ B
          A, Γ ` ∆                     B, Γ ` ∆                    Γ ` ∆, A       Γ ` ∆, B
   ∧LL                        ∧LR                            ∧R
         A ∧ B, Γ ` ∆               A ∧ B, Γ ` ∆                         Γ ` ∆, A ∧ B
                 A, Γ ` ∆     B, Γ ` ∆                     Γ ` ∆, A                      Γ ` ∆, B
            ∨L                                      ∨RL                           ∨RR
                       A ∨ B, Γ ` ∆                       Γ ` ∆, A ∨ B                  Γ ` ∆, A ∨ B
                         Γ ` ∆, A                                         A, Γ ` ∆
                  ¬L                                                ¬R
                        ¬A, Γ ` ∆                                         Γ ` ∆, ¬A
                       A[x/t], Γ ` ∆                                      Γ ` ∆, A
                 ∀L                                                ∀R
                       ∀x.A, Γ ` ∆                                       Γ ` ∆, ∀x.A
                         A, Γ ` ∆                                       Γ ` ∆, A[x/t]
                 ∃L                                                ∃R
                       ∃x.A, Γ ` ∆                                       Γ ` ∆, ∃x.A
                                 where in ∀R and ∃L the variable x is not free in the conclusion
introduce a system for first-order classical logic, called LK, which is a slight
modification of the system bearing the same name proposed by Gentzen in 1934.
By means of examples we will see how the proof construction process works in
the system and we will shortly discuss cut elimination.
       From now on we consider the language of classical logic built over ⊃, ∧, ∨,
¬, ∀, ∃, and we denote formulae by A, B, . . . , and atoms by a, b, . . . , and by p(x),
q(y). The system LK of first order classical sequent calculus with multiplicative
cut is defined in fig. 1. The rules cL and cR are called left and right contraction;
wL and wR are called left and right weakening; the other rules, except for cut,
get the name from the connective they work on.
       Regarding the use of the rules ∀R and ∃L , one should remember that bound
variables can always be renamed, i.e., one can consider, say, ∀x.p(x) the same
as ∀y.p(y).
       The first example shows the behaviour of several rules, in particular it
shows how several structural rules operate. To see cL in action, please wait until
the next section.
4                                Paola Bruscoli and Alessio Guglielmi
                             p(x) ` p(x)
                      wR
                           p(x) ` p(x), q(y)                                q(y) ` q(y)
                    ⊃R                                            wL
                         ` p(x), p(x) ⊃ q(y)                           p(x), q(y) ` q(y)
                 ∃R                                            ⊃R
                      ` p(x), ∃x.(p(x) ⊃ q(y))                      q(y) ` p(x) ⊃ q(y)
              ∀R                                             ∃R
                    ` ∀x.p(x), ∃x.(p(x) ⊃ q(y))                   q(y) ` ∃x.(p(x) ⊃ q(y))
              ⊃L
                                 ∀x.p(x) ⊃ q(y) ` ∃x.(p(x) ⊃ q(y))
                           ⊃R
                                ` (∀x.p(x) ⊃ q(y)) ⊃ ∃x.(p(x) ⊃ q(y))
2.1.3 Example          In LK the formula ∃x.∀y.(p(x) ⊃ p(y)) can be proved this way:
                        p(z) ` p(z)                               p(y) ` p(y)
                 wR                                       wL
                      p(z) ` p(z), p(y)                        p(x), p(y) ` p(y)
              ⊃R                                         ⊃R
                    ` p(z), p(z) ⊃ p(y)                       p(y) ` p(x) ⊃ p(y)
            ∀R                                          ∀L
                 ` p(z), ∀y.(p(z) ⊃ p(y))                    ∀z.p(z) ` p(x) ⊃ p(y)
        ∃R                                         ∀R
             ` p(z), ∃x.∀y.(p(x) ⊃ p(y))                ∀z.p(z) ` ∀y.(p(x) ⊃ p(y))
      ∀R                                          ∃R
            ` ∀z.p(z), ∃x.∀y.(p(x) ⊃ p(y))             ∀z.p(z) ` ∃x.∀y.(p(x) ⊃ p(y))
      cut
                             ` ∃x.∀y.(p(x) ⊃ p(y)), ∃x.∀y.(p(x) ⊃ p(y))
                        cR
                                       ` ∃x.∀y.(p(x) ⊃ p(y))
       The idea behind this proof is to use the cut in order to create two hy-
potheses, one for each branch the proof forks into. In the right branch we
assume ∀z.p(z) and in the left one we assume the contrary, i.e., ¬∀z.p(z) (note
that this is the case when ∀z.p(z) is at the right of the entailment symbol). The
contraction rule at the bottom of the proof ensures that each branch is provided
with the formula to be proved. Of course, it’s easier to prove a theorem when
further hypotheses are available (there is more information, in a sense): the cut
rule creates such information in pairs of opposite formulae.
       One should note that the cut rule is unique in the preceding sense. In
fact, all other rules do not create new information. They enjoy the so called
subformula property, which essentially means that all other rules break formulae
into pieces while going up in a proof. In other words, there is no creation of
new formulae, only subformulae of available formulae are used (there are some
technicalities involved in the ∃R rule, but we can ignore them for now).
       One should also observe that the cut rule is nothing more than a gener-
alised modus ponens, which is the first inference rule ever appeared in logic.
Systems in the sequent calculus are usually given with cut rules, because cut
rules simplify matters when dealing with semantics and when relating several
different systems together, especially those presented in other styles like natural
deduction. In fact, one useful exercise is to try to prove that LK is indeed classical
logic. No matter which way one chooses, chances are very good that the cut rule
will play a major role. Of course, classical logic is a rather well settled matter,
but this is not so for other logics, or for extensions of classical logic, which are
potentially very interesting for automated deduction and logic programming. In
all cases, the cut rule is a key element when trying to relate different viewpoints
about the same logic, like for example syntax and semantics.
Gentzen’s great achievement has been to show that the cut rule is not necessary
in his calculus: we say that it is admissible. In other words, the other rules
are sufficient for the completeness of the inference system. This is of course
good news for automated deduction. In the example above, a logician who
understands the formula to be proved can easily come up with the hypothesis
6                            Paola Bruscoli and Alessio Guglielmi
∀z.p(z), but how is a computer supposed to do so? The formula used is not
in fact a subformula of the formula to be proved. Since it’s the only rule not
enjoying the subformula property, we could say that the cut rule concentrates
in itself all the ‘evil’.
       Given its capital importance, we state below Gentzen’s theorem. The
theorem actually gives us more than just the existence of cut free proofs: we
could also transform a proof with cuts into a cut free one by way of a procedure
(so, the theorem is constructive). This turns out to be fundamental for functional
programming, but we will not talk about that in this paper.
2.2.1 Theorem For every proof in LK there exists a proof with the same con-
clusion in which there is no use of the cut rule.
2.2.2 Example By eliminating from the previous example cut rules according
to the cut elimination algorithm you find in [1], you should obtain a much simpler
proof, probably less intuitive from the semantic viewpoint:
                                          p(y) ` p(y)
                                   wL
                                        p(y), p(x) ` p(y)
                                  ⊃R
                                       p(y) ` p(x) ⊃ p(y)
                              wR
                                   p(y) ` p(z), p(x) ⊃ p(y)
                             ⊃R
                                  ` p(y) ⊃ p(z), p(x) ⊃ p(y)
                         ∀R
                              ` ∀z.(p(y) ⊃ p(z)), p(x) ⊃ p(y)
                        ∃R
                             ` ∃x.∀y.(p(x) ⊃ p(y)), p(x) ⊃ p(y)
                   ∀R
                        ` ∃x.∀y.(p(x) ⊃ p(y)), ∀y.(p(x) ⊃ p(y))
                 ∃R
                       ` ∃x.∀y.(p(x) ⊃ p(y)), ∃x.∀y.(p(x) ⊃ p(y))
                  cR                                                .
                                    ` ∃x.∀y.(p(x) ⊃ p(y))
       We should remark that what happened in the previous example is not very
representative of cut and cut elimination, with respect to the size of the proofs.
In our example, the cut free proof is smaller than the proof with cuts above. In
general, cut free proofs can be immensely bigger than proofs with cuts (there is
a hyperexponential factor between them).
       There is a possibility of exploiting this situation to our advantage. In fact,
as we will also see here, cut free proofs correspond essentially to computations.
Since proofs with cuts could prove the same theorems in much shorter ways, one
could think of using cuts in order to speed up computations, or in order to make
them more manageable when analysing them. This would require using cuts as
sort of oracles that could divine the right thing to do to shorten a computation at
certain given states. This could work by asking a human, or a very sophisticated
artificial intelligence mechanism, to make a guess. We will not be dealing with
these aspects in this paper, but one should keep them in mind when assessing the
merits of the proof theoretic point of view on logic programming and automated
deduction, because they open exciting possibilities.
       As a last remark, let us notice that cut elimination is not simply a necessary
feature for automated deduction, or in general for all what is related to proof
            A Tutorial on Proof Theoretic Foundations of Logic Programming                       7
construction: its use is broader. For example, a cut elimination theorem can be
used to prove the consistency of a system.
2.2.3 Example Prove that LK is consistent. We shall reason by contradiction.
Suppose that LK is inconsistent: we can then find two proofs Π1 and Π2 for
both ` A and ` ¬A. We can then also build the following proof for `:
                                                    Π2
                                                                        A`A
                             Π1                                ¬L
                                                   ` ¬A             ¬A, A `
                                         cut
                           `A                              A`
                     cut                                            .
                                           `
By the cut elimination theorem a proof for ` must exist in which the cut rule
is not used. But this raises a contradiction, since ` is not an axiom, nor is the
conclusion of any rule of LK other than cut.
2.2.4 Example We can reduce the axiom to atomic form. In fact, every axiom
A ` A appearing in a proof can be substituted by a proof
ΠA
                                               A`A
which only uses axioms of the kind a ` a, where A is a generic formula and a
is an atom. To prove this, consider a sequent of the kind A ` A and reason by
induction on the number of logical connectives appearing in A.
Base Case     If A is an atom then ΠA = A ` A.
Inductive Cases
If A = B ⊃ C then
ΠB ΠC
                     ΠA =            B`B                        C`C
                               wR                         wL
                                    B ` B, C                   C, B ` C
                               ⊃L
                                               B ⊃ C, B ` C
                                     ⊃R
                                           B⊃C `B⊃C
All other cases are similar:
ΠB ΠC ΠB ΠC
ΠB ΠB ΠB
∀~x.((b1 ∧ · · · ∧ bh ) ⊃ a),
∀~x.((b1 ∧ · · · ∧ bh ) ⊃ a);
                                                                     aσ ` al σ
                                                              wL
                                                                         ..
                             P ` (b1 ∧ · · · ∧ bh )σ                      .
                      wR                                     wL
                           P ` (b1 ∧ · · · ∧ bh )σ, al σ           P, aσ ` al σ
                      ⊃L
                                   P, ((b1 ∧ · · · ∧ bh ) ⊃ a)σ ` al σ
                              ∀L
                                                     ..
                                                      .
                             ∀L
                                  P, ∀~x.((b1 ∧ · · · ∧ bh ) ⊃ a) ` al σ
                             cL
                P ` a1 σ                          P ` al σ                        P ` ak σ
           ∧R                         ···    ∧R              ···            ∧R
                   ..                                ..                              ..
                    .                                 .                               .
                ∧R                                                                      .
                                        P ` (a1 ∧ · · · ∧ ak )σ
∆ has one leaf which is an axiom (aσ ` al σ); one leaf which stands for the logic
programming problem P ` (b1 ∧ · · · ∧ bh )σ and zero or more leaves P ` ai σ, for
1 6 i 6 k and i 6= l. In the special case where h = 0 we consider the derivation
∆0 :
                                         aσ ` al σ
                                      wL
                                              ..
                                               .
                                    wL
                                        P, aσ ` al σ
                                     ∀L
                                               ..
                                                .
                                   ∀L
                                       P, ∀~x.a ` al σ
                                    cL
                   P ` a1 σ               P ` al σ             P ` ak σ
               ∧R              · · · ∧R               · · · ∧R
                       ..                       ..                ..
                        .                        .                 .
                    ∧R                                               .
                                  P ` (a1 ∧ · · · ∧ ak )σ
For both cases we can consider a special inference rule b, which we call backchain
and which stands for ∆ or ∆0 :
       P ` a1 σ · · · P ` al−1 σ       P ` (b1 ∧ · · · ∧ bh )σ       P ` al+1 σ · · · P ` ak σ
   b                                                                                             ,
                                       P ` (a1 ∧ · · · ∧ ak )σ
10                       Paola Bruscoli and Alessio Guglielmi
                b                                     b
                    P ` r(1)                              P ` q(1)
           b                                     b
               P ` q(1) ∧ r(1)                       P ` q(1) ∧ r(1)
       b                                    b
           P ` p(x)[y/x][x/1]                   P ` p(x)[y/x][x/1]
                                                     P ` q(x) ∧ r(x)
                                                 b
                                                     P ` p(x)[y/x]
P ` p(x)
                                                              ..
                                                               .
                                            P `a∧b            P`b
                                        b
                                                 P `a∧b               P`b
                                             b
                                                             P `a∧b
                                                         b
                                                              P`a
                                                  P `a∧b           P`b
                                             b
                                                         P `a∧b
                                                     b
                                                             P`a
                                                         P `a∧b
                                                     b
                                                             P`a
P`a
using program P, tells the interpreter to organise the computation as the search
for a proof of A and a search for a proof of B, both by using program P. This
12                       Paola Bruscoli and Alessio Guglielmi
                                     Γ `A Γ `B
                                ∧R                   .
                                      Γ `A∧B
      However, given a program P and a goal A∧B, i.e., given a sequent P ` A∧B
to be proved, an interpreter based on LK has more options: it can use left rules by
operating over the program P. Clearly, this does not correspond to our intuition
as to how a logic programming interpreter should work.
      In [7] the authors take a clear course of action:
      1) They define what strategy for searching for proofs could correspond to
a sensible, operational notion similar to the one adopted by Prolog; this leads to
the idea of uniform proof, a uniform proof being a proof where the operational,
sensible strategy is adopted.
       2) They turn their attention to the question: when does uniform prov-
ability correspond to (normal, unrestricted) provability? For the languages and
deductive systems where the two notions coincide, they speak of abstract logic
programming languages.
     We devote the next two subsections to uniform proofs and abstract logic
programming languages.
Γ ` ∆,
where Γ and ∆ are sets of formulae (and not multisets as in system LK). > and
⊥ are formulae different from atoms. Axioms and inference rules of MNP S are
shown in fig. 4. In MNP S negation is obtained through ¬A ≡ A ⊃ ⊥.
      The reader should be able to relate the new system to LK, and then to
any other presentation of classical logic. The exercise is not entirely trivial, and
very useful for understanding the subtleties of the sequent calculus. However, it
is not central to this tutorial.
3.1.1 Exercise Prove that MNP S and LK are equivalent (consider > ≡ A ∨ ¬A
and ⊥ ≡ A ∧ ¬A, for example).
       As we said at the end of Section 2, we are mostly interested in intuitionistic
logic, i.e., the logic produced by the deductive systems we have seen so far, in
            A Tutorial on Proof Theoretic Foundations of Logic Programming                       13
                                            Axioms
                           a, Γ ` ∆, a     ⊥, Γ ` ∆, ⊥        Γ ` ∆, >
                     A, B, Γ ` ∆                               Γ ` ∆, A     Γ ` ∆, B
                ∧L                                        ∧R
                     A ∧ B, Γ ` ∆                                    Γ ` ∆, A ∧ B
                A, Γ ` ∆    B, Γ ` ∆                     Γ ` ∆, A                    Γ ` ∆, B
           ∨L                                   ∨RL                         ∨RR
                     A ∨ B, Γ ` ∆                     Γ ` ∆, A ∨ B                Γ ` ∆, A ∨ B
                     A[x/t], Γ ` ∆                                     Γ ` ∆, A
                ∀L                                              ∀R
                     ∀x.A, Γ ` ∆                                     Γ ` ∆, ∀x.A
                       A, Γ ` ∆                                      Γ ` ∆, A[x/t]
                ∃L                                             ∃R
                     ∃x.A, Γ ` ∆                                     Γ ` ∆, ∃x.A
                                                              Γ ` ∆, ⊥
                                                         ⊥R               where A 6= ⊥
                                                              Γ ` ∆, A
                             where in ∀R and ∃L the variable x is not free in the conclusion
the special case when every sequent contains at most one formula at the right of
the entailment symbol `. From the operational viewpoint, this makes immediate
sense, because it corresponds to always having one goal to prove.
       Actually, the whole truth is more complicated than this, because, as we
said, several formulae at the right of entailment just correspond to their dis-
junction. In the case of a disjunction, the intuitionistic restriction forces an im-
mediate choice about which of the disjuncts the interpreter has to prove (check
the rule ∨R , and keep in mind that cR is not available, neither explicitly nor
implicitly). Restricting oneself to the intuitionistic case is the key to a technical
simplification in the notion of uniform provability. In cases where this simpli-
fication is not desirable, like for linear logic, one can introduce a more refined
notion of uniform provability [5, 6].
       A proof in MNP S, where each sequent has a singleton set as its right-hand
side, is called an I-proof (intuitionistic proof).
      A uniform proof is an I-proof in which each occurrence of a sequent whose
right-hand side contains a non-atomic formula is the lower sequent of the infer-
ence rule that introduces its top-level connective.
      In other words: until a goal is not reduced to atomic form, keep reducing
it; when a goal is in atomic form, then you can use left inference rules.
3.1.2 Example         In MNP S, the formula (a ∨ b) ⊃ (((a ∨ b) ⊃ ⊥) ⊃ ⊥) admits the
14                       Paola Bruscoli and Alessio Guglielmi
                     a`a                              b`b
              ∨RL                              ∨RR
                    a `a∨b       a, ⊥ ` ⊥            b`a∨b          b, ⊥ ` ⊥
               ⊃L                              ⊃L
                     a, (a ∨ b) ⊃ ⊥ ` ⊥              b, (a ∨ b) ⊃ ⊥ ` ⊥
                ∨L
                                   a ∨ b, (a ∨ b) ⊃ ⊥ ` ⊥
                            ⊃R
                                 a ∨ b ` ((a ∨ b) ⊃ ⊥) ⊃ ⊥
                       ⊃R                                            .
                              ` (a ∨ b) ⊃ (((a ∨ b) ⊃ ⊥) ⊃ ⊥)
There is no proof for its left premise, being t different from x, by the proviso of
the ∀R rule.
3.1.4 Example There is no uniform proof for (a ∨ b) ⊃ (b ∨ a). In fact the
following derivation is compulsory if we require uniformity:
                                           a`b b`b
                                      ∨L
                                            a∨b`b
                                   ∨RR
                                           a∨b`b∨a
                              ⊃R                            .
                                      ` (a ∨ b) ⊃ (b ∨ a)
Another derivation exists, similar to this one, where the rule ∨RL is applied
instead of ∨RR .
3.1.5 Example Let A be a generic formula; the formula A ⊃ ((A ⊃ ⊥) ⊃ ⊥)
doesn’t always admit a uniform proof in MNP S, it may or may not have one,
depending on the formula A. Example 3.1.2 shows a case where a uniform proof
exists, for A = a ∨ b. If one considers A = ∀x.(p(x) ∨ q(x)) instead, there is no
way of building a uniform proof. The following derivation does not lead to a
                 A Tutorial on Proof Theoretic Foundations of Logic Programming               15
uniform proof, because the premise is not provable (and the same happens if we
apply ∨RR instead of ∨RL ):
where the premise is not provable, no matter which term t we choose (it cannot
be t = x, of course), and
where again some premises are not provable. This exhausts all possible cases for
that specific A.
      These examples tell us that the language matters as much as the deductive
system as far as uniform provability goes.
The question is now for which languages uniform provability and (unrestricted)
provability coincide. Several languages of goals and clauses can be defined (not
necessarily with formulae in MNP S). Also, we could consider several intuition-
istic sequent systems, different from MNP S, for which uniform provability can
16                        Paola Bruscoli and Alessio Guglielmi
                                                         P ` Gσ      P, a0 σ ` a
                                                   ⊃L
                                                           P, (G ⊃ a0 )σ ` a
                                                     ∀L
                                                                  ..
                     P ` Gσ                                        .
            b                         =             ∀L
                P ` G1 ∨ · · · ∨ Gh                       P, ∀~x.(G ⊃ a0 ) ` a
                                          (∨RL or ∨RR )
                                                                   ..
                                                                    .
                                          (∨RL or ∨RR )
                                                          P ` G1 ∨ · · · ∨ Gh
and
                                                               P, a0 σ, D ` a
                                                          ∀L
                                                                       ..
                                                                        .
                                                        ∀L
                                                             P, ∀~x.a0 , D ` a
            b                         =              ⊃R                          ,
                P ` G1 ∨ · · · ∨ Gh                             P `D⊃a
                                             (∨RL or ∨RR )
                                                                   ..
                                                                    .
                                          (∨RL or ∨RR )
                                                          P ` G1 ∨ · · · ∨ Gh
or a combination of the last two. Please observe that in the derivations above
clauses ∀~x.(G ⊃ a0 ) at the left of the entailment can be singled out thanks to the
implicit use of contraction (P is a set!).
3.2.2 Exercise Consider the following languages of goals and clauses:
                              G := A | D ⊃ A | G ∨ G,
                              D := A | G ⊃ A | ∀x.D,
                         G := > | A | G ∧ G | G ∨ G | ∃x.G,
                         D := A | G ⊃ A | D ∧ D | ∀x.D.
4     Conclusions
In this tutorial we have presented the proof theoretic perspective on logic pro-
gramming introduced by [7]. We just stuck to the basic notions and we tried to
come up with a less formal, and, hopefully, easier to understand exposition than
what is already available.
      Given its introductory nature, we have been forced to leave out several new
exciting, recent developments, especially in connection to linear logic. Indeed,
the methodology of uniform proofs and abstract logic programming lends itself
naturally to applications in new logics, for new languages. Its grounds in proof
theory make it especially useful for logics whose semantics are too complicated,
or simply unavailable.
            A Tutorial on Proof Theoretic Foundations of Logic Programming                  19
References
  [1] J. Gallier. Constructive logics. Part I: A tutorial on proof systems and typed λ-calculi.
      Theoretical Computer Science, 110:249–339, 1993.
  [2] G. Gentzen. Investigations into logical deduction. In M. E. Szabo, editor, The Collected
      Papers of Gerhard Gentzen, pages 68–131. North-Holland, Amsterdam, 1969.
  [3] D. Miller. Documentation for lambda prolog. URL:
      http://www.lix.polytechnique.fr/Labo/Dale.Miller/lProlog/docs.html.
  [4] D. Miller. Overview of linear logic programming. Accepted as a chapter for a book on
      linear logic, edited by Thomas Ehrhard, Jean-Yves Girard, Paul Ruet, and Phil Scott.
      Cambridge University Press. URL:
      http://www.lix.polytechnique.fr/Labo/Dale.Miller/papers/llp.pdf.
  [5] D. Miller. A multiple-conclusion meta-logic. In S. Abramsky, editor, Ninth Annual
      IEEE Symp. on Logic in Computer Science, pages 272–281, Paris, July 1994.
  [6] D. Miller. Forum: A multiple-conclusion specification logic. Theoretical Computer
      Science, 165:201–232, 1996.
  [7] D. Miller, G. Nadathur, F. Pfenning, and A. Scedrov. Uniform proofs as a foundation
      for logic programming. Annals of Pure and Applied Logic, 51:125–157, 1991.
  [8] G. Nadathur and D. Miller. Higher-order Horn clauses. Journal of the ACM, 37(4):777–
      814, 1990.