DeepStochLog:
Neural Stochastic Logic Programming
Thomas Winters*¹, Giuseppe Marra*¹, Robin Manhaeve¹, Luc De Raedt¹,²
¹ KU Leuven, Belgium
² AASS, Örebro University, Sweden
* shared first author
DeepStochLog =
★ Neural-symbolic framework
★ Introduces “Neural Definite Clause Grammars”
★ SOTA results on a wide variety of tasks
★ Scales better than other neural-symbolic frameworks
Neural Definite Clause Grammar
CFG: Context-Free Grammar

E --> N
E --> E, P, N
P --> ["+"]
N --> ["0"]
N --> ["1"]
…
N --> ["9"]

[Figure: parse tree of “2 + 3 + 8”, with non-terminals E, P and N as inner nodes; a step-by-step derivation follows below]

Useful for:
- Is a sequence an element of the specified language?
- What is the part-of-speech tag of a terminal?
- Generating all elements of the language
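For example, the string “2 + 3 + 8” is derived from the start symbol E as follows (leftmost derivation):

E ⇒ E P N ⇒ E P N P N ⇒ N P N P N ⇒ 2 P N P N ⇒ 2 + N P N ⇒ 2 + 3 P N ⇒ 2 + 3 + N ⇒ 2 + 3 + 8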
PCFG: Probabilistic Context-Free Grammar

0.5 :: E --> N
0.5 :: E --> E, P, N
1.0 :: P --> ["+"]
0.1 :: N --> ["0"]
0.1 :: N --> ["1"]
…
0.1 :: N --> ["9"]

The rule probabilities always sum to 1 per non-terminal.

[Figure: parse tree of “2 + 3 + 8” with each applied rule labelled by its probability]
Probability of this parse = 0.5 * 0.5 * 0.5 * 0.1 * 1 * 0.1 * 1 * 0.1 = 0.000125

Useful for:
- What is the most likely parse for this sequence of terminals? (useful for ambiguous grammars)
- What is the probability of generating this string?
DCG: Definite Clause Grammar

e(N) --> n(N).
e(N) --> e(N1), p, n(N2),
         {N is N1 + N2}.
p --> ["+"].
n(0) --> ["0"].
n(1) --> ["1"].
…
n(9) --> ["9"].

[Figure: parse tree of “2 + 3 + 8” with instantiated non-terminals n(2), e(2), p, n(3), e(5), p, n(8) and e(13)]

Useful for:
- Modelling more complex languages (e.g. context-sensitive ones)
- Adding constraints between non-terminals thanks to the power of Prolog (e.g. through unification)
- Extra inputs & outputs aside from the terminal sequence (through unification of input variables; see the query example below)
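For example, in a Prolog system such as SWI-Prolog (with the tokens given as one-character strings), the extra argument of e//1 returns the value of the parsed expression:

?- phrase(e(N), ["2", "+", "3", "+", "8"]).
N = 13.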
SDCG: Stochastic Definite Clause Grammar

0.5 :: e(N) --> n(N).
0.5 :: e(N) --> e(N1), p, n(N2),
                {N is N1 + N2}.
1.0 :: p --> ["+"].
0.1 :: n(0) --> ["0"].
0.1 :: n(1) --> ["1"].
…
0.1 :: n(9) --> ["9"].

[Figure: parse tree of “2 + 3 + 8” with rule probabilities, as in the PCFG slide]
Probability of this parse = 0.5 * 0.5 * 0.5 * 0.1 * 1 * 0.1 * 1 * 0.1 = 0.000125

Useful for:
- Same benefits as PCFGs bring to CFGs (e.g. most likely parse)
- But: loss of probability mass is possible due to failing derivations
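To make this concrete, a minimal plain-Prolog sketch (an illustration only, not DeepStochLog itself) that threads both the computed value and the parse probability through extra arguments, reproducing the 0.000125 above:

e(N, P) --> n(N, P1), { P is 0.5 * P1 }.
e(N, P) --> e(N1, P1), p(P2), n(N2, P3), { N is N1 + N2, P is 0.5 * P1 * P2 * P3 }.
p(1.0) --> ["+"].
n(N, 0.1) --> [D], { nth0(N, ["0","1","2","3","4","5","6","7","8","9"], D) }.

% ?- phrase(e(N, P), ["2", "+", "3", "+", "8"]).
% N = 13, P = 0.000125 (up to floating-point rounding; first answer, since
% exhaustive backtracking over the left-recursive rule does not terminate without tabling).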
NDCG: Neural Definite Clause Grammar (= DeepStochLog)

Useful for:
- Subsymbolic processing: e.g. tensors as terminals
- Learning rule probabilities using neural networks

0.5 :: e(N) --> n(N).
0.5 :: e(N) --> e(N1), p, n(N2),
                {N is N1 + N2}.
1.0 :: p --> ["+"].
nn(number_nn,[X],[Y],[digit]) :: n(Y) --> [X].
digit(Y) :- member(Y,[0,1,2,3,4,5,6,7,8,9]).

[Figure: parse tree of “2 + 3 + 8”, now with MNIST images img1, img2, img3 as terminals; the ten digit rules are replaced by the single neural rule, whose probabilities come from the number_nn network]

Probability of this parse =
0.5 * 0.5 * 0.5 * p_number_nn(img1 = 2) * 1 * p_number_nn(img2 = 3) * 1 * p_number_nn(img3 = 8)
DeepStochLog NDCG definition

nn(m, [I1,…,Im], [O1,…,OL], [D1,…,DL]) :: nt --> g1, …, gn.

Where:
● nt is an atom
● g1, …, gn are goals (a goal is an atom or a list of terminals & variables)
● I1,…,Im and O1,…,OL are variables occurring in g1, …, gn and are the inputs and outputs of m
● D1,…,DL are the predicates specifying the domains of O1,…,OL
● m is a neural network mapping I1,…,Im to a probability distribution over O1,…,OL (= over the cross product of D1,…,DL)
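Concretely, the digit rule from the previous slide is an instance of this schema, with m = number_nn (a digit classifier), inputs [X] (an image), outputs [Y] and domains [digit]:

nn(number_nn, [X], [Y], [digit]) :: n(Y) --> [X].
digit(Y) :- member(Y, [0,1,2,3,4,5,6,7,8,9]).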
DeepStochLog Inference
Derivation of SDCG
Deriving the probability of a goal for given terminals in an NDCG:
compute the proof derivations d(e(1), [img1, +, img2]) and then turn them into an and/or tree.
And/Or tree + semiring for different inference types

[Figure: the same and/or tree evaluated under two semirings; leaves carry the rule probabilities (0.5) and the neural probabilities (0.96, 0.04, 0.02, 0.98); or-nodes sum for the probability of the goal and take the MAX for the most likely derivation]

Probability of goal:
P_G(derives(e(1), [img1, +, img2])) = 0.1141

Most likely derivation:
d_max(e(1), [img1, +, img2]) = argmax_{d : d(e(1)) = [img1, +, img2]} P_G(d(e(1))) = [0, +, 1]
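A minimal plain-Prolog sketch (an illustration with a made-up node representation, not the DeepStochLog implementation) of how one and/or tree supports both inference types simply by swapping the semiring:

% And-nodes multiply their children; or-nodes either sum (probability of goal)
% or maximise (most likely derivation).
eval(leaf(P), _Semiring, P).
eval(and(Children), Semiring, V) :- eval_all(Children, Semiring, Vs), product(Vs, V).
eval(or(Children), sumprod, V)   :- eval_all(Children, sumprod, Vs), sum_list(Vs, V).
eval(or(Children), maxprod, V)   :- eval_all(Children, maxprod, Vs), max_list(Vs, V).

eval_all([], _, []).
eval_all([C|Cs], Semiring, [V|Vs]) :- eval(C, Semiring, V), eval_all(Cs, Semiring, Vs).

product([], 1).
product([X|Xs], P) :- product(Xs, P0), P is X * P0.

% ?- T = or([and([leaf(0.5), leaf(0.96)]), and([leaf(0.5), leaf(0.02)])]),
%    eval(T, sumprod, PGoal), eval(T, maxprod, PBest).
% PGoal ≈ 0.49, PBest ≈ 0.48.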
Inference optimisation

Inference is optimised using:
1. SLG resolution: Prolog tables the returned proof tree(s), and thus creates a forest
   → allows reusing probability calculation results from intermediate nodes (see the tabling sketch below)
2. Batched network calls: evaluate all the required neural network queries first
   → neural networks naturally evaluate multiple instances at once through batching,
     and there is less overhead in the communication between the logic and the neural networks
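For intuition, a small SWI-Prolog sketch (not the DeepStochLog code) of what tabling buys: the left-recursive expression grammar from the earlier slides is declared tabled, so SLG resolution terminates on it and shared sub-derivations are computed only once and reused.

:- table e//1.   % SLG resolution for the non-terminal e

e(N) --> n(N).
e(N) --> e(N1), p, n(N2), { N is N1 + N2 }.
p --> ["+"].
n(N) --> [D], { nth0(N, ["0","1","2","3","4","5","6","7","8","9"], D) }.

% ?- phrase(e(N), ["2", "+", "3", "+", "8"]).
% N = 13.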
Learning in DeepStochLog
Learning in DeepStochLog
● All AND/OR tree leaves are (neural) probabilities
  ○ → DeepStochLog “links” these predictions
● Minimize the loss w.r.t. the rule probabilities p:

  min_p Σ_{(Gi, θi, Ti, ti) ∈ D} L( P_p(derives(Gi θi, Ti)), ti )

  where:
  ti = target probability
  D = dataset
  L = loss function
  Gi = goal
  θi = substitution
  Ti = sequence of terminals
DeepStochLog ≈ DeepProbLog for SLP
Sibling of DeepProbLog, but with different semantics (PLP vs SLP)

DeepProbLog: neural predicate probability (and thus implicitly over “possible worlds”)
nn(m, [I1,…,Im], O, [x1,…,xL]) :: neural_predicate(X).

DeepStochLog: neural grammar rule probability (and thus no disjoint-sum problem)
nn(m, [I1,…,Im], [O1,…,OL], [D1,…,DL]) :: nt --> g1, …, gn.

PLP vs SLP ~ akin to the difference between a random graph and a random walk
Experiments
Research questions
Q1: Does DeepStochLog reach state-of-the-art predictive performance on
neural-symbolic tasks?
Q2: How does the inference time of DeepStochLog compare to other neural-symbolic
frameworks and what is the role of tabling?
Q3: Can DeepStochLog handle larger-scale tasks?
Q4: Can DeepStochLog go beyond grammars and encode more general programs?
Mathematical expression outcome

T1: Summing MNIST numbers with a pre-specified number of digits
T2: Expressions with images representing an operator or a single-digit number

[Figure: example inputs, e.g. two multi-digit MNIST numbers whose sum is 137 (T1), and a handwritten expression image evaluating to 19 (T2)]
Performance comparison
Classic grammars, but with MNIST images as terminals

T3: A well-formed bracket sequence as input (without its parse). Task: predict the parse.
    → e.g. parse = ( ) ( ( ) ( ) )
T4: Inputs are strings a^k b^l c^m (or permutations of [a,b,c], with (k+l+m) % 3 = 0).
    Predict 1 if k = l = m, otherwise 0.

[Figure: example image sequences for T4, one labelled 1 and one labelled 0]
Natural way of expressing this grammar knowledge
brackets_dom(X) :- member(X, ["(",")"]).
nn(bracket_nn, [X], Y, brackets_dom) :: bracket(Y) --> [X].
t(_) :: s --> s, s.
t(_) :: s --> bracket("("), s, bracket(")").
t(_) :: s --> bracket("("), bracket(")").
All the power of Prolog DCGs (here: a^n b^n c^n)

letter(X) :- member(X, [a,b,c]).

0.5 :: s(0) --> akblcm(K,L,M),
                {K \= L; L \= M; M \= K},
                {K \= 0, L \= 0, M \= 0}.
0.5 :: s(1) --> akblcm(N,N,N).

akblcm(K,L,M) --> rep(K,A),
                  rep(L,B),
                  rep(M,C),
                  {A \= B, B \= C, C \= A}.

rep(0, _) --> [].
nn(mnist, [X], C, letter) :: rep(s(N), C) --> [X], rep(N,C).
Citation networks

T5: Given a set of scientific papers with only a few labels and a citation network, find the labels of all papers
Word Algebra Problem

T6: Given a natural language text describing an algebra problem, predict the outcome
E.g. "Mark has 6 apples. He eats 2 and divides the remaining among his 2 friends. How many apples did each friend get?"

Uses the “empty body trick” to emulate SLP logic rules through SDCGs:

nn(m, [I1,…,Im], [O1,…,OL], [D1,…,DL]) :: nt --> [].

This enables a fairly straightforward translation of DeepProbLog programs for many tasks.
DeepStochLog performs as well as DeepProbLog: 96% accuracy
Translating DPL to DSL

[Figure: a DeepProbLog program (left) and its DeepStochLog translation (right); see the sketch below]
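An illustrative, hand-made sketch in the spirit of this slide (the classic MNIST single-digit addition; not the authors' exact programs, and the network name m_digit is made up):

% DeepProbLog: a neural predicate classifies each image; a logic rule adds the digits.
nn(m_digit, [X], Y, [0,1,2,3,4,5,6,7,8,9]) :: digit(X, Y).
addition(X, Y, Z) :- digit(X, N1), digit(Y, N2), Z is N1 + N2.

% DeepStochLog: the same program written as an NDCG via the “empty body trick”,
% i.e. neural grammar rules that consume no terminals.
digit_dom(Y) :- member(Y, [0,1,2,3,4,5,6,7,8,9]).
nn(m_digit, [X], [Y], [digit_dom]) :: digit(X, Y) --> [].
addition(X, Y, Z) --> digit(X, N1), digit(Y, N2), { Z is N1 + N2 }.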
Ongoing/future research
Ongoing/future DeepStochLog research
● Structure learning using t(_) (syntactic sugar for PRISM-like switches)
● Support for calculating the true probability (without probability mass loss) by dividing by the probabilities of all other possible outputs of the given goal
● Larger-scale experiment(s) (e.g. on CLEVR)
● Comparing with more neural-symbolic frameworks
● Generative setting (with generative neural networks, using the grammar to aid generation)
Thanks!
Paper: DeepStochLog: Neural Stochastic Logic Programming
Code: https://github.com/ml-kuleuven/deepstochlog
Python: pip install deepstochlog
Authors: Thomas Winters*, Giuseppe Marra*, Robin Manhaeve, Luc De Raedt
Twitter: @thomas_wint, @giuseppe__marra, @ManhaeveRobin, @lucderaedt
* shared first author
