DeepStochLog:
Neural Stochastic Logic Programming
Thomas Winters*¹, Giuseppe Marra*¹, Robin Manhaeve¹, Luc De Raedt¹,²
¹ KU Leuven, Belgium
² AASS, Örebro University, Sweden
* shared first author
DeepStochLog =
★ Neural-symbolic framework
★ Introduces “Neural Definite Clause Grammars”
★ SOTA results on a wide variety of tasks
★ Scales better than other neural-symbolic frameworks
Neural Definite Clause Grammar
CFG: Context-Free Grammar

E --> N
E --> E, P, N
P --> ["+"]
N --> ["0"]
N --> ["1"]
…
N --> ["9"]

[Figure: parse tree of “2 + 3 + 8”, with non-terminals E, P and N as inner nodes; a step-by-step derivation follows below]

Useful for:
- Is a sequence an element of the specified language?
- What is the part-of-speech tag of a terminal?
- Generating all elements of the language
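For example, the string “2 + 3 + 8” is derived from the start symbol E as follows (leftmost derivation):

E ⇒ E P N ⇒ E P N P N ⇒ N P N P N ⇒ 2 P N P N ⇒ 2 + N P N ⇒ 2 + 3 P N ⇒ 2 + 3 + N ⇒ 2 + 3 + 8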
PCFG: Probabilistic Context-Free Grammar

0.5 :: E --> N
0.5 :: E --> E, P, N
1.0 :: P --> ["+"]
0.1 :: N --> ["0"]
0.1 :: N --> ["1"]
…
0.1 :: N --> ["9"]

The rule probabilities always sum to 1 per non-terminal.

[Figure: parse tree of “2 + 3 + 8” with each applied rule labelled by its probability]
Probability of this parse = 0.5 * 0.5 * 0.5 * 0.1 * 1 * 0.1 * 1 * 0.1 = 0.000125

Useful for:
- What is the most likely parse for this sequence of terminals? (useful for ambiguous grammars)
- What is the probability of generating this string?
DCG: Definite Clause Grammar

e(N) --> n(N).
e(N) --> e(N1), p, n(N2),
         {N is N1 + N2}.
p --> ["+"].
n(0) --> ["0"].
n(1) --> ["1"].
…
n(9) --> ["9"].

[Figure: parse tree of “2 + 3 + 8” with instantiated non-terminals n(2), e(2), p, n(3), e(5), p, n(8) and e(13)]

Useful for:
- Modelling more complex languages (e.g. context-sensitive ones)
- Adding constraints between non-terminals thanks to the power of Prolog (e.g. through unification)
- Extra inputs & outputs aside from the terminal sequence (through unification of input variables; see the query example below)
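For example, in a Prolog system such as SWI-Prolog (with the tokens given as one-character strings), the extra argument of e//1 returns the value of the parsed expression:

?- phrase(e(N), ["2", "+", "3", "+", "8"]).
N = 13.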
SDCG: Stochastic Definite Clause Grammar

0.5 :: e(N) --> n(N).
0.5 :: e(N) --> e(N1), p, n(N2),
                {N is N1 + N2}.
1.0 :: p --> ["+"].
0.1 :: n(0) --> ["0"].
0.1 :: n(1) --> ["1"].
…
0.1 :: n(9) --> ["9"].

[Figure: parse tree of “2 + 3 + 8” with rule probabilities, as in the PCFG slide]
Probability of this parse = 0.5 * 0.5 * 0.5 * 0.1 * 1 * 0.1 * 1 * 0.1 = 0.000125

Useful for:
- Same benefits as PCFGs bring to CFGs (e.g. most likely parse)
- But: loss of probability mass is possible due to failing derivations
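To make this concrete, a minimal plain-Prolog sketch (an illustration only, not DeepStochLog itself) that threads both the computed value and the parse probability through extra arguments, reproducing the 0.000125 above:

e(N, P) --> n(N, P1), { P is 0.5 * P1 }.
e(N, P) --> e(N1, P1), p(P2), n(N2, P3), { N is N1 + N2, P is 0.5 * P1 * P2 * P3 }.
p(1.0) --> ["+"].
n(N, 0.1) --> [D], { nth0(N, ["0","1","2","3","4","5","6","7","8","9"], D) }.

% ?- phrase(e(N, P), ["2", "+", "3", "+", "8"]).
% N = 13, P = 0.000125 (up to floating-point rounding; first answer, since
% exhaustive backtracking over the left-recursive rule does not terminate without tabling).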
NDCG: Neural Definite Clause Grammar (= DeepStochLog)

Useful for:
- Subsymbolic processing: e.g. tensors as terminals
- Learning rule probabilities using neural networks

0.5 :: e(N) --> n(N).
0.5 :: e(N) --> e(N1), p, n(N2),
                {N is N1 + N2}.
1.0 :: p --> ["+"].
nn(number_nn,[X],[Y],[digit]) :: n(Y) --> [X].
digit(Y) :- member(Y,[0,1,2,3,4,5,6,7,8,9]).

[Figure: parse tree of “2 + 3 + 8”, now with MNIST images img1, img2, img3 as terminals; the ten digit rules are replaced by the single neural rule, whose probabilities come from the number_nn network]

Probability of this parse =
0.5 * 0.5 * 0.5 * p_number_nn(img1 = 2) * 1 * p_number_nn(img2 = 3) * 1 * p_number_nn(img3 = 8)
DeepStochLog NDCG definition

nn(m, [I1,…,Im], [O1,…,OL], [D1,…,DL]) :: nt --> g1, …, gn.

Where:
● nt is an atom
● g1, …, gn are goals (a goal is an atom or a list of terminals & variables)
● I1,…,Im and O1,…,OL are variables occurring in g1, …, gn and are the inputs and outputs of m
● D1,…,DL are the predicates specifying the domains of O1,…,OL
● m is a neural network mapping I1,…,Im to a probability distribution over O1,…,OL (= over the cross product of D1,…,DL)
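Concretely, the digit rule from the previous slide is an instance of this schema, with m = number_nn (a digit classifier), inputs [X] (an image), outputs [Y] and domains [digit]:

nn(number_nn, [X], [Y], [digit]) :: n(Y) --> [X].
digit(Y) :- member(Y, [0,1,2,3,4,5,6,7,8,9]).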
DeepStochLog Inference
Derivation of SDCG
Deriving the probability of a goal for given terminals in an NDCG:
compute the proof derivations d(e(1), [img1, +, img2]) and then turn them into an and/or tree.
And/Or tree + semiring for different inference types

[Figure: the same and/or tree evaluated under two semirings; leaves carry the rule probabilities (0.5) and the neural probabilities (0.96, 0.04, 0.02, 0.98); or-nodes sum for the probability of the goal and take the MAX for the most likely derivation]

Probability of goal:
P_G(derives(e(1), [img1, +, img2])) = 0.1141

Most likely derivation:
d_max(e(1), [img1, +, img2]) = argmax_{d : d(e(1)) = [img1, +, img2]} P_G(d(e(1))) = [0, +, 1]
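A minimal plain-Prolog sketch (an illustration with a made-up node representation, not the DeepStochLog implementation) of how one and/or tree supports both inference types simply by swapping the semiring:

% And-nodes multiply their children; or-nodes either sum (probability of goal)
% or maximise (most likely derivation).
eval(leaf(P), _Semiring, P).
eval(and(Children), Semiring, V) :- eval_all(Children, Semiring, Vs), product(Vs, V).
eval(or(Children), sumprod, V)   :- eval_all(Children, sumprod, Vs), sum_list(Vs, V).
eval(or(Children), maxprod, V)   :- eval_all(Children, maxprod, Vs), max_list(Vs, V).

eval_all([], _, []).
eval_all([C|Cs], Semiring, [V|Vs]) :- eval(C, Semiring, V), eval_all(Cs, Semiring, Vs).

product([], 1).
product([X|Xs], P) :- product(Xs, P0), P is X * P0.

% ?- T = or([and([leaf(0.5), leaf(0.96)]), and([leaf(0.5), leaf(0.02)])]),
%    eval(T, sumprod, PGoal), eval(T, maxprod, PBest).
% PGoal ≈ 0.49, PBest ≈ 0.48.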
Inference optimisation

Inference is optimised using:
1. SLG resolution: Prolog tables the returned proof tree(s), and thus creates a forest
   → allows reusing probability calculation results from intermediate nodes (see the tabling sketch below)
2. Batched network calls: evaluate all the required neural network queries first
   → neural networks naturally evaluate multiple instances at once through batching,
     and there is less overhead in the communication between the logic and the neural networks
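For intuition, a small SWI-Prolog sketch (not the DeepStochLog code) of what tabling buys: the left-recursive expression grammar from the earlier slides is declared tabled, so SLG resolution terminates on it and shared sub-derivations are computed only once and reused.

:- table e//1.   % SLG resolution for the non-terminal e

e(N) --> n(N).
e(N) --> e(N1), p, n(N2), { N is N1 + N2 }.
p --> ["+"].
n(N) --> [D], { nth0(N, ["0","1","2","3","4","5","6","7","8","9"], D) }.

% ?- phrase(e(N), ["2", "+", "3", "+", "8"]).
% N = 13.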
Learning in DeepStochLog
Learning in DeepStochLog
● All AND/OR tree leaves are (neural) probabilities
  ○ → DeepStochLog “links” these predictions
● Minimize the loss w.r.t. the rule probabilities p:

  min_p Σ_{(Gi, θi, Ti, ti) ∈ D} L( P_p(derives(Gi θi, Ti)), ti )

  where:
  ti = target probability
  D = dataset
  L = loss function
  Gi = goal
  θi = substitution
  Ti = sequence of terminals
DeepStochLog ≈ DeepProbLog for SLP
Sibling of DeepProbLog, but with different semantics (PLP vs SLP)

DeepProbLog: neural predicate probability (and thus implicitly over “possible worlds”)
nn(m, [I1,…,Im], O, [x1,…,xL]) :: neural_predicate(X).

DeepStochLog: neural grammar rule probability (and thus no disjoint-sum problem)
nn(m, [I1,…,Im], [O1,…,OL], [D1,…,DL]) :: nt --> g1, …, gn.

PLP vs SLP ~ akin to the difference between a random graph and a random walk
Experiments
Research questions
Q1: Does DeepStochLog reach state-of-the-art predictive performance on
neural-symbolic tasks?
Q2: How does the inference time of DeepStochLog compare to other neural-symbolic
frameworks and what is the role of tabling?
Q3: Can DeepStochLog handle larger-scale tasks?
Q4: Can DeepStochLog go beyond grammars and encode more general programs?
Mathematical expression outcome

T1: Summing MNIST numbers with a pre-specified number of digits
T2: Expressions with images representing an operator or a single-digit number

[Figure: example inputs, e.g. two multi-digit MNIST numbers whose sum is 137 (T1), and a handwritten expression image evaluating to 19 (T2)]
Performance comparison
Classic grammars, but with MNIST images as terminals

T3: A well-formed bracket sequence as input (without its parse). Task: predict the parse.
    → e.g. parse = ( ) ( ( ) ( ) )
T4: Inputs are strings a^k b^l c^m (or permutations of [a,b,c], with (k+l+m) % 3 = 0).
    Predict 1 if k = l = m, otherwise 0.

[Figure: example image sequences for T4, one labelled 1 and one labelled 0]
Natural way of expressing this grammar knowledge
brackets_dom(X) :- member(X, ["(",")"]).
nn(bracket_nn, [X], Y, brackets_dom) :: bracket(Y) --> [X].
t(_) :: s --> s, s.
t(_) :: s --> bracket("("), s, bracket(")").
t(_) :: s --> bracket("("), bracket(")").
All the power of Prolog DCGs (here: a^n b^n c^n)

letter(X) :- member(X, [a,b,c]).

0.5 :: s(0) --> akblcm(K,L,M),
                {K \= L; L \= M; M \= K},
                {K \= 0, L \= 0, M \= 0}.
0.5 :: s(1) --> akblcm(N,N,N).

akblcm(K,L,M) --> rep(K,A),
                  rep(L,B),
                  rep(M,C),
                  {A \= B, B \= C, C \= A}.

rep(0, _) --> [].
nn(mnist, [X], C, letter) :: rep(s(N), C) --> [X], rep(N,C).
Citation networks

T5: Given a set of scientific papers with only a few labels and a citation network, find the labels of all papers
Word Algebra Problem

T6: Given a natural language text describing an algebra problem, predict the outcome
E.g. "Mark has 6 apples. He eats 2 and divides the remaining among his 2 friends. How many apples did each friend get?"

Uses the “empty body trick” to emulate SLP logic rules through SDCGs:

nn(m, [I1,…,Im], [O1,…,OL], [D1,…,DL]) :: nt --> [].

This enables a fairly straightforward translation of DeepProbLog programs for many tasks.
DeepStochLog performs as well as DeepProbLog: 96% accuracy
Translating DPL to DSL

[Figure: a DeepProbLog program (left) and its DeepStochLog translation (right); see the sketch below]
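An illustrative, hand-made sketch in the spirit of this slide (the classic MNIST single-digit addition; not the authors' exact programs, and the network name m_digit is made up):

% DeepProbLog: a neural predicate classifies each image; a logic rule adds the digits.
nn(m_digit, [X], Y, [0,1,2,3,4,5,6,7,8,9]) :: digit(X, Y).
addition(X, Y, Z) :- digit(X, N1), digit(Y, N2), Z is N1 + N2.

% DeepStochLog: the same program written as an NDCG via the “empty body trick”,
% i.e. neural grammar rules that consume no terminals.
digit_dom(Y) :- member(Y, [0,1,2,3,4,5,6,7,8,9]).
nn(m_digit, [X], [Y], [digit_dom]) :: digit(X, Y) --> [].
addition(X, Y, Z) --> digit(X, N1), digit(Y, N2), { Z is N1 + N2 }.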
Ongoing/future research
Ongoing/future DeepStochLog research
● Structure learning using t(_) (syntactic sugar for PRISM-like switches)
● Support for calculating the true probability (without probability mass loss) by dividing by the probabilities of all other possible outputs of the given goal
● Larger-scale experiment(s) (e.g. on CLEVR)
● Comparing with more neural-symbolic frameworks
● Generative setting (with generative neural networks, using the grammar to aid generation)
Thanks!
Paper: DeepStochLog: Neural Stochastic Logic Programming
Code: https://github.com/ml-kuleuven/deepstochlog
Python: pip install deepstochlog
Authors: Thomas Winters*, Giuseppe Marra*, Robin Manhaeve, Luc De Raedt
Twitter: @thomas_wint, @giuseppe__marra, @ManhaeveRobin, @lucderaedt
* shared first author
