A (biased) introduction to Knowledge Compilation

Florent Capelli

UniversitΓ© d’Artois - CRIL

Back to School Conference

October 06, 2023

CRIL

Computer science Research Institute of Lens

Specialized in AI, from GOFAI to ML.

http://www.cril.univ-artois.fr/

Interns, PhD students welcome!

Reach out at capelli@cril.fr.

Lens is roughly the same distance from Paris Gare du Nord than Saclay in the SNCF topology.

Knowledge: representation and reasoning

Knowledge in AI

Knowledge is a central notion for AI:

  • Formalizing knowledge, from Antiquity and before: birth of formal logic
  • Volatile notion which escapes classical logic:
    • Natural language rarely express facts of the form Aβ†’BA \rightarrow B
    • Knowledges may contradict themselves
    • Modeling beliefs and fuzzy facts

Rich fields of research:

  • Many forms of Logic : modal, epistemic, conditional
  • Reasoning with ontologies, under uncertainties, contradictory beliefs

(Propositional) Knowledge Bases

Data + Knowledge = Knowledge Base

Propositional Knowledge Bases:

  • Set 𝒫\mathcal{P} of Propositions: β€œThe model is TWINGO”, β€œThe color is GOLD”
  • Knowledges encoded as propositional formulas: model(TWINGO)∧color(GOLD)β‡’GPSmodel(TWINGO) \wedge color(GOLD) \Rightarrow GPS
  • Knowledge base can be seen as π’¦βŠ†2𝒫\mathcal{K} \subseteq 2^{\mathcal{P}}
    • e.g.: {model(TWINGO),color(GOLD)}βˆ‰π’¦\{model(TWINGO), color(GOLD)\} \notin \mathcal{K} because it does not contain GPSGPS!

Reasoning on Knowledge Bases

Reasoning tasks of various nature:

  • Decision: can we construct a golden Twingo with engine X112-Y?
  • Optimization: what is the cheapest golden Twingo we can construct?
  • Sampling: sample a car model following market previsions?
  • Aggregation: what is the expected benefit from selling a golden Twingo?

Addressing the elephant on the network

This talk is about a specific topic tagged as AI but which has almost nothing to do with ChatGPT.

While ChatGPT represents knowledge and reasons on it in a way, it is merely an illusion:

  • No formal guarantees of the soundness of the reasoning.
  • Sampling, counting are intrisically computationally expensive problems. This is witnessed theoretically and in practice. Need for dedicated tools.

Representing Knowledge Bases

Knowledge base π’¦βŠ†2𝒫\mathcal{K} \subseteq 2^\mathcal{P} for a finite set of propositions 𝒫\mathcal{P}.

Implicit representation

  • Sets of β€œtrue” formulas on 𝒫\mathcal{P}.
  • Natural representation: the one usually written down by humans
  • Deciding whether 𝒦=βˆ…\mathcal{K} = \emptyset is hard

Explicit representation

  • List every kβˆˆπ’¦k \in \mathcal{K}.
  • Knowledge flatten down and hence easy to access
  • HUDGE

Looking for tradeoffs

One Minute to Cool Down

60s

Wrap up:

  • Knowledge is hard to represent and reason with
  • Today: propositional knowledge bases βŠ†2𝒫\subseteq 2^{\mathcal{P}}
  • Goal: Find good tradeoffs between concise representations and tractability

Knowledge Representation Languages

Representing Boolean functions

A Propositional Knowledge Base 𝒦\mathcal{K} is a subset of 2𝒫2^{\mathcal{P}}.

This is a Boolean function: {0,1}𝒫→{0,1}\{0,1\}^{\mathcal{P}} \rightarrow \{0,1\}

How can we represent Boolean functions?

CNF Formulas

F=β‹€(⋁ℓ)F = \bigwedge (\bigvee \ell) where β„“\ell is a literal xx or Β¬x\neg x for some variable xx.

Examples:

F1=(x∨¬y)∧(¬x∨y)F_1=(x \vee \neg y) \wedge (\neg x \vee y)

xx yy F1F_1
00 00 11
00 11 00
11 00 00
11 11 11

F2=(x∨¬z)∧(¬x∨y)∧(x∨y∨z)F_2=(x \vee \neg z) \wedge (\neg x \vee y) \wedge (x \vee y \vee z)

xx yy zz F2F_2
11 11 11 11
00 11 00 11
11 11 00 11
** ** ** 00

The SAT Problem

CNF formulas are extremely simple yet can encode many interesting problems.

Cook, Levin, 1971: The problem SAT of deciding whether a CNF formula is satisfiable is NP-complete.

Valiant 1979: The problem #SAT of counting the satisfying assignment of a CNF formula is #P-complete.

  • Very unlikely that efficient algorithms exists for solving SAT / #SAT
  • Thriving community nevertheless addresses this problem in practice
  • SAT Solver very efficient in many applications

Relevance of CNF formulas

  • Natural encoding: succinctly encodes many problems, witnessed by the many existing industrial benchmarks.
  • Intractable for reasoning and counting

Not very interesting for reasoning tasks.

Circuit Based Representations

Research has focused on factorized representation.

Taken from SMBC Comics
Carl von LinnΓ© (1707-1778)

An example

Data structure based on decision nodes to represent β€œ(x+y+z)(x+y+z) is even”.

Path for x=1x=1, y=0y=0 and z=1z=1 is accepting.

OBDDs

Previous data structure are Ordered Binary Decision Diagrams.

  • Directed Acyclic graphs with one source
  • Sinks are labeled by 00 or 11
  • Internal nodes are decision nodes on a variable in x1,…,xnx_1, \dots, x_n
  • Variables tested in order.

Row of 1

Let’s draw an OBDD that detects whether a matrix xi,jx_{i,j} with 1≀i,j≀31 \leq i, j \leq 3 has a row full of 11.

Row of 1 (Continued)

How many 3Γ—33 \times 3 {0,1}\{0,1\}-matrices have a row full of ones?

  • Case Analysis:
    • Row1=111Row_1=111: 26=642^6=64 matrices

    • Row1β‰ 111,Row2=111Row_1 \neq 111, Row_2=111: (23βˆ’1)Γ—23=56(2^3-1) \times 2^3=56 matrices

    • Row1β‰ 111,Row1β‰ 111,Row3=111Row_1 \neq 111, Row_1 \neq 111, Row_3=111 (23βˆ’1)Γ—(23βˆ’1)=49(2^3-1) \times (2^3-1) = 49 matrices

    • Total: 169169

Tractability of OBDDs

This idea can be generalized to any OBDDs:

Let fβŠ†{0,1}Xf \subseteq \{0,1\}^X be a function computed by an OBDD having EE edges. We can compute #f\#f with O(E)O(E) arithmetic operations.

Generalises to many tasks:

  • Evaluate Pr(f)Pr(f) if probabilities Pr(x=1)Pr(x=1) are given for each x∈Xx \in X
  • Enumerate ff
  • Find the kthk^{th} element of ff in lexicographical order…

Good candidate for representing Boolean functions!

Limits of OBDDs

Orders of variables matters a lot:

fn(M,s)=(s∧ROWn(M))∨(¬s∧COLn(M)) f_n(M,s) = (s \wedge ROW_n(M)) \vee (\neg s \wedge COL_n(M))

Every OBDD computing fnf_n has size β‰₯2O(n)\geq 2^{O(n)}.

FBDD

Same as OBDD but variables may be tested in different order on different path as long as they are tested at most once on every path.

Advantages: more succinct

Drawbacks:

  • cannot be minimized canonically, nor applied etc.
  • actually, not that powerful: ROWn∨COLnROW_n \vee COL_n cannot be represented by polynomial size FBDDs.

One Minute to Cool Down

60s

Wrap up:

CNF : ⋀⋁ℓ\bigwedge \bigvee \ell are natural, powerful but not tractable

OBDD are more tractable but may be exponentially large for simple function
FBDD are more succinct than OBDD but are less tractable

Knowledge Compilation

From CNF to …

Knowledge compilation: amortize the compilation (offline) phase during the query (online) phase

  • Source language: CNF (in this talk and in most existing work)
  • Target language ???

Target Language

Many choices are possible: OBDD, FBDD, and many many others. Depends on what we want to do.

Knowledge Compilation Map [Darwiche, Marquis 2001]

Notation Query Explanation
CO Consistency check Is D satisfiable?
VA Validity check Is D a tautology?
CE Clause entailment does D[Ο„] is sat?
SE Sentential entailment does D1 ⇒ D2?
CT Model counting how many solutions has D?
ME Model enumeration Enumerate the solutions of D.
Β  CO VA CE SE CT ME
DNNF βœ“ Γ— βœ“ Γ— Γ— βœ“
d-DNNF βœ“ βœ“ βœ“ Γ— βœ“ βœ“
dec-DNNF βœ“ βœ“ βœ“ Γ— βœ“ βœ“
FBDD βœ“ βœ“ βœ“ Γ— βœ“ βœ“
OBDD βœ“ βœ“ βœ“ βœ“ βœ“ βœ“

A Knowledge Compiler for FBDD

Exhaustive DPLL with Caching based on Shannon Expansion:

F=(x∨y∨z)∧(x∨¬y∨¬z)∧(¬x∨¬y∨¬z)∧(¬x∨y∨z)F = (x \vee y \vee z) \wedge (x \vee \neg y \vee \neg z) \wedge (\neg x \vee \neg y \vee \neg z) \wedge (\neg x \vee y \vee z)

  • F[x=0]=(y∨z)∧(Β¬y∨¬z)F[x=0] = (y \vee z) \wedge (\neg y \vee \neg z)
  • F[x=1]=(Β¬y∨¬z)∧(y∨z)F[x=1] = (\neg y \vee \neg z) \wedge (y \vee z)
  • F[x=1,y=1]=Β¬zF[x=1,y=1] = \neg z
  • F[x=1,y=0]=zF[x=1,y=0] = z
  • F[x=0,y=1]=Β¬zF[x=0,y=1] = \neg z=F[x=1,y=1]= F[x=1,y=1]
  • F[x=0,y=0]=zF[x=0,y=0] = z=F[x=1,y=0]= F[x=1,y=0]

This scheme is parameterized by:

  • caching policy
  • branching heuristics

Exploiting decomposition

For many tasks, such as model counting, it is interesting to detect syntactic decomposable part of the formula, that is:

F(X)=G(Y)∧H(Z)F(X) = G(Y) \wedge H(Z) and Y∩Z=βˆ…Y \cap Z = \emptyset

  • decDNNF: FBDD + decomposable ∧\wedge-gates
  • Still allows for model counting via the identity #F=#GΓ—#H\#F=\#G\times\#H
  • Compilers can be adapted to detect this rule.

Existing Tools

  • Top-down Model Counter:
    • Cachet
    • SharpSAT
  • Top-down Knowledge Compilers:
    • DSharp
    • D4
  • Bottom-up compilers:
    • SDD
    • c2d
    • CUDD for manipulating Decision Diagrams.
    • ADDMC

The D4 compiler

D4 is a top-down compiler as shown earlier:

  • Use oracle calls to a SAT solver with clause learning to cut branches and speed up later computation
  • Use heuristics to decompose the formula so that it breaks into smaller connected components.
    • Nice tools from graph theory
    • Interesting research questions around these heuristics

The Power of decomposable ∧\wedge-gates

Is it useful to have ∧\wedge-gates in practice?

Yes, exponential gain in circuit size on some instances:

There is a family (fn)nβˆˆβ„•(f_n)_{n \in \mathbb{N}} of Boolean functions such that any FBDD computing fnf_n has size at least 2n2^{n} but fnf_n can be computed by a poly(n)poly(n)-sized dec-DNNF.

One Minute to Cool Down

60s

Wrap up:

  • Many existing Target Languages: chosen depending on the supported queries
  • Branch and bound approach for compilation: importance of heuristics
  • Many actual tools exist and can be used!

KC as a tool

Data structure used in KC can be used in other areas of computer science to leverage existing results.

KC meets Databases

Relational Databases and queries

Data stored as relations (tables):

People Id Name City
1 Alice Paris
2 Bob Lens
3 Carole Lille
4 Djibril Berlin
Capital City Country
Berlin Germany
Paris France
Roma Italy

Query language:

SELECT * FROM People 
JOIN Capital ON People.City=Capital.City
Results Id Name City Country
1 Alice Paris France
4 Djibril Berlin Germany

SELECT COUNT(*) FROM People 
WHERE City NOT IN (SELECT City FROM Capital)
Results Count
2

Conjunctive queries

SQL is a full fledge language, hard to study.

Large class of queries are expressed by a smaller class: conjunctive queries.

People Id Name City
1 Alice Paris
2 Bob Lens
3 Carole Lille
4 Djibril Berlin
Capital City Country
Berlin Germany
Paris France
Roma Italy

Q(Id,Name,City,Country)=People(Id,Name,City)∧Capital(City,Country)Q(Id, Name, City, Country) = People(Id, Name, City) \wedge Capital(City, Country)

  • (2,Bob,Lens,France)βˆ‰Q(2, Bob, Lens, France) \notin Q because (Lens,France)βˆ‰Capital(Lens, France) \notin Capital
  • (1,Alice,Paris,France)∈Q(1, Alice, Paris, France) \in Q because:
    • (Paris,France)∈Capital(Paris, France) \in Capital AND
    • (1,Alice,Paris)∈People(1, Alice, Paris) \in People.

Conjunctive Queries (continued)

Conjunctive queries are queries of the form: Q(X)=⋀iRi(xi⃗)Q(X)=\bigwedge_i R_i(\vec{x_i}) where

  • xiβƒ—\vec{x_i} is a tuple of variables from XX
  • RiR_i are relation symbols

People(Id,Name,City)∧Capital(City,Country)People(Id, Name, City) \wedge Capital(City, Country)

Database 𝔻\mathbb{D}: list of relations R1π”»βŠ†Dx1βƒ—,…,Rpπ”»βŠ†Dxpβƒ—R_1^\mathbb{D}\subseteq D^{\vec{x_1}}, \dots, R_p^\mathbb{D}\subseteq D^{\vec{x_p}} filled with values in domain DD

People𝔻={(Id:1,Name:Alice,City:Paris),(Id:2,Name:Bob,City:Lens)}People^\mathbb{D}= \{(Id: 1,Name: Alice,City: Paris), (Id: 2,Name: Bob,City: Lens)\}

City𝔻={(City:Paris,Country:France),(City:Berlin,Country:Germany)}City^\mathbb{D}= \{(City: Paris, Country: France), (City: Berlin, Country: Germany)\}

Defines a new table Q(𝔻)βŠ†DXQ(\mathbb{D}) \subseteq D^X where Ο„βˆˆQ(𝔻)\tau \in Q(\mathbb{D}) if each part of Ο„\tau on variables xiβƒ—\vec{x_i} are in Ri𝔻R_i^\mathbb{D}.

Q(𝔻)={(Id:1,Name:Alice,City:Paris,Country:France)}Q(\mathbb{D}) = \{(Id: 1, Name: Alice, City: Paris, Country: France)\}

CQ correspond to doing JOIN queries in SQL.

Hardness of solving conjunctive queries

Bad new: given a conjunctive query QQ and a database 𝔻\mathbb{D}, it is NP-complete to decide whether Q(𝔻)β‰ βˆ…Q(\mathbb{D}) \neq \emptyset!

And yet databases systems solve this kind of queries all the time!

  • Query QQ is usually small wrt 𝔻\mathbb{D}
  • Join tables following an optimized query plan
  • Leverage clever indexing algorithm
  • Use clever heuristics based on statistics gathered earlier

Acyclic queries

Central class of conjunctive queries because of their tractability.

R1(x,y,z)∧R2(x,z,u)∧R3(x,y,t)∧R4(y,t)∧R5(y,v) R_1(x,y,z) \wedge R_2(x,z,u) \wedge R_3(x,y,t) \wedge R_4(y,t) \wedge R_5(y,v)

Every CQ is not acyclic

R(x,y)∧S(y,z)∧T(z,x)R(x,y) \wedge S(y,z) \wedge T(z,x)

Yannakakis Algorithm

Start from the join tree
Load data
Filter every tuple in a relation that cannot be extended to a solution below
Proceed bottom up
Treat every edge of the join tree
Tuples in the root can be extended to full solutions
Reconstructing solution x=1, y=1, z=1, u=0, t=0, v=0

Twisting Yannakakis for Counting

Start from the join tree and data
Bottom up computation of the number of extensions

Total of 44 solutions.

Trace of the Yannakakis Algorithm

Start from the join tree and data
Build a circuit computing the answers of Q bottom up
One gate per line in input tables, decision gates on disappearing variables

The trace of Yannakakis Algorithm on acyclic CQ is a decision-DNNF (non Boolean domain) of size linear in the data.

Factorized Databases

Datastructures known as β€œFactorized Databases”.

For every acyclic query QQ and database 𝔻\mathbb{D}, one can build a decision-DNNF computing Q(𝔻)Q(\mathbb{D}) of size O(poly(|Q|)|𝔻|)O(poly(|Q|)|\mathbb{D}|).

Knowledge compilation style approach. One can efficently:

  • decide whether Q(𝔻)=βˆ…Q(\mathbb{D}) = \emptyset
  • compute #Q(𝔻)\#Q(\mathbb{D})
  • enumerate Q(𝔻)Q(\mathbb{D})

Unify existing results and push the hardness in the compilation part.

Going further

This compilation results can be used to recover many other results:

  • Ranked access: given kk and some order on Q(𝔻)Q(\mathbb{D}), output Q(𝔻)[k]Q(\mathbb{D})[k] in time polylog(|𝔻|)polylog(|\mathbb{D}|)
  • Optimization: find the tuple of Q(𝔻)Q(\mathbb{D}) that maximizes a linear function
  • Aggregation over a semi-ring where w:var(Q)Γ—D→𝕂w : var(Q) \times D \rightarrow \mathbb{K} β¨Ο„βˆˆQ(𝔻)⨂x∈var(Q)w(x,Ο„(x)) \bigoplus_{\tau \in Q(\mathbb{D})} \bigotimes_{x \in var(Q)} w(x,\tau(x))

Knowledge Compilation meets Optimization

Boolean Optimization Problem

BPO problem: maxx1,…,xn∈{0,1}nP(x1,…,xn)\max_{x_1,\dots,x_n \in \{0,1\}^n} P(x_1,\dots,x_n)

where PP is a polynomial.

Observation: PP may be assumed to be multilinear since x2=xx^2 = x over {0,1}\{0,1\}

P=βˆ‘e∈EΞ±e∏i∈exi P = \sum_{e \in E} \alpha_e \prod_{i \in e} x_i

where EβŠ†2VE \subseteq 2^V

Example

P(x1,x2,x3)=x1x2x3βˆ’2x1x3+3x1 P(x_1,x_2,x_3) = x_1x_2x_3 - 2x_1x_3 + 3x_1

P(1,0,0)=3P(1, 0, 0) = 3 is maximal.

Algebraic Model Counting

Semi ring: 𝕂=(K,βŠ•,βŠ—,0βŠ•,1βŠ—)\mathbb{K} = (K,\oplus, \otimes, 0_\oplus, 1_\otimes)

  • βŠ•,βŠ—\oplus, \otimes commutative, associative
  • aβŠ•0βŠ•=aa \oplus 0_\oplus = a, bβŠ—1βŠ—=bb \otimes 1_\otimes = b
  • (aβŠ—(bβŠ•c))=(aβŠ—b)βŠ•(aβŠ—c)(a \otimes (b \oplus c)) = (a \otimes b) \oplus (a \otimes c).

fβŠ†{0,1}Xf \subseteq \{0,1\}^X Boolean function and w:XΓ—{0,1}→𝕂w : X \times \{0,1\} \rightarrow \mathbb{K}:

w(f)=β¨Ο„βˆˆf⨂x∈Xw(x,Ο„(x)) w(f) = \bigoplus_{\tau \in f} \bigotimes_{x \in X} w(x, \tau(x))

AMC Examples

  • If w(x,b)=1w(x,b) = 1 on (β„š,+,Γ—,0,1)(\mathbb{Q},+,\times,0,1): w(f)=βˆ‘Ο„βˆˆf∏x∈X1=#fw(f) = \sum_{\tau \in f} \prod_{x \in X} 1 = \#f
  • Arctic semi-ring (β„š,max,+,βˆ’βˆž,0)(\mathbb{Q}, \max, +, -\infty, 0) w(f)=maxΟ„βˆˆfβˆ‘x∈Xw(x,Ο„(x))w(f) = \max_{\tau \in f} \sum_{x \in X} w(x,\tau(x))

Allows to encode optimization problems on Boolean functions.

BPO and Boolean Functions

For P:=βˆ‘e∈EΞ±e∏i∈exiP := \sum_{e \in E} \alpha_e \prod_{i \in e} x_i define: fP:=β‹€e∈ECef_P := \bigwedge_{e \in E} C_e where Ce:=Ye⇔⋀i∈eXiC_e := Y_e \Leftrightarrow \bigwedge_{i \in e} X_i

CeC_e encodes ye=∏i∈exiy_e = \prod_{i \in e} x_i!

and wPw_P on (β„š,max,+,βˆ’βˆž,0)(\mathbb{Q}, \max, +, -\infty, 0) as:

  • wP(Ye,1)=Ξ±ew_P(Y_e,1) = \alpha_e and
  • wP(Xi,b)=wP(Ye,0)=0w_P(X_i,b)=w_P(Y_e,0) = 0 for b∈{0,1}b \in \{0,1\}.

Encoding BPO as Boolean function: an example

Example: P(x1,x2,x3)=x1x2x3βˆ’2x1x3+3x1P(x_1,x_2,x_3) = x_1x_2x_3 - 2x_1x_3 + 3x_1

  • fP=(Y1⇔(X1∧X2∧X3))∧(Y2⇔(X1∧X3))∧(Y3⇔X1)f_P = (Y_1 \Leftrightarrow (X_1 \wedge X_2 \wedge X_3)) \wedge (Y_2 \Leftrightarrow (X_1 \wedge X_3)) \wedge (Y_3 \Leftrightarrow X_1)
  • wP(Y1,1)=1w_P(Y_1,1) = 1, wP(Y2,1)=βˆ’1w_P(Y_2,1)=-1 and wP(Y3,1)=3w_P(Y_3,1)=3.

wP(fP)=maxΟ„βˆˆfPwP(fΟ„)=wP(Y1=0,Y2=0,Y3=1,X1=1,X2=0,X3=0)=3=P(1,0,0)\begin{align*} w_P(f_P) & = \max_{\tau \in f_P} w_P(f_\tau) \\ & = w_P(Y_1=0, Y_2=0, Y_3=1, X_1=1, X_2=0, X_3=0) \\ & = 3 \\ & = P(1,0,0) \end{align*}

BPO as a Boolean Function

wP(fP)=maxP(x1,…,xn)w_P(f_P) = \max P(x_1,\dots,x_n) where fP=β‹€e∈ECef_P = \bigwedge_{e \in E} C_e.

Try using Algebraic Model Counting for BPO:

  • compile fPf_P into, e.g., OBDD CC
  • compute wP(fP)w_P(f_P) in time O(|C|)O(|C|).

Rich connection

  • Good practical results (e.g.Β using D4)
  • Leverage known tractable classes of CNF to BPO
  • Allows for solving more complex optimization problems

Example: solve maxP(x)\max P(x) such that L<βˆ‘x∈Xx<UL < \sum_{x \in X} x < U:

How?

  • Construct OBDD CC that computes fPf_P
  • Transform CC into Cβ€²C' so that it computes fC∧L<βˆ‘x∈Xx<Uf_C \wedge L < \sum_{x \in X} x < U
  • Compute wP(Cβ€²)w_P(C')

Doggy Bag

Take Home Message

  • Original motivation of Knowledge Compilation: reasoning with knowledge bases
    • Renault Example
    • Configuration problems in general
  • Interesting datastructures to solve many tasks on Boolean Functions
    • Enumeration
    • Algebraic Model Counting
  • Transfer tractability and tools by encoding problems into Boolean Functions:
    • Databases
    • Optimization problems