A (biased) introduction to Knowledge Compilation

Florent Capelli

Université d’Artois - CRIL

Back to School Conference

October 06, 2023

CRIL

Computer science Research Institute of Lens

Specialized in AI, from GOFAI to ML.

http://www.cril.univ-artois.fr/

Interns, PhD students welcome!

Reach out at capelli@cril.fr.

Lens is roughly the same distance from Paris Gare du Nord as Saclay in the SNCF topology.

Knowledge: representation and reasoning

Knowledge in AI

Knowledge is a central notion for AI:

Formalizing knowledge, from Antiquity and before: birth of formal logic
Volatile notion which escapes classical logic:
- Natural language rarely expresses facts of the form $A \rightarrow B$
- Pieces of knowledge may contradict each other
- Modeling beliefs and fuzzy facts

Rich fields of research:

Many forms of logic: modal, epistemic, conditional
Reasoning with ontologies, under uncertainties, contradictory beliefs

(Propositional) Knowledge Bases

Data + Knowledge = Knowledge Base

Propositional Knowledge Bases:

Set $\mathcal{P}$ of Propositions: “The model is TWINGO”, “The color is GOLD”
Knowledge encoded as propositional formulas: $model(TWINGO) \wedge color(GOLD) \Rightarrow GPS$
A knowledge base can be seen as a set $\mathcal{K} \subseteq 2^{\mathcal{P}}$
- e.g.: $\{model(TWINGO), color(GOLD)\} \notin \mathcal{K}$ because it does not contain $GPS$ !
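
To make the set view concrete, here is a minimal sketch that enumerates $2^{\mathcal{P}}$ for the three propositions above and keeps the subsets consistent with the rule. The names `P`, `K` and `satisfies_rule` are illustrative choices, not part of the talk.

```python
from itertools import combinations

P = ["model(TWINGO)", "color(GOLD)", "GPS"]

def satisfies_rule(world):
    """model(TWINGO) and color(GOLD) imply GPS."""
    return not ({"model(TWINGO)", "color(GOLD)"} <= world) or "GPS" in world

# Enumerate 2^P and keep the subsets consistent with the rule.
K = [frozenset(s) for r in range(len(P) + 1)
     for s in combinations(P, r) if satisfies_rule(frozenset(s))]

print(len(K))                                            # 7 of the 8 subsets
print(frozenset({"model(TWINGO)", "color(GOLD)"}) in K)  # False
```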

Reasoning on Knowledge Bases

Reasoning tasks of various nature:

Decision: can we construct a golden Twingo with engine X112-Y?
Optimization: what is the cheapest golden Twingo we can construct?
Sampling: sample a car model following market forecasts?
Aggregation: what is the expected benefit from selling a golden Twingo?

Addressing the elephant on the network

This talk is about a specific topic tagged as AI but which has almost nothing to do with ChatGPT.

While ChatGPT represents knowledge and reasons on it in a way, it is merely an illusion:

No formal guarantees of the soundness of the reasoning.
Sampling and counting are intrinsically computationally expensive problems. This is witnessed both theoretically and in practice. Need for dedicated tools.

Representing Knowledge Bases

Knowledge base $\mathcal{K} \subseteq 2^\mathcal{P}$ for a finite set of propositions $\mathcal{P}$ .

Implicit representation

Sets of “true” formulas on $\mathcal{P}$ .
Natural representation: the one usually written down by humans
Deciding whether $\mathcal{K} = \emptyset$ is hard

Explicit representation

List every $k \in \mathcal{K}$ .
Knowledge is flattened and hence easy to access
HUGE

Looking for tradeoffs

One Minute to Cool Down

60s

Wrap up:

Knowledge is hard to represent and reason with
Today: propositional knowledge bases $\subseteq 2^{\mathcal{P}}$
Goal: Find good tradeoffs between concise representations and tractability

Knowledge Representation Languages

Representing Boolean functions

A Propositional Knowledge Base $\mathcal{K}$ is a subset of $2^{\mathcal{P}}$ .

This is a Boolean function: $\{0,1\}^{\mathcal{P}} \rightarrow \{0,1\}$

How can we represent Boolean functions?

CNF Formulas

$F = \bigwedge (\bigvee \ell)$ where $\ell$ is a literal $x$ or $\neg x$ for some variable $x$ .

Examples:

$F_1=(x \vee \neg y) \wedge (\neg x \vee y)$

$x$	$y$	$F_1$
$0$	$0$	$1$
$0$	$1$	$0$
$1$	$0$	$0$
$1$	$1$	$1$

$F_2=(x \vee \neg z) \wedge (\neg x \vee y) \wedge (x \vee y \vee z)$

$x$	$y$	$z$	$F_2$
$1$	$1$	$1$	$1$
$0$	$1$	$0$	$1$
$1$	$1$	$0$	$1$
$*$	$*$	$*$	$0$
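
The two truth tables can be checked by brute force. The sketch below enumerates all assignments and keeps the satisfying ones; the helper name `models` is an illustrative choice.

```python
from itertools import product

def models(formula, variables):
    """Return all assignments (as tuples of 0/1) satisfying `formula`."""
    return [bits for bits in product((0, 1), repeat=len(variables))
            if formula(dict(zip(variables, bits)))]

# F1 = (x or not y) and (not x or y)
def f1(a):
    return (a["x"] or not a["y"]) and (not a["x"] or a["y"])

# F2 = (x or not z) and (not x or y) and (x or y or z)
def f2(a):
    return ((a["x"] or not a["z"]) and (not a["x"] or a["y"])
            and (a["x"] or a["y"] or a["z"]))

f1_models = models(f1, ["x", "y"])
f2_models = models(f2, ["x", "y", "z"])
print(f1_models)  # [(0, 0), (1, 1)]
print(f2_models)  # [(0, 1, 0), (1, 1, 0), (1, 1, 1)]
```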

The SAT Problem

CNF formulas are extremely simple yet can encode many interesting problems.

Cook, Levin, 1971: The problem SAT of deciding whether a CNF formula is satisfiable is NP-complete.

Valiant 1979: The problem #SAT of counting the satisfying assignments of a CNF formula is #P-complete.

Very unlikely that efficient algorithms exist for solving SAT / #SAT
Thriving community nevertheless addresses this problem in practice
SAT solvers are very efficient in many applications

Relevance of CNF formulas

Natural encoding: succinctly encodes many problems, witnessed by the many existing industrial benchmarks.
Intractable for reasoning and counting

Not very interesting for reasoning tasks.

Circuit Based Representations

Research has focused on factorized representations.

An example

Data structure based on decision nodes to represent “ $(x+y+z)$ is even”.

Path for $x=1$ , $y=0$ and $z=1$ is accepting.

OBDDs

The previous data structure is an Ordered Binary Decision Diagram (OBDD).

Directed Acyclic graphs with one source
Sinks are labeled by $0$ or $1$
Internal nodes are decision nodes on a variable in $x_1, \dots, x_n$
Variables tested in order.
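
As a rough illustration, the parity example "$(x+y+z)$ is even" can be written as an OBDD in a few lines. The dictionary encoding and the node names (`root`, `even_y`, ...) are assumptions made for this sketch only.

```python
# Each internal node: (variable, child_if_0, child_if_1); sinks are the ints 0 and 1.
# Hypothetical node names track the parity of the variables read so far.
PARITY_OBDD = {
    "root":   ("x", "even_y", "odd_y"),
    "even_y": ("y", "even_z", "odd_z"),
    "odd_y":  ("y", "odd_z", "even_z"),
    "even_z": ("z", 1, 0),   # sum so far is even: accept iff z = 0
    "odd_z":  ("z", 0, 1),   # sum so far is odd: accept iff z = 1
}

def evaluate(obdd, node, assignment):
    """Follow the unique path selected by `assignment` down to a sink."""
    while node not in (0, 1):
        var, lo, hi = obdd[node]
        node = hi if assignment[var] else lo
    return node

accepted = evaluate(PARITY_OBDD, "root", {"x": 1, "y": 0, "z": 1})
print(accepted)  # 1
```

Evaluation follows a single path, so it takes at most one step per variable.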

Row of 1

Let’s draw an OBDD that detects whether a matrix $x_{i,j}$ with $1 \leq i, j \leq 3$ has a row full of $1$ .

Row of 1 (Continued)

How many $3 \times 3$ $\{0,1\}$ -matrices have a row full of ones?

Case Analysis:
- $Row_1=111$ : $2^6=64$ matrices
- $Row_1 \neq 111, Row_2=111$ : $(2^3-1) \times 2^3=56$ matrices
- $Row_1 \neq 111, Row_2 \neq 111, Row_3=111$ : $(2^3-1) \times (2^3-1) = 49$ matrices
- Total: $169$
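
The total can be double-checked by enumerating all $2^9$ matrices; this is only a sanity check, not the OBDD-based method.

```python
from itertools import product

def has_full_row(bits):
    """`bits` is a 3x3 matrix flattened row by row."""
    rows = [bits[0:3], bits[3:6], bits[6:9]]
    return any(all(row) for row in rows)

count = sum(has_full_row(bits) for bits in product((0, 1), repeat=9))
print(count)  # 169
```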

Tractability of OBDDs

This idea can be generalized to any OBDD:

Let $f \subseteq \{0,1\}^X$ be a function computed by an OBDD having $E$ edges. We can compute $\#f$ with $O(E)$ arithmetic operations.
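
A minimal sketch of such a counting procedure, assuming an OBDD stored as a dictionary `node -> (variable, child_if_0, child_if_1)` with sinks `0` and `1` (an encoding chosen for this example). Each node is visited once thanks to memoization, and each edge contributes one multiplication and one addition.

```python
from functools import lru_cache

def count_models(obdd, root, order):
    """Count satisfying assignments of an OBDD over the variables in `order`."""
    level = {v: i for i, v in enumerate(order)}
    n = len(order)

    def depth(node):
        return n if node in (0, 1) else level[obdd[node][0]]

    @lru_cache(maxsize=None)
    def count(node):
        # Number of models over the variables from depth(node) onwards.
        if node in (0, 1):
            return node
        _, lo, hi = obdd[node]
        d = depth(node)
        # Variables skipped on an edge are unconstrained: factor 2 for each.
        return sum(count(c) * 2 ** (depth(c) - d - 1) for c in (lo, hi))

    return count(root) * 2 ** depth(root)

# "(x + y + z) is even", and "x or y" seen as a function of x, y, z.
parity = {"root": ("x", "e", "o"), "e": ("y", "ez", "oz"), "o": ("y", "oz", "ez"),
          "ez": ("z", 1, 0), "oz": ("z", 0, 1)}
x_or_y = {"root": ("x", "ny", 1), "ny": ("y", 0, 1)}

print(count_models(parity, "root", ["x", "y", "z"]))  # 4
print(count_models(x_or_y, "root", ["x", "y", "z"]))  # 6
```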

Generalises to many tasks:

Evaluate $Pr(f)$ if probabilities $Pr(x=1)$ are given for each $x \in X$
Enumerate $f$
Find the $k^{th}$ element of $f$ in lexicographical order…

Good candidate for representing Boolean functions!

Limits of OBDDs

The order of variables matters a lot:

$f_n(M,s) = (s \wedge ROW_n(M)) \vee (\neg s \wedge COL_n(M))$

Every OBDD computing $f_n$ has size $2^{\Omega(n)}$ .

FBDD

Same as OBDD, but variables may be tested in different orders on different paths, as long as each variable is tested at most once on every path.

Advantages: more succinct

Drawbacks:

no canonical minimization, no efficient apply operation, etc.
actually, not that powerful: $ROW_n \vee COL_n$ cannot be represented by polynomial size FBDDs.

One Minute to Cool Down

60s

Wrap up:

CNF ( $\bigwedge \bigvee \ell$ ): natural and powerful, but not tractable

OBDDs are more tractable but may be exponentially large for simple functions

FBDDs are more succinct than OBDDs but less tractable

Knowledge Compilation

From CNF to …

Knowledge compilation: amortize the compilation (offline) phase during the query (online) phase

Source language: CNF (in this talk and in most existing work)
Target language ???

Target Language

Many choices are possible: OBDD, FBDD, and many many others. Depends on what we want to do.

Knowledge Compilation Map [Darwiche, Marquis 2001]

Notation	Query	Explanation
CO	Consistency check	Is D satisfiable?
VA	Validity check	Is D a tautology?
CE	Clause entailment	Is D[τ] satisfiable?
SE	Sentential entailment	Does D₁ ⇒ D₂ hold?
CT	Model counting	How many solutions does D have?
ME	Model enumeration	Enumerate the solutions of D.

	CO	VA	CE	SE	CT	ME
DNNF	✓	×	✓	×	×	✓
d-DNNF	✓	✓	✓	×	✓	✓
dec-DNNF	✓	✓	✓	×	✓	✓
FBDD	✓	✓	✓	×	✓	✓
OBDD	✓	✓	✓	✓	✓	✓

A Knowledge Compiler for FBDD

Exhaustive DPLL with Caching based on Shannon Expansion:

$F = (x \vee y \vee z) \wedge (x \vee \neg y \vee \neg z) \wedge (\neg x \vee \neg y \vee \neg z) \wedge (\neg x \vee y \vee z)$

$F[x=0] = (y \vee z) \wedge (\neg y \vee \neg z)$
$F[x=1] = (\neg y \vee \neg z) \wedge (y \vee z)$
$F[x=1,y=1] = \neg z$
$F[x=1,y=0] = z$
$F[x=0,y=1] = \neg z = F[x=1,y=1]$
$F[x=0,y=0] = z = F[x=1,y=0]$

This scheme is parameterized by:

caching policy
branching heuristics
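
A minimal sketch of this scheme, under simplifying assumptions: clauses are sets of signed integers (`x=1, y=2, z=3`), the branching heuristic always picks the smallest variable (so the result is in fact an OBDD, a special case of FBDD), and the cache is keyed by the residual formula itself. The function names are illustrative, not those of an existing compiler.

```python
def condition(cnf, var, value):
    """Return cnf[var=value]: drop satisfied clauses, remove falsified literals."""
    true_lit = var if value else -var
    return frozenset(c - {-true_lit} for c in cnf if true_lit not in c)

def compile_fbdd(cnf, cache, nodes):
    """Return the id of a node computing `cnf`; sinks are 0 and 1."""
    if not cnf:
        return 1                      # no clause left: always true
    if frozenset() in cnf:
        return 0                      # empty clause: unsatisfiable
    if cnf in cache:
        return cache[cnf]             # caching: reuse an already compiled subformula
    var = min(abs(l) for c in cnf for l in c)   # naive branching heuristic
    lo = compile_fbdd(condition(cnf, var, False), cache, nodes)
    hi = compile_fbdd(condition(cnf, var, True), cache, nodes)
    if lo == hi:
        node = lo                     # the tested variable is irrelevant: skip the node
    else:
        node = len(nodes) + 2         # fresh id (0 and 1 are reserved for the sinks)
        nodes[node] = (var, lo, hi)
    cache[cnf] = node
    return node

# The formula F of this slide.
F = frozenset(frozenset(c) for c in
              [(1, 2, 3), (1, -2, -3), (-1, -2, -3), (-1, 2, 3)])
cache, nodes = {}, {}
root = compile_fbdd(F, cache, nodes)
print(len(nodes))  # 3
```

On this formula the cache is hit as soon as $F[x=1]$ is explored, since it is the same clause set as $F[x=0]$; only three decision nodes are created.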

Exploiting decomposition

For many tasks, such as model counting, it is useful to detect syntactically decomposable parts of the formula, that is:

$F(X) = G(Y) \wedge H(Z)$ and $Y \cap Z = \emptyset$

dec-DNNF: FBDD + decomposable $\wedge$ -gates
Still allows for model counting via the identity $\#F=\#G\times\#H$
Compilers can be adapted to detect this rule.
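
The identity $\#F=\#G\times\#H$ can be sketched as follows: clauses are grouped into connected components sharing no variable, and the counts of the components are multiplied. Each component is counted by brute force here, only to keep the example short; the function names are illustrative.

```python
from itertools import product

def count_bruteforce(cnf, variables):
    """Count assignments of `variables` satisfying every clause (signed-int literals)."""
    variables = sorted(variables)
    total = 0
    for bits in product((False, True), repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(l)] == (l > 0) for l in c) for c in cnf):
            total += 1
    return total

def components(cnf):
    """Group clauses into sets sharing no variable (connected components)."""
    groups = []  # list of (variables, clauses)
    for clause in cnf:
        vars_, clauses = {abs(l) for l in clause}, [clause]
        for g in [g for g in groups if g[0] & vars_]:
            groups.remove(g)          # merge every group touching this clause
            vars_ |= g[0]
            clauses += g[1]
        groups.append((vars_, clauses))
    return groups

def count_decomposed(cnf):
    """Multiply the model counts of the independent components."""
    result = 1
    for vars_, clauses in components(cnf):
        result *= count_bruteforce(clauses, vars_)
    return result

# G(x1, x2) = (x1 or x2) and (not x1 or x2);  H(x3, x4) = (x3 or x4)
cnf = [(1, 2), (-1, 2), (3, 4)]
print(count_decomposed(cnf))                 # 6 = 2 * 3
print(count_bruteforce(cnf, {1, 2, 3, 4}))   # 6
```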

Existing Tools

Top-down Model Counter:
- Cachet
- SharpSAT
Top-down Knowledge Compilers:
- DSharp
- D4
Bottom-up compilers:
- SDD
- c2d
- CUDD for manipulating Decision Diagrams.
- ADDMC

The D4 compiler

D4 is a top-down compiler as shown earlier:

Uses oracle calls to a SAT solver with clause learning to cut branches and speed up later computation
Uses heuristics to decompose the formula so that it breaks into smaller connected components.
- Nice tools from graph theory
- Interesting research questions around these heuristics

The Power of decomposable $\wedge$ -gates

Is it useful to have $\wedge$ -gates in practice?

Yes, exponential gain in circuit size on some instances:

There is a family $(f_n)_{n \in \mathbb{N}}$ of Boolean functions such that any FBDD computing $f_n$ has size at least $2^{n}$ but $f_n$ can be computed by a $poly(n)$ -sized dec-DNNF.

One Minute to Cool Down

60s

Wrap up:

Many existing Target Languages: chosen depending on the supported queries
Branch and bound approach for compilation: importance of heuristics
Many actual tools exist and can be used!

KC as a tool

Data structures used in KC can be used in other areas of computer science to leverage existing results.

KC meets Databases

Relational Databases and queries

Data stored as relations (tables):

People	Id	Name	City
	1	Alice	Paris
	2	Bob	Lens
	3	Carole	Lille
	4	Djibril	Berlin

Capital	City	Country
	Berlin	Germany
	Paris	France
	Roma	Italy

Query language:

SELECT * FROM People 
JOIN Capital ON People.City=Capital.City

Results	Id	Name	City	Country
	1	Alice	Paris	France
	4	Djibril	Berlin	Germany

SELECT COUNT(*) FROM People 
WHERE City NOT IN (SELECT City FROM Capital)

Results	Count
	2
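
Both queries can be reproduced with Python's built-in `sqlite3` module on an in-memory database. This is only a sketch: the column types are assumptions, and the join lists its columns explicitly (instead of `SELECT *`) so that `City` appears once, as in the result table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (Id INTEGER, Name TEXT, City TEXT);
    CREATE TABLE Capital (City TEXT, Country TEXT);
    INSERT INTO People VALUES (1,'Alice','Paris'), (2,'Bob','Lens'),
                              (3,'Carole','Lille'), (4,'Djibril','Berlin');
    INSERT INTO Capital VALUES ('Berlin','Germany'), ('Paris','France'), ('Roma','Italy');
""")

joined = conn.execute("""
    SELECT People.Id, People.Name, People.City, Capital.Country FROM People
    JOIN Capital ON People.City = Capital.City ORDER BY People.Id
""").fetchall()
not_in_capital = conn.execute("""
    SELECT COUNT(*) FROM People
    WHERE City NOT IN (SELECT City FROM Capital)
""").fetchone()[0]

print(joined)          # [(1, 'Alice', 'Paris', 'France'), (4, 'Djibril', 'Berlin', 'Germany')]
print(not_in_capital)  # 2
```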

Conjunctive queries

SQL is a full-fledged language, hard to study.

A large class of queries can be expressed in a smaller fragment: conjunctive queries.

People	Id	Name	City
	1	Alice	Paris
	2	Bob	Lens
	3	Carole	Lille
	4	Djibril	Berlin

Capital	City	Country
	Berlin	Germany
	Paris	France
	Roma	Italy

$Q(Id, Name, City, Country) = People(Id, Name, City) \wedge Capital(City, Country)$

$(2, Bob, Lens, France) \notin Q$ because $(Lens, France) \notin Capital$
$(1, Alice, Paris, France) \in Q$ because:
- $(Paris, France) \in Capital$ AND
- $(1, Alice, Paris) \in People$ .

Conjunctive Queries (continued)

Conjunctive queries are queries of the form: $Q(X)=\bigwedge_i R_i(\vec{x_i})$ where

$\vec{x_i}$ is a tuple of variables from $X$
$R_i$ are relation symbols

$People(Id, Name, City) \wedge Capital(City, Country)$

Database $\mathbb{D}$ : list of relations $R_1^\mathbb{D}\subseteq D^{\vec{x_1}}, \dots, R_p^\mathbb{D}\subseteq D^{\vec{x_p}}$ filled with values in domain $D$

$People^\mathbb{D}= \{(Id: 1,Name: Alice,City: Paris), (Id: 2,Name: Bob,City: Lens)\}$

$Capital^\mathbb{D}= \{(City: Paris, Country: France), (City: Berlin, Country: Germany)\}$

Defines a new table $Q(\mathbb{D}) \subseteq D^X$ where $\tau \in Q(\mathbb{D})$ if, for every $i$ , the restriction of $\tau$ to the variables $\vec{x_i}$ is in $R_i^\mathbb{D}$ .

$Q(\mathbb{D}) = \{(Id: 1, Name: Alice, City: Paris, Country: France)\}$

CQs correspond to JOIN queries in SQL.
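
The semantics above can be sketched directly: tuples are dictionaries from variables to values, and a conjunctive query is evaluated by joining its relations one by one. This naive nested-loop join is for illustration only; `natural_join` and `evaluate_cq` are names chosen for this example.

```python
def natural_join(r, s):
    """Join two relations given as lists of dicts (variable -> value)."""
    out = []
    for t in r:
        for u in s:
            # Keep the pair only if both tuples agree on their shared variables.
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                out.append({**t, **u})
    return out

def evaluate_cq(relations):
    """Evaluate the conjunction of the given relations by successive joins."""
    result = [{}]                      # the empty tuple is neutral for the join
    for rel in relations:
        result = natural_join(result, rel)
    return result

people = [{"Id": 1, "Name": "Alice", "City": "Paris"},
          {"Id": 2, "Name": "Bob", "City": "Lens"}]
capital = [{"City": "Paris", "Country": "France"},
           {"City": "Berlin", "Country": "Germany"}]
answers = evaluate_cq([people, capital])
print(answers)  # [{'Id': 1, 'Name': 'Alice', 'City': 'Paris', 'Country': 'France'}]
```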

Hardness of solving conjunctive queries

Bad news: given a conjunctive query $Q$ and a database $\mathbb{D}$ , it is NP-complete to decide whether $Q(\mathbb{D}) \neq \emptyset$ !

And yet database systems solve this kind of query all the time!

Query $Q$ is usually small wrt $\mathbb{D}$
Join tables following an optimized query plan
Leverage clever indexing algorithms
Use clever heuristics based on statistics gathered earlier

Acyclic queries

Central class of conjunctive queries because of their tractability.

$R_1(x,y,z) \wedge R_2(x,z,u) \wedge R_3(x,y,t) \wedge R_4(y,t) \wedge R_5(y,v)$

Not every CQ is acyclic

$R(x,y) \wedge S(y,z) \wedge T(z,x)$

Yannakakis Algorithm

Filter out every tuple of a relation that cannot be extended to a solution below it

Tuples in the root can be extended to full solutions

Reconstructing solution x=1, y=1, z=1, u=0, t=0, v=0

Twisting Yannakakis for Counting

Bottom up computation of the number of extensions

Total of $4$ solutions.
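
The bottom-up counting can be sketched on a join tree as follows. The tables below are hypothetical (they are not the ones from the slide figure) and the query is $R(x,y) \wedge S(y,z) \wedge T(y,v)$ with $R$ at the root; the node encoding `(variables, tuples, children)` is an assumption of this example.

```python
from collections import defaultdict

def count_extensions(node):
    """Map each tuple of `node` to its number of extensions in the subtree below."""
    attrs, tuples, children = node
    ext = {t: 1 for t in tuples}
    for child in children:
        child_attrs = child[0]
        shared = [a for a in attrs if a in child_attrs]
        # Sum the child's counts per value of the shared variables.
        summed = defaultdict(int)
        for u, n in count_extensions(child).items():
            key = tuple(u[child_attrs.index(a)] for a in shared)
            summed[key] += n
        for t in tuples:
            key = tuple(t[attrs.index(a)] for a in shared)
            ext[t] *= summed.get(key, 0)   # 0 filters out tuples with no extension
    return ext

# Hypothetical data for R(x, y), S(y, z), T(y, v).
S = (("y", "z"), [(1, 1), (1, 0), (0, 0)], [])
T = (("y", "v"), [(1, 0), (1, 1)], [])
R = (("x", "y"), [(1, 1), (0, 1), (1, 0)], [S, T])

extensions = count_extensions(R)
print(extensions)                # {(1, 1): 4, (0, 1): 4, (1, 0): 0}
print(sum(extensions.values()))  # 8
```

Grouping the child counts by shared variables keeps the work roughly proportional to the table sizes, up to the cost of the `index` lookups.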

Trace of the Yannakakis Algorithm

Build a circuit computing the answers of $Q$ bottom up

One gate per line in input tables, decision gates on disappearing variables

The trace of Yannakakis Algorithm on acyclic CQ is a decision-DNNF (non Boolean domain) of size linear in the data.

Factorized Databases

Data structures known as “Factorized Databases”.

For every acyclic query $Q$ and database $\mathbb{D}$ , one can build a decision-DNNF computing $Q(\mathbb{D})$ of size $O(poly(|Q|)|\mathbb{D}|)$ .

Knowledge compilation style approach. One can efficiently:

decide whether $Q(\mathbb{D}) = \emptyset$
compute $\#Q(\mathbb{D})$
enumerate $Q(\mathbb{D})$

Unify existing results and push the hardness in the compilation part.

Going further

This compilation result can be used to recover many other results:

Ranked access: given $k$ and some order on $Q(\mathbb{D})$ , output $Q(\mathbb{D})[k]$ in time $polylog(|\mathbb{D}|)$
Optimization: find the tuple of $Q(\mathbb{D})$ that maximizes a linear function
Aggregation over a semi-ring $\mathbb{K}$ : given $w : var(Q) \times D \rightarrow \mathbb{K}$ , compute $\bigoplus_{\tau \in Q(\mathbb{D})} \bigotimes_{x \in var(Q)} w(x,\tau(x))$

Knowledge Compilation meets Optimization

Boolean Optimization Problem

BPO problem: $\max_{(x_1,\dots,x_n) \in \{0,1\}^n} P(x_1,\dots,x_n)$

where $P$ is a polynomial.

Observation: $P$ may be assumed to be multilinear since $x^2 = x$ over $\{0,1\}$

$P = \sum_{e \in E} \alpha_e \prod_{i \in e} x_i$

where $E \subseteq 2^V$ and $V = \{1,\dots,n\}$

Example

$P(x_1,x_2,x_3) = x_1x_2x_3 - 2x_1x_3 + 3x_1$

$P(1, 0, 0) = 3$ is maximal.
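
On three variables the claim is easy to verify by enumerating $\{0,1\}^3$ ; this brute-force check is only a sanity test, not a BPO algorithm.

```python
from itertools import product

def P(x1, x2, x3):
    return x1 * x2 * x3 - 2 * x1 * x3 + 3 * x1

points = list(product((0, 1), repeat=3))
best_value = max(P(*x) for x in points)
maximizers = [x for x in points if P(*x) == best_value]

print(best_value)  # 3
print(maximizers)  # [(1, 0, 0), (1, 1, 0)]
```

The maximum is reached at $(1,0,0)$ and also at $(1,1,0)$ , since $x_2$ only occurs in a monomial that vanishes when $x_3 = 0$ .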

Algebraic Model Counting

Semi-ring: $\mathbb{K} = (K,\oplus, \otimes, 0_\oplus, 1_\otimes)$

$\oplus, \otimes$ commutative, associative
$a \oplus 0_\oplus = a$ , $b \otimes 1_\otimes = b$
$(a \otimes (b \oplus c)) = (a \otimes b) \oplus (a \otimes c)$ .

$f \subseteq \{0,1\}^X$ Boolean function and $w : X \times \{0,1\} \rightarrow \mathbb{K}$ :

$w(f) = \bigoplus_{\tau \in f} \bigotimes_{x \in X} w(x, \tau(x))$

AMC Examples

If $w(x,b) = 1$ on $(\mathbb{Q},+,\times,0,1)$ : $w(f) = \sum_{\tau \in f} \prod_{x \in X} 1 = \#f$

Arctic semi-ring $(\mathbb{Q} \cup \{-\infty\}, \max, +, -\infty, 0)$ : $w(f) = \max_{\tau \in f} \sum_{x \in X} w(x,\tau(x))$

This makes it possible to encode optimization problems on Boolean functions.
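
Both examples are instances of one generic computation. The sketch below evaluates $w(f)$ by brute force over all assignments, with the semi-ring passed as plain Python functions; the function `amc`, the test function $x \vee y$ and the weights are assumptions made for this example.

```python
import operator
from functools import reduce
from itertools import product

def amc(f, variables, w, plus, times, zero, one):
    """Brute-force algebraic model counting of `f` under the weights `w`."""
    total = zero
    for bits in product((0, 1), repeat=len(variables)):
        tau = dict(zip(variables, bits))
        if f(tau):
            weight = reduce(times, (w(x, tau[x]) for x in variables), one)
            total = plus(total, weight)
    return total

def f(tau):
    return tau["x"] or tau["y"]

# Counting: all weights are 1 in (Q, +, *, 0, 1).
count = amc(f, ["x", "y"], lambda x, b: 1, operator.add, operator.mul, 0, 1)

# Arctic semi-ring (max, +): hypothetical weights, paid only when a variable is 1.
gain = {"x": 2, "y": -1}
best = amc(f, ["x", "y"], lambda x, b: gain[x] if b else 0,
           max, operator.add, float("-inf"), 0)

print(count)  # 3
print(best)   # 2
```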

BPO and Boolean Functions

For $P := \sum_{e \in E} \alpha_e \prod_{i \in e} x_i$ define: $f_P := \bigwedge_{e \in E} C_e$ where $C_e := Y_e \Leftrightarrow \bigwedge_{i \in e} X_i$

$C_e$ encodes $y_e = \prod_{i \in e} x_i$ !

and $w_P$ on $(\mathbb{Q}, \max, +, -\infty, 0)$ as:

$w_P(Y_e,1) = \alpha_e$ and
$w_P(X_i,b)=w_P(Y_e,0) = 0$ for $b \in \{0,1\}$ .

Encoding BPO as Boolean function: an example

Example: $P(x_1,x_2,x_3) = x_1x_2x_3 - 2x_1x_3 + 3x_1$

$f_P = (Y_1 \Leftrightarrow (X_1 \wedge X_2 \wedge X_3)) \wedge (Y_2 \Leftrightarrow (X_1 \wedge X_3)) \wedge (Y_3 \Leftrightarrow X_1)$
$w_P(Y_1,1) = 1$ , $w_P(Y_2,1)=-2$ and $w_P(Y_3,1)=3$ .

$\begin{aligned} w_P(f_P) & = \max_{\tau \in f_P} w_P(\tau) \\ & = w_P(Y_1=0, Y_2=0, Y_3=1, X_1=1, X_2=0, X_3=0) \\ & = 3 \\ & = P(1,0,0) \end{aligned}$
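
The encoding can be checked by brute force on this example: enumerate the models of $f_P$ and take the maximum weight in the arctic semi-ring. The list `monomials` and the helper names are illustrative choices; the coefficients are those of $P$ .

```python
from itertools import product

# Monomials of P = x1*x2*x3 - 2*x1*x3 + 3*x1 as (coefficient alpha_e, index set e).
monomials = [(1, (1, 2, 3)), (-2, (1, 3)), (3, (1,))]

def in_f_P(x, y):
    """f_P holds iff every Y_e equals the conjunction of the X_i with i in e."""
    return all(y[k] == all(x[i - 1] for i in e)
               for k, (_, e) in enumerate(monomials))

def weight(y):
    """Arctic weight of a model: only the literals Y_e = 1 contribute alpha_e."""
    return sum(alpha for (alpha, _), y_e in zip(monomials, y) if y_e)

value = max(weight(y)
            for x in product((0, 1), repeat=3)
            for y in product((0, 1), repeat=3)
            if in_f_P(x, y))
print(value)  # 3
```

Each assignment of the $X_i$ extends to exactly one model of $f_P$ , so the maximum ranges over the same eight points as $\max P$ .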

BPO as a Boolean Function

$w_P(f_P) = \max P(x_1,\dots,x_n)$ where $f_P = \bigwedge_{e \in E} C_e$ .

Try using Algebraic Model Counting for BPO:

compile $f_P$ into, e.g., OBDD $C$
compute $w_P(f_P)$ in time $O(|C|)$ .

Rich connection

Good practical results (e.g. using D4)
Transfer known tractable classes of CNF to BPO
Allows for solving more complex optimization problems

Example: solve $\max P(x)$ such that $L < \sum_{x \in X} x < U$ :

How?

Construct OBDD $C$ that computes $f_P$
Transform $C$ into $C'$ so that it computes $f_P \wedge (L < \sum_{x \in X} x < U)$
Compute $w_P(C')$

Doggy Bag

Take Home Message

Original motivation of Knowledge Compilation: reasoning with knowledge bases
- Renault Example
- Configuration problems in general
Interesting data structures to solve many tasks on Boolean Functions
- Enumeration
- Algebraic Model Counting
Transfer tractability and tools by encoding problems into Boolean Functions:
- Databases
- Optimization problems
