A Knowledge Compilation Take on Binary Polynomial Optimization

Florent Capelli
joint work with Alberto Del Pia and Silvia di Gregorio

Université d’Artois - CRIL

September 12, 2024

Binary Polynomial Optimization

Problem definition

Binary Polynomial Optimization problem:

maxx1,,xn{0,1}nP(x1,,xn)\max_{x_1,\dots,x_n \in \{0,1\}^n} P(x_1,\dots,x_n)

where PP is a polynomial.

Observation: PP may be assumed to be multilinear since x2=xx^2 = x over {0,1}\{0,1\}

P=eEαeiexi P = \sum_{e \in E} \alpha_e \prod_{i \in e} x_i

where E2VE \subseteq 2^V

Example

P(x1,x2,x3)=x1x2x32x1x3+3x1 P(x_1,x_2,x_3) = x_1x_2x_3 - 2x_1x_3 + 3x_1

x1x_1 x2x_2 x3x_3 P(x)P(x)
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 3
1 0 1 1
1 1 0 3
1 1 1 2

P(1,0,0)=P(1,1,0)=3P(1,0,0)=P(1,1,0)=3 are maximal

Complexity of BPO

Bad news: Solving BPO is NP-hard.

Intuition: a polynomial can encode many things:

  • OR(x,y)=x+yxyOR(x,y) = x+y-xy encodes xyx \vee y on {0,1}\{0,1\}
  • For a graph G=(V,E)G=(V,E), with NN vertices and MM edges VC(V)={v,w}EOR(v,w)VC(V)=\sum_{\{v,w\} \in E} OR(v,w) VC(Ṽ)=MVC(\tilde{V}) = M iff Ṽ\tilde{V} encodes a vertex cover of GG
  • MVC(V)=2N×(VC(V)M)xVvMVC(V) = 2N \times (VC(V) - M)-\sum_{x \in V} v is maximal at Ṽ\tilde{V} iff Ṽ\tilde{V} encodes a minimal vertex cover!

Solving BPO as ILP

BPO is a non-linear optimization problem.

Make it linear so that we can use LP solvers!

maxx{0,1}VeEαevexv\max_{x \in \{0,1\}^V} \sum_{e \in E} \alpha_e \prod_{v \in e} x_v rewrites as:

maxeEαeye\max \sum_{e \in E} \alpha_e y_e such that

ye=vexvy_e = \prod_{v \in e} x_v

yevexvy_e \leq \prod_{v \in e} x_v and vexvye\prod_{v \in e} x_v \leq y_e

yexvy_e \leq x_v for vev \in e and vexvye\prod_{v \in e} x_v \leq y_e

yexvy_e \leq x_v for vev \in e and vexvye1+|e|\sum_{v \in e} x_v \leq y_e-1 + |e|

xv,ye{0,1}x_v, y_e \in \{0,1\}

Integer Linear Program solvers can now solve it!

They may stall on known-to-be easy instances. Is there an alternative way?

BPO as a Boolean Function Problem

Boolean Function

f{0,1}Xf \subseteq \{0,1\}^X is a Boolean function on variables XX.

An assignment τ:X{0,1}\tau : X \rightarrow \{0,1\} satisfies ff iff τf\tau \in f.

Example

x y z
0 0 0
0 0 1
0 1 0
0 1 1
1 1 1

Represented as a formula x(yz)x \Rightarrow (y \wedge z) or by (¬xy)(¬xz)(\neg x \vee y) \wedge (\neg x \vee z)

Weighted Boolean Function

For w:X×{0,1}w : X \times \{0,1\} \rightarrow \mathbb{R} and τ{0,1}X\tau \in \{0,1\}^X consider:

w(τ)=xXw(x,τ(x)) and w(f)=τfw(τ)w(\tau) = \prod_{x \in X} w(x, \tau(x)) \text{ and } w(f) = \sum_{\tau \in f} w(\tau)

Example

  • w(x,0)=1w(x,0) = 1,
  • w(x,1)=2w(x,1)=2,
  • w(y,0)=3w(y,0) = 3,
  • w(y,1)=3w(y,1)=-3,
  • w(z,0)=5w(z,0)=5,
  • w(z,1)=5w(z,1)=-5
x y z ww
0 0 0 1*3*51*3*5 1515
0 0 1 1*3*51*3*-5 15-15
0 1 0 1*3*51*-3*5 15-15
0 1 1 1*3*51*-3*-5 1515
1 1 1 2*3*52*-3*-5 3030
w(f)w(f) 151515+15+3015-15-15+15+30 30

Algebraic Model Counting

w(f)=τfxXw(x,τ(x)) w(f) = \sum_{\tau \in f} \prod_{x \in X} w(x, \tau(x))

w(f)=τfxXw(x,τ(x)) w(f) = \bigoplus_{\tau \in f} \bigotimes_{x \in X} w(x, \tau(x))

where 𝕂=(K,,,0,1)\mathbb{K} = (K,\oplus, \otimes, 0_\oplus, 1_\otimes) is a semi-ring

That is:

  • ,\oplus, \otimes commutative, associative
  • a0=aa \oplus 0_\oplus = a, b1=bb \otimes 1_\otimes = b
  • \otimes distributes over \oplus: (a(bc))=(ab)(ac)(a \otimes (b \oplus c)) = (a \otimes b) \oplus (a \otimes c).

Examples

  • (,+,×,0,1)(\mathbb{R}, +, \times, 0 , 1)
  • Any fields, e.g., /p\mathbb{Z} / p\mathbb{Z}
  • Arctic semi-ring: (,max,+,,0)(\mathbb{Q}, \max, +, -\infty, 0)

AMC over (max,+)(\max,+)-semiring

w(f)=maxτfxXw(x,τ(x)) w(f) = \max_{\tau \in f} \sum_{x \in X} w(x, \tau(x))

This is an optimization task!

Example

  • w(x,0)=1w(x,0) = 1,
  • w(x,1)=2w(x,1)=2,
  • w(y,0)=3w(y,0) = 3,
  • w(y,1)=3w(y,1)=-3,
  • w(z,0)=5w(z,0)=5,
  • w(z,1)=5w(z,1)=-5
x y z ww
0 0 0 1+3+51+3+5 99
0 0 1 1+351+3-5 1-1
0 1 0 13+51-3+5 33
0 1 1 1351-3-5 7-7
1 1 1 2352-3-5 6-6
w(f)w(f) max(9,1,3,7,6)\max(9,-1,3,-7,-6) 9

Encoding BPO as Boolean function

Example: P(x1,x2,x3)=P(x_1,x_2,x_3) = x1x2x3x_1x_2x_3 2x1x3- 2x_1x_3 +3x1+ 3x_1

  • fP=f_P = (Y1(X1X2X3))(Y_1 \Leftrightarrow (X_1 \wedge X_2 \wedge X_3)) \wedge (Y2(X1X3))(Y_2 \Leftrightarrow (X_1 \wedge X_3)) \wedge (Y3X1)(Y_3 \Leftrightarrow X_1)
  • wP(Y1,1)=1w_P(Y_1,1) = 1, wP(Y2,1)=2w_P(Y_2,1)=-2 and wP(Y3,1)=3w_P(Y_3,1)=3.
  • wP(Z,b)=0w_P(Z,b)=0 for every other values.

For every τfP\tau \models f_P we have wP(τ)=P(x1=τ(X1),x2=τ(X2),x3=τ(X3)) w_P(\tau) = P(x_1=\tau(X_1), x_2=\tau(X_2), x_3=\tau(X_3)) and wP(f)=maxPw_P(f) = \max P

Example

For τ(X1)=1,τ(X2)=0,τ(X3)=1\tau(X_1) = 1, \tau(X_2) = 0, \tau(X_3) = 1:

  • if τfP\tau \models f_P then τ(Y1)=0,τ(Y2)=1,τ(Y3)=1\tau(Y_1) = 0, \tau(Y_2) = 1, \tau(Y_3) = 1
  • hence wP(τ)=wP(Y2,1)+wP(Y3,1)=2+3=1w_P(\tau) = w_P(Y_2,1) + w_P(Y_3,1) = -2+3 = 1!

Formal encoding

For P:=eEαeiexiP := \sum_{e \in E} \alpha_e \prod_{i \in e} x_i define:

fP:=eECef_P := \bigwedge_{e \in E} C_e

where Ce:=YeieXiC_e := Y_e \Leftrightarrow \bigwedge_{i \in e} X_i

CeC_e encodes ye=iexiy_e = \prod_{i \in e} x_i!

and wPw_P on (,max,+,,0)(\mathbb{Q}, \max, +, -\infty, 0) as:

  • wP(Ye,1)=αew_P(Y_e,1) = \alpha_e and
  • wP(Xi,b)=wP(Ye,0)=0w_P(X_i,b)=w_P(Y_e,0) = 0 for b{0,1}b \in \{0,1\}.

BPO as a Boolean Function

wP(fP)=maxP(x1,,xn)w_P(f_P) = \max P(x_1,\dots,x_n)

over the (max,+)(\text{max},\text{+})-semiring.

The underlying algorithmic toolbox is very different from ILP solvers providing new insights.

We can use existing toolbox for AMC to solve BPO

  • theoretical results
  • AND practical results

Knowledge Compilation how to solve AMC

Representing Boolean functions

How can we represent Boolean function: f{0,1}Xf \subseteq \{0,1\}^X

So far we have seen: list every satisfying assignment of ff (aka Truth Table)

  • Easy to manipulate since the representation is explicit
  • Not compact

CNF Formulas

F=()F = \bigwedge (\bigvee \ell) where \ell is a literal xx or ¬x\neg x for some variable xx.

Examples

F1=(x¬y)(¬xy)F_1=(x \vee \neg y) \wedge (\neg x \vee y)

xx yy F1F_1
00 00 11
11 11 11
** ** 00

F2=(x¬z)(¬xy)(xyz)F_2=(x \vee \neg z) \wedge (\neg x \vee y) \wedge (x \vee y \vee z)

xx yy zz F2F_2
11 11 11 11
00 11 00 11
11 11 00 11
** ** ** 00

The SAT Problem

CNF formulas are extremely simple yet can encode many interesting problems.

Cook, Levin, 1971: The problem SAT of deciding whether a CNF formula is satisfiable is NP-complete.

Valiant 1979: The problem #SAT of counting the satisfying assignment of a CNF formula is #P-complete.

  • Very unlikely that efficient algorithms exists for solving SAT / #SAT
  • Thriving community nevertheless addresses these problems in practice
  • SAT Solver very efficient in many applications

Relevance of CNF formulas

  • Natural encoding: succinctly encodes many problems, witnessed by the many existing industrial benchmarks.
  • Intractable for algebraic model counting

Looking for tradeoffs between Truth Tables and CNFs!

Circuit Based Representations

Research has focused on factorized representation.

Taken from SMBC Comics
Carl von Linné (1707-1778)

An example

Data structure based on decision nodes to represent “(x+y+z)(x+y+z) is even”.

Path for x=1x=1, y=0y=0 and z=1z=1 is accepting.

OBDDs

Previous data structure are Ordered Binary Decision Diagrams.

  • Directed Acyclic graphs with one source
  • Sinks are labeled by 00 or 11
  • Internal nodes are decision nodes on a variable in x1,,xnx_1, \dots, x_n
  • Variables tested in order.

Counting with OBDDs

How many 3×33 \times 3 {0,1}\{0,1\}-matrices have a row full of ones?

Tractability of OBDDs

This idea can be generalized to any OBDDs:

Let f{0,1}Xf \subseteq \{0,1\}^X be a function computed by an OBDD having EE edges. We can compute #f\#f with O(E)O(E) arithmetic operations.

Generalises to many tasks:

  • Evaluate Pr(f)Pr(f) if probabilities Pr(x=1)Pr(x=1) are given for each xXx \in X
  • Enumerate ff
  • Algebraic Model Counting on any semi-ring.

Back to BPO

wP(fP)=maxP(x1,,xn)w_P(f_P) = \max P(x_1,\dots,x_n) where

  • fP=eEYeieXif_P = \bigwedge_{e \in E} Y_e \Leftrightarrow \bigwedge_{i \in e} X_i
  • wP(Ye,1)=αew_P(Y_e,1)=\alpha_e and 00 otherwise

Solving BPO:

  • transform fPf_P into an OBDD
  • compute wP(fP)w_P(f_P) via dynamic programming on the OBDD itself

A Knowledge Compiler for OBDD

Exhaustive DPLL with Caching based on Shannon Expansion:

F=(xyz)(x¬y¬z)(¬x¬y¬z)(¬xyz)F = (x \vee y \vee z) \wedge (x \vee \neg y \vee \neg z) \wedge (\neg x \vee \neg y \vee \neg z) \wedge (\neg x \vee y \vee z)

  • F[x=0]=(yz)(¬y¬z)F[x=0] = (y \vee z) \wedge (\neg y \vee \neg z)
  • F[x=1]=(¬y¬z)(yz)F[x=1] = (\neg y \vee \neg z) \wedge (y \vee z)
  • F[x=1,y=1]=¬zF[x=1,y=1] = \neg z
  • F[x=1,y=0]=zF[x=1,y=0] = z
  • F[x=0,y=1]=¬zF[x=0,y=1] = \neg z=F[x=1,y=1]= F[x=1,y=1]
  • F[x=0,y=0]=zF[x=0,y=0] = z=F[x=1,y=0]= F[x=1,y=0]

This scheme is parameterized by:

  • caching policy
  • branching heuristics

Exploiting decomposition

For many tasks, such as model counting, it is interesting to detect syntactic decomposable part of the formula, that is:

F(X)=G(Y)H(Z)F(X) = G(Y) \wedge H(Z) and YZ=Y \cap Z = \emptyset

  • decDNNF: OBDD + \wedge-gates decomposable
  • Still allows for algebraic model counting via the identity w(F)=w(G)×w(H)w(F) = w(G) \times w(H)
  • Compilers can be adapted to detect this rule.

The D4 compiler

D4 is a top-down compiler as shown earlier:

  • Use oracle calls to a SAT solver with clause learning to cut branches and speed up later computation
  • Use heuristics to decompose the formula so that it breaks into smaller connected components.
instance d4 (s) scip (s)
bernasconi.20.3 0.002 0.01
bernasconi.20.5 0.04 8.91
bernasconi.20.10 1.21 119.20
bernasconi.20.15 14.92 479.15
bernasconi.25.3 0.00 0.01
bernasconi.25.6 0.19 151.65
bernasconi.25.13 12.59 1 698.18
bernasconi.25.19 442.26 TIMEOUT
bernasconi.25.25 TIMEOUT TIMEOUT

This only illustrates that the underlying structure of Bernasconi instances is better addressed using heuristics from model counting than the ILP approach.

Tractability results

Tractable classes of BPO

P(x1,,xn)=eEαeiexi where E2V P(x_1,\dots,x_n) = \sum_{e \in E} \alpha_e \prod_{i \in e} x_i \text{ where } E \subseteq 2^V

H=(V,E)H=(V,E) is a hypergraph.

Exploit the structure of HH to solve BPO more efficiently.

A toy example

Tree BPO: BPO problem where HH is a tree.

Example: x1x2+5x1x3+3x0x12x0x4+3x4x5x_1x_2+5x_1x_3+3x_0x_1-2x_0x_4+3x_4x_5

Many Known Tractable Classes

  • HH has tree width kk: BPO can be solved in time 2O(k)poly(H)2^{O(k)}poly(H).

  • HH is β\beta-acyclic: BPO can be solved in time poly(H)poly(H).

Dedicated algorithm for each class.

A strange symmetry

Very similar results from Boolean function literature:

  • If a CNF FF has tree width kk then one can construct a DNNF for FF of size 2O(k)poly(F)2^{O(k)}poly(F).

  • If a CNF FF is β\beta-acyclic then one can construct a DNNF for FF of size poly(F)poly(F).

Is there a connection?

Encoding BPO as a CNF

For P:=eEαeiexiP := \sum_{e \in E} \alpha_e \prod_{i \in e} x_i define: fP:=eECef_P := \bigwedge_{e \in E} C_e where Ce:=YeieXiC_e := Y_e \Leftrightarrow \bigwedge_{i \in e} X_i

CeC_e can be encoded as the conjunction of:

  • ie¬XiYe\bigvee_{i \in e} \neg X_i \vee Y_e
  • ¬YeXi\neg Y_e \vee X_i for every iei \in e

fPf_P is naturally encoded as a CNF FPF_P that preserves tree width.

Tractability of BPO via KC

Every known tractability for BPO can be recovered in our framework as follows:

  1. Encode PP as a CNF formula FPF_P
  2. Transform FPF_P into a polynomial size tractable representation CPC_P using known results
  3. Solve AMC on CPC_P

And we get new tractability results for structure that where not known to make BPO tractable.

Beyond BPO

KC approach very versatile:

  • Solve top-k BPO: find the kk best solutions of PP by finding the kk best in the circuit
  • Solve BPO + Cardinality constraints: maxP(x1,,xn)suchthati=1nxiS\max P(x_1,\dots,x_n)\ such\ that \sum_{i=1}^n x_i \in S where S[n]S \subseteq [n] by transforming the circuit
  • Solve pseudo BPO: PP can contains monomial of the form iAxiiB(1xi)\prod_{i \in A} x_i \prod_{i \in B} (1-x_i)

Conclusion

Connection between BPO and Boolean functions:

  • Recover known results and generalize them using the existing rich literature
  • Seems to have practical relevance

Perspective:

KC only exploits combinatorics of the underlying Boolean function. How could we mix existing more algebraic techniques?