From Queries to Circuits

Florent Capelli

UniversitΓ© d’Artois - CRIL

Dagtsuhl Seminar
Representation, Provenance, and Explanations in Database Theory and Logic

January 17, 2024

Representing Relations

Relations from Queries and Databases

  • Q(X)Q(X) query with free variables XX
  • 𝔻\mathbb{D} database on domain DD

define the relation Q(𝔻)βŠ†DXQ(\mathbb{D}) \subseteq D^X of answers of QQ over 𝔻\mathbb{D}

(Q,𝔻)(Q,\mathbb{D}) is an implicit representation of a relation of DXD^X.

Relations from Tables

A relation RβŠ†DXR \subseteq D^X can also be explicitly represented as a table:

xx yy zz
1 2 1
2 0 1
0 0 1
1 2 0

Problems defined on Relations

Given a representation of RβŠ†DXR \subseteq D^X:

  • Emptiness: decide whether R=βˆ…R = \emptyset
  • Totality: decide whether R=DXR = D^X
  • Size: output #R\#R
  • Sample: sample Ο„βˆˆR\tau \in R uniformly
  • Enumeration: output every tuple in RR.
  • Direct Access: given kk, output the kthk^{th} tuple of RR (assuming an order on DXD^X).
  • Aggregation: output β¨Ο„βˆˆRf(Ο„)\bigoplus_{\tau \in R} f(\tau) for some f:DXβ†’(𝕄,βŠ•)f \colon D^X \rightarrow (\mathbb{M}, \oplus)

Representation and Complexity

Complexity varies depending on the representation of RβŠ†DXR \subseteq D^X.

Problem (CQ,𝔻\mathbb{D}) Table
Emptiness NPNP-hard O(1)O(1)
Totality O(|Q|β‹…|𝔻|)O(|Q|\cdot|\mathbb{D}|) O(1)O(1)
Size #P\#P-hard O(1)O(1)

A table is less succinct than a query but more tractable.

Looking for tradeoffs!

A Factorized Approach

{Γ—,βˆͺ}-circuit\{\times,\cup\}\text{-circuit}s

  • XX a finite set of variables.
  • DD a finite domain.

Smoothness

Interpret * either by as β€œany domain value” or β€œa default domain value” (zero suppressed semantics).

{⊎,Γ—}\{\uplus,\times\}-circuit

Syntactic unambiguity

Decision Gates:

Comparing circuits

Gates Boolean Domain
{β‹ˆ,βˆͺ}\{\bowtie, \cup\} NNF
{βˆͺ,Γ—}\{\cup, \times\} DNNF
{⊎,Γ—}\{\uplus, \times\} d-DNNF
{𝖽𝖾𝖼,Γ—}\{\mathsf{dec}, \times\} dec-DNNF
{𝖽𝖾𝖼}\{\mathsf{dec}\} FBDD

Tractability Map

CC a circuit on domain DD and variable XX with n=|X|n=|X|.

Gates Emptiness Totality Size Enumeration
{β‹ˆ,βˆͺ}\{\bowtie, \cup\} NP-complete coNP-complete #P-complete NP-hard
{βˆͺ,Γ—}\{\cup, \times\} O(|C|)O(|C|) coNP-complete #P-complete O(n|C|)O(n|C|)-delay
{⊎,Γ—}\{\uplus, \times\} O(|C|)O(|C|) O(|C|p(n))O(|C|p(n)) O(|C|p(n))O(|C|p(n)) O(|C|)O(|C|)-delay
{𝖽𝖾𝖼,Γ—}\{\mathsf{dec}, \times\} O(|C|)O(|C|) O(|C|p(n))O(|C|p(n)) O(|C|p(n))O(|C|p(n)) O(|C|)O(|C|)-preproc
O(p(n))O(p(n))-delay

Assuming manipulating integers of size NN is polynomial in Nlog|𝔻|\frac{N}{\log|\mathbb{D}|}.

Complexity that only depends on nn is constant time in the data complexity model!

Compilation

Framework

Given a full conjunctive query QQ and a database 𝔻\mathbb{D}, construct a {Γ—,𝖽𝖾𝖼}-circuit\{\times,\mathsf{dec}\}\text{-circuit} CC representing Q(𝔻)Q(\mathbb{D}).

  • Construction cannot be polynomial in |Q||Q| and |𝔻||\mathbb{D}| in the worst case (see lower bounds for DNNF).
  • There exists tractable family of CQs:
    • Acyclic conjunctive queries have {Γ—,𝖽𝖾𝖼}-circuit\{\times,\mathsf{dec}\}\text{-circuit} of size poly(|Q|)Γ—|𝔻|poly(|Q|) \times |\mathbb{D}|.
    • Conjunctive queries have {Γ—,𝖽𝖾𝖼}-circuit\{\times,\mathsf{dec}\}\text{-circuit} of size poly(|Q|)Γ—|𝔻|fhtw(Q)poly(|Q|) \times |\mathbb{D}|^{fhtw(Q)}.

Acyclic Conjunctive Query

A CQ QQ is acyclic iff it has a join tree.

Yannakakis and circuits

Trace of Yannakakis algorithm can be seen as a {Γ—,𝖽𝖾𝖼}-circuit\{\times,\mathsf{dec}\}\text{-circuit}, see [Olteanu, ZΓ‘vodnΓ½].

A DPLL-like procedure

DPLL size guarantees

If x0,…,xnx_0, \dots, x_n is β€œcompatible with a join tree”, produced circuit has size linear in |𝔻||\mathbb{D}|.

Fixing x0,x1x_0, x_1:

  • decision tree with at most #R\#R leaves
  • breaks QQ into two connected components
  • Caching: Q[x0=1,x1=1]Q[x_0=1,x_1=1] and Q[x0=1,x1=0]Q[x_0=1,x_1=0] yield the same recursive call on the left tree.

This is roughly the caching in Yannakakis seen from above.
Elimination order only, no join tree!

DPLL and Negated Atoms

DPLL Guarantees

Q(x1,…,xn)=Q+∧Qβˆ’Q(x_1,\dots,x_n) = Q^+ \wedge Q^-

For NβŠ†Qβˆ’N \subseteq Q^-, let QN=Q+βˆ§β‹€Β¬S∈NSQ_N = Q^+ \wedge \bigwedge_{\neg S \in N} S.

If x1,…,xnx_1,\dots,x_n is good for every QNQ_N then DPLL returns a circuit of size linear in |𝔻||\mathbb{D}| (and poly in |Q||Q|).

In particular, QNQ_N is acyclic for every NN.

If x1,…,xnx_1,\dots,x_n induces a fractional hypertree decomposition of width at most kk for every QNQ_N then DPLL returns a circuit of size O(|𝔻|k)O(|\mathbb{D}|^k) (and poly in |Q||Q|).

Application

For a circuit CC on variables XX and |X|=n|X|=n:

Gates Emptiness Totality Size Enumeration
{𝖽𝖾𝖼,Γ—}\{\mathsf{dec}, \times\} O(|C|)O(|C|) O(|C|p(n))O(|C|p(n)) O(|C|p(n))O(|C|p(n)) O(|C|)O(|C|)-preproc
O(p(n))O(p(n))-delay
{𝖽𝖾𝖼,Γ—}\{\mathsf{dec}, \times\} |𝔻|p(|Q|)|\mathbb{D}|p(|Q|) |𝔻|p(|Q|)|\mathbb{D}|p(|Q|) |𝔻|p(|Q|)|\mathbb{D}|p(|Q|) |𝔻|p(|Q|))|\mathbb{D}|p(|Q|))-preproc
p(|Q|)p(|Q|)-delay

where QQ contains negative atoms!

Bonus: circuit produced by DPLL have an extra structure allowing direct access for induced lexicographical order.

Circuit propaganda Conclusion

  • Circuit are a useful abstraction.
  • Hard-lifting is compilation, algorithms are usually elementary.
  • Unify results, similar ideas popped into many communities.
  • Downside: Structure of the query may be lost during the compilation.