Direct Access for Conjunctive Queries with Negations

Florent Capelli

Université d’Artois - CRIL

Probabilistic Circuits and Logic Workshop - Simons Institute

October 18, 2023

Direct Access for Join Queries

Context

Join Queries: Q(x1,,xn)=i=1kRi(zi)Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k R_i(\vec{z_i})

where zi\vec{z_i} tuple over X={x1,,xn}X = \{x_1,\dots,x_n\}

Example: Q(city,country,name,id)=People(id,name,city)Capital(city,country)Q(city, country, name, id) = People(id, name, city) \wedge Capital(city, country)

People Id Name City
1 Alice Paris
2 Bob Lens
3 Chiara Roma
4 Djibril Berlin
5 Émile Paris
6 Francesca Roma
Capital City Country
Berlin Germany
Paris France
Roma Italy
Q City Country Id Name
Paris France 1 Alice
Roma Italy 3 Chiara
Berlin Germany 4 Djibril
Roma Italy 6 Francesca
Q City Country Id Name
Berlin Germany 4 Djibril
Paris France 1 Alice
Roma Italy 3 Chiara
Roma Italy 6 Francesca

Reorder Q(𝔻)Q(\mathbb{D}) in lexicographical order.

Direct Access

Given Q(x)Q(\vec{x}) and 𝔻\mathbb{D}, simulate array access to Q(𝔻)Q(\mathbb{D}) where Q(𝔻)[k]Q(\mathbb{D})[k] is the kthk^{th} element of Q(𝔻)Q(\mathbb{D}) by lexicographical order.

Q City Country Id Name
Berlin Germany 4 Djibril
Paris France 1 Alice
Roma Italy 3 Chiara
Roma Italy 6 Francesca

Q(𝔻)[4]Q(\mathbb{D})[4] is (Roma,Italy,6,Francesca)(Roma, Italy, 6, Francesca).

Preprocessing expensive: possibly |𝔻|g(|Q|)|\mathbb{D}|^{g(|Q|)}

  • Compute Q(𝔻)Q(\mathbb{D})
  • Sort the table

Answering DA queries Cheap: polylog(|𝔻|)polylog(|\mathbb{D}|)

  • Output Q(𝔻)[k]Q(\mathbb{D})[k] on input kk.

Can we have a better preprocessing?

Tractable Join Queries and Direct Access

There are classes of queries for which one can decide Q(𝔻)Q(\mathbb{D}) \neq \emptyset in polynomial time both in |𝔻||\mathbb{D}| and |Q||Q|.

  • Result generalizes to computing #Q(𝔻)\#Q(\mathbb{D})
  • And to direct access
  • HoF: Bagan, Durand, Olive, Grandjean, Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, Bringmann, Mengel, Schweikardt, Zeevi, Berkholz.

Acyclic queries

Central class of join queries because of their tractability.

R1(x,y,z)R2(x,z,u)R3(x,y,t)R4(y,t)R5(y,v) R_1(x,y,z) \wedge R_2(x,z,u) \wedge R_3(x,y,t) \wedge R_4(y,t) \wedge R_5(y,v)

Yannakakis Algorithm

Start from the join tree
Load data
Filter every tuple in a relation that cannot be extended to a solution below
Proceed bottom up
Treat every edge of the join tree
Tuples in the root can be extended to full solutions
Reconstructing solution x=1, y=1, z=1, u=0, t=0, v=0

Twisting Yannakakis for Counting

Start from the join tree and data
Bottom up computation of the number of extensions

Total of 44 solutions.

One can use these annotations top-down to solve direct access queries on compatible orders

  • for example for (x,y,z,t,u,v)(x,y,z,t,u,v).
  • Q(𝔻)[3]Q(\mathbb{D})[3] must set x=2,y=1,z=0x=2,y=1,z=0, then proceed down in the tree.

Direct Access on Acyclic Queries

Given a join query Q(x1,,xn)Q(x_1,\dots,x_n), if x1,,xnx_1,\dots,x_n is a compatible order then we can answer direct access queries with precomputation time O(|𝔻|poly(|Q|))O(|\mathbb{D}|poly(|Q|)) and access time O(poly(|Q|)polylog(|𝔻|))O(poly(|Q|)polylog(|\mathbb{D}|)).

Compatible order?

Acyclicity and elimination order

An α\alpha-leaf in a hypergraph H=(V,E)H=(V,E) is xVx \in V such that there is eEe \in E, N(x)eN(x) \subseteq e.

A hypergraph HH is α\alpha-acyclic iff one can obtain \emptyset by successively removing α\alpha-leaves in HH: induces an order on VV called an α\alpha-elimination order

[Brault Baron, 2014], simplification of [Graham-Yu-Özsoyoglu, 1979], also known as “without disruptive trio” in [Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, 2020]

Direct Access on Acyclic Queries

Given a join query Q(x1,,xn)Q(x_1,\dots,x_n), if xn,,x1x_n,\dots,x_1 is a α\alpha-elimination order then we can answer direct access queries with precomputation time O(|𝔻|poly(|Q|))O(|\mathbb{D}|poly(|Q|)) and access time O(poly(|Q|)polylog(|𝔻|))O(poly(|Q|)polylog(|\mathbb{D}|)).

[Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, 2020]

Algorithm schema:

  1. Construct a join tree “compatible” with the α\alpha-elimination order
  2. Load data in the join tree, anontate tuples with number of extension.
  3. Use top-down induction on the annotated join tree to answer DA query Q(𝔻)[k]Q(\mathbb{D})[k].

A circuit approach to Direct Access

Yannakakis and circuits

Factorised DB POV: the trace of Yannakakis Algorithm is roughly a decision-DNNF of size linear.

Ordered decision circuit

x1x_1 x2x_2 x3x_3
0 0 0
0 0 1
0 1 0
0 1 1
1 2 0
1 2 1
1 0 2
1 1 2
2 2 2

Exhaustive DPLL, Join Queries and Circuit: Yannakakis Upside-down

Let QQ be a JQ and xn,,x1x_n, \dots, x_1 an α\alpha-elimination order of QQ.

  • Q(𝔻)=dDQ[x1=d](𝔻)Q(\mathbb{D}) = \biguplus_{d \in D} Q[x_1 = d](\mathbb{D})
  • If Q=Q1Q2Q = Q_1 \wedge Q_2 and var(Q1)var(Q2)=var(Q_1) \cap var(Q_2) = \emptyset then Q(𝔻)=Q1(𝔻)×Q2(𝔻)Q(\mathbb{D}) = Q_1(\mathbb{D}) \times Q_2(\mathbb{D})
  • Implement this rules recursively + cache already computed queries QQ'

It produces an ordered circuit computing Q(𝔻)Q(\mathbb{D}) of size poly(Q)O(|𝔻|)poly(Q) O(|\mathbb{D}|)

Solving Direct Access on Circuits: Precomputation

Precomputation: Compute for each decision gate vv on xx and value dd, how many solutions of vv we can build by setting xx to ddd' \leq d.

Solving Direct Access on Circuits: Access

Find the 7th7^{th} solution.

Access: construct the kthk^{th} solution one variable at a time.

Solving Direct Access on Circuits: Access

Find the 13th13^{th} solution.

Access: construct the kthk^{th} solution one variable at a time.

Direct Access for JQs from Circuits

Given a join query Q(x1,,xn)Q(x_1,\dots,x_n), if xn,,x1x_n,\dots,x_1 is a α\alpha-elimination order then we can answer direct access queries with precomputation time O(|𝔻|poly(|Q|))O(|\mathbb{D}|poly(|Q|)) and access time O(poly(|Q|)polylog(|𝔻|))O(poly(|Q|)polylog(|\mathbb{D}|)).

Algorithm schema:

  1. Construct a “compatible” join tree “compatible”
  2. Load data in the join tree, anontate tuples.
  3. Top-down induction to answer DA query Q(𝔻)[k]Q(\mathbb{D})[k].
  1. Construct an ordered circuit CC for Q(𝔻)Q(\mathbb{D}).
  2. Anotate the gates with number of solutions.
  3. Use top-down induction on the circuit to answer DA query Q(𝔻)[k]Q(\mathbb{D})[k] in time npolylog(|𝔻|)n \cdot polylog(|\mathbb{D}|)

Differences between both approaches

Why bother? It looks like the same algorithm on a slightly different data structure.

  • It is not: join tree has an extra underlying “tree” structure, the circuit obtained from this has more structure.

  • There exist relations with small ordered circuit but big tree-structured circuits [LICS17].

  • Can we solve more direct access problem with our technique?

Negative Join queries

Negative Join Queries (NJQ): Q(x1,,xn)=i=1k¬Ri(zi)Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k \neg R_i(\vec{z_i})

  • Big difference:
    • positively encoding ¬R(z)\neg R(\vec{z}) on domain DD need (D|z|#R)(D^{|\vec{z}|}-\#R) tuples.
    • if QQ is tractable for every 𝔻\mathbb{D} then every QQQ' \subseteq Q is tractable too because we can set Ri=R_i = \emptyset in the database, which does not happen in the positive case.

β\beta-acyclic queries

Good candidates for tractability of Negative Join Queries: every QQQ' \subseteq Q is acyclic.

This is a property known as β\beta-acyclicity.

Characterization: HH is β\beta-acyclic iff there exists a β\beta-elimination order xn,,x1x_n, \dots, x_1, ie iteratively removing β\beta-leaves, ie, xx such that edges containing xx are ordered by \subseteq:

Alternatively: a β\beta-elimination order is an order that is an α\alpha-elimination order for every subquery.

Compiling β\beta-acyclic NJQ

Let QQ be an NJQ and xn,,x1x_n, \dots, x_1 a β\beta-elimination order for QQ. Exhaustive DPLL on QQ, 𝔻\mathbb{D} and with order x1xnx_1 \dots x_n returns an ordered circuit of size O(poly(|Q|)poly(|𝔻|))O(poly(|Q|)poly(|\mathbb{D}|)).

Generalization of [LICS17].

Corollary: Direct Access for β\beta-acyclic NJQ with O(poly(|𝔻|))O(poly(|\mathbb{D}|)) preprocessing and access time O(polylog(|𝔻|)poly(|Q|))O(polylog(|\mathbb{D}|)poly(|Q|)) for lexicographical orders based on (reversed) β\beta-elimination orders.

Side observation: we know that “join tree” based techniques fail for β\beta-acyclic NJQ!

Going further

Technique generalizes to:

  • signed join queries (mixing positive and negative atoms)
  • conjunctive (with \exists-quantifiers) signed queries:
    • project \exists directly in the circuits
    • as long as one projects a suffix xkxnx_k \dots x_n
  • Non-acyclic signed conjunctive queries:
    • arbitrary elimination order can be associated with a notion of “width”
    • match (fractional) hypertree width for the positive case
    • corresponds to the width of the “worst” subhypergraph of a given order in the negative case

Conclusion

  • Circuit based tractability of queries is powerful and modular
  • Deports the heavy lifting to the compilation part, treating the circuit is usually easier than working on annotated join trees.
  • Future work:
    • generalize FAQ/AJAR style queries to JNQ by working directly on the circuit
    • Better understanding on the new width measures