Direct Access for Conjunctive Queries with Negations

Florent Capelli, Oliver Irwin

CRIL, UniversitΓ© d’Artois

June 13, 2024

Direct Access on Join Queries

Join Queries

Join Query : Q(x1,…,xn)=β‹€i=1kRi(𝐱𝐒)Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k R_i(\mathbf{x_i})

where 𝐱𝐒\mathbf{x_i} is a tuple over X={x1,…,xn}X = \{x_1,\dots,x_n\}

Example: Q(city,country,name,id)=People(id,name,city)∧Capitals(city,country)Q(city, country, name, id) = People(id, name, city) \wedge Capitals(city, country)

People
id name city
1 Alice Paris
2 Bob Lens
3 Chiara Rome
4 Djibril Berlin
5 Γ‰mile Dortmund
6 Francesca Rome
Capitals
city country
Berlin Germany
Paris France
Rome Italy
Q(𝔻)Q(\mathbb{D})
city country name id
Paris France Alice 1
Rome Italy Chiara 3
Berlin Germany Djibril 4
Rome Italy Francesca 6

Direct Access

Quickly access Q(𝔻)[k]Q(\mathbb{D})[k], the kthk^{th} element of Q(𝔻)Q(\mathbb{D}).

Q(𝔻)Q(\mathbb{D})
city country name id
Paris France Alice 1
Rome Italy Chiara 3
Berlin Germany Djibril 4
Rome Italy Francesca 6

Q(𝔻)[2]Q(\mathbb{D})[2]? (Rome,Italy,Chiara,3)(Rome, Italy, Chiara, 3).

Naive Direct Access

Naive algorithm: materialize Q(𝔻)Q(\mathbb{D}) in an array, sort it. Access.

city country name id
…\dots …\dots …\dots …\dots
Berlin Germany Djibril 4
…\dots …\dots …\dots …\dots
Paris France Alice 1
…\dots …\dots …\dots …\dots
Rome Italy Chiara 3
Rome Italy Francesca 6
…\dots …\dots …\dots …\dots

Q(𝔻)[1432]=??Q(\mathbb{D})[1432] = ??

Precomputation : O(#Q(𝔻))O(\#Q(\mathbb{D})) at least (may be worse), very costly

Access : O(1)O(1), nearly free

Orders ?

  1. Order by weights
  2. Lexicographical orders
    • order on the vars of QQ
    • order on domain DD of 𝔻\mathbb{D}

Variable order (city,country,name,id)(city, country, name, id):

city country name id
Berlin Germany Djibril 4
Paris France Alice 1
Rome Italy Chiara 3
Rome Italy Francesca 6

In this talk: only lexicographical orders.

Applications

Direct Access generalizes many tasks that have been previously studied:

  • Uniform sampling without repetitions
  • Ranked enumeration
  • Counting queries:
    • how many answers between Ο„1\tau_1 and Ο„2\tau_2?
    • how many answers extend a partial answer etc.

Beating the Naive Approach

Beating Naive Direct Access

Naive Direct Access:

  • Preprocessing at least O(#Q(𝔻))O(\#Q(\mathbb{D})).
  • Access time O(1)O(1).

Can we have better preprocessing and reasonable access time?

β€œIdeal” complexity:

  • O(|𝔻|)O(|\mathbb{D}|) preprocessing
  • O(log|𝔻|)O(\log |\mathbb{D}|) access time

Complexity of Direct Access

Computing #Q(𝔻)\#Q(\mathbb{D}) given QQ and 𝔻\mathbb{D} is #P\#P-hard.

No Direct Access algorithm with good guarantees for every QQ and 𝔻\mathbb{D}.

Data complexity assumption: for a fixed QQ, what is the best preprocessing f(|𝔻|)f(|\mathbb{D}|) for an access time O(polylog|𝔻|)O(polylog |\mathbb{D}|)?

In this work, all presented complexity in data complexity will also be polynomial for combined complexity.

An easy query?

Q(a,b,c)=A(a,b)∧B(b,c).Q(a,b,c) = A(a,b) \wedge B(b,c).

G a a b b a--b c c b--c

Direct Access for lexicographical order induced by (a,b,c)(a,b,c)?

  • Precomputation O(|𝔻|)O(|\mathbb{D}|)
  • Access time O(log|𝔻|)O(\log |\mathbb{D}|)
a b
0 0
1 1
2 1
b c
0 0
0 1
0 2
1 1
1 2

Precomputation :

  • #Q(0,0,_)=3\#Q(0,0,\_) = 3
  • #Q(1,1,_)=2\#Q(1,1,\_) = 2
  • #Q(2,1,_)=2\#Q(2,1,\_) = 2

Access Q[5]Q[5]:

  • a=0,b=0a=0,b=0: not enough solutions
  • a=1,b=1a=1,b=1: enough! 33 solutions smaller than (1,1,_)(1,1,\_)
  • Look for the second solution of B(1,_)B(1,\_): a=1,b=1,c=2a=1,b=1,c=2

A not so easy query

Q(a,c,b)=A(a,b)∧B(b,c).Q(a,c,b) = A(a,b) \wedge B(b,c).

G a a b b a--b c c b--c

Direct Access for lexicographical order induced by (a,c,b)(a,c,b)?

  • Precomputation O(|𝔻|2)O(|\mathbb{D}|^2)
  • Access time O(log|𝔻|)O(\log |\mathbb{D}|)

Reduces to multiplying two {0,1}\{0,1\}-matrices M,NM,N over β„•\mathbb{N}:

  • (i,j)∈A(i,j) \in A iff M[i,j]=1M[i,j]=1, (j,k)∈N(j,k) \in N iff N[j,k]=1N[j,k]=1
  • #Q(i,j,_)=(MN)[i,j]\#Q(i,j,\_) = (MN)[i,j]
  • Direct Access can be used to find #Q(i,j,_)\#Q(i,j,\_) with O(log|𝔻|)O(\log |\mathbb{D}|) queries.

Characterizing preprocessing time

Given a query QQ and order Ο€\pi on its variables, we can compute ΞΉ(Q,Ο€)\iota(Q,\pi) such that:

  • Tractable Direct access for QQ on 𝔻\mathbb{D}:
    • preprocessing OΜƒ(|𝔻|ΞΉ(Q,Ο€))\tilde{O}(|\mathbb{D}|^{\iota(Q,\pi)})
    • access O(log|𝔻|){O}(\log |\mathbb{D}|)
  • Tight fine-grained lower bounds:
    • if possible with OΜƒ(|𝔻|k)\tilde{O}(|\mathbb{D}|^k) preprocessing with k<ΞΉ(Q,Ο€)k < \iota(Q,\pi)
    • then Zero-Clique Conjecture is false
      (we can find 00-weighted kk-cliques in graphs in time <|G|kβˆ’Ξ΅< |G|^{k-\varepsilon})
  • Function ΞΉ\iota closely related to fractional hypertree width.
  1. Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries, N. Carmeli, N. Tziavelis, W. Gatterbauer, B. Kimelfeld, M. Riedewald
  2. Tight Fine-Grained Bounds for Direct Access on Join Queries, K. Bringmann, N. Carmeli, S. Mengel

End of the story?

So, if we understand everything for Direct Access and lexicographical orders, what is our contribution?

Signed Join Queries

Definition

Q=β‹€i=1kPi(𝐳𝐒)Q = \bigwedge_{i=1}^k P_i(\mathbf{z_i}) β‹€i=1lΒ¬Ni(𝐳𝐒)\bigwedge_{i=1}^l \lnot N_i(\mathbf{z_i})

Negation interpreted over a given domain DD:

NN
x1x_1 x2x_2 x3x_3
0 1 0
Β¬N\lnot N on {0,1}\{0,1\}
x1x_1 x2x_2 x3x_3
0 0 0
0 0 1
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
  • Β¬N(x1,…,xk)\neg N(x_1,\dots,x_k) encoded with |D|kβˆ’#N|D|^{k}-\#N tuples.
  • Relation with SAT: Β¬N\neg N is x1∨¬x2∨x3x_1 \vee \neg x_2 \vee x_3

Positive Encoding not Optimal

Q(x1,…,xn)=Β¬N(x1,…,xn)Q(x_1,\dots,x_n) = \neg N(x_1,\dots,x_n), domain {0,1}\{0,1\}.

Positive encoding: preprocessing O(2n)O(2^n)

N
x1x_1 x2x_2 x3x_3
0 1 0
1 0 1
  • Q(𝔻)[1]Q(\mathbb{D})[1]? x1=0,x2=0,x3=0x_1 = 0, x_2 = 0, x_3=0 ie [0]2[0]_2!
  • Q(𝔻)[2]Q(\mathbb{D})[2]? x1=0,x2=0,x3=1x_1 = 0, x_2 = 0, x_3=1 ie [1]2[1]_2!
  • Q(𝔻)[3]Q(\mathbb{D})[3]? x1=0,x2=1,x3=0x_1 = 0, x_2 = 1, x_3=0 ie [2]2[2]_2? x1=0,x2=1,x3=1x_1 = 0, x_2 = 1, x_3=1 ie [3]2[3]_2!
  • Q(𝔻)[k]Q(\mathbb{D})[k]? [kβˆ’1+pk]2[k-1+p_k]_2
    where pk={t∈N∣t≀k}p_k=\{t \in N \mid t \leq k\}

Linear preprocessing!

Hardness of subqueries

Q1=R(1,2,3)∧S(1,2)∧T(2,3)∧U(3,1)Q_1 = R(1, 2, 3) \land S(1, 2) \land T(2, 3) \land U(3, 1)

linear preprocessing

Q2=S(1,2)∧T(2,3)∧U(3,1)Q_2 = S(1, 2) \land T(2, 3) \land U(3, 1)

non-linear preprocessing

Subqueries may be harder to solve than the query itself!

Subqueries and negative atoms

Q1β€²=Q_1' = Β¬R(1,2,3)\lnot R(1, 2, 3) ∧S(1,2)∧T(2,3)∧U(3,1)\land S(1, 2) \land T(2, 3) \land U(3, 1)

Equivalent to Q2Q_2 if R=βˆ…R=\emptyset

Q2=S(1,2)∧T(2,3)∧U(3,1)Q_2 = S(1, 2) \land T(2, 3) \land U(3, 1)

non-linear preprocessing (triangle)

DA for Q=P∧NQ = P \wedge N implies DA for Q=P∧Nβ€²Q = P \wedge N' for every Nβ€²βŠ†NN' \subseteq N!

Measuring hardness of SJQ

Good candidate for Q=Q+∧Qβˆ’Q = Q^+ \wedge Q^-:

Signed-HyperOrder Width show(Q,Ο€)=maxQβ€²βŠ†Qβˆ’ΞΉ(Q+∧Qβ€²,Ο€)show(Q,\pi) = \max_{Q' \subseteq Q^-} \iota(Q^+ \wedge Q', \pi)

For QQ a (positive) JQ signed JQ, and Ο€\pi a variable ordering, we can solve DA with

  • Preprocessing OΜƒ(|𝔻|ΞΉ(Q,Ο€))\tilde{O}(|\mathbb{D}|^{\iota(Q,\pi)}) OΜƒ(|𝔻|1+show(Q,Ο€))\tilde{O}(|\mathbb{D}|^{1+show(Q,\pi)})
  • Access time O(log|𝔻|)O(\log |\mathbb{D}|)

Our contribution : new island of tractability for Signed JQ!

A word on show

Signed HyperOrder Width (and incidentally, our result) generalizes:

  • Ξ²\beta-acyclicity (#SAT and #NCQ are already known tractable)
  • signed-acyclicity (Model Checking for SCQ known to be tractable)
  • Nest set width (SAT / Model Checking for NCQ known to be tractable)

Basically, everything that is known to be tractable on SCQ/NCQ.

  1. Understanding model counting for Ξ²-acyclic CNF-formulas, J. Brault-Baron, F. C., S. Mengel
  2. De la pertinence de l’énumΓ©ration: complexitΓ© en logiques propositionnelle et du premier ordre, J. Brault-Baron
  3. Tractability Beyond ß-Acyclicity for Conjunctive Queries with Negation, M. Lanzinger

Our algorithm: a circuit approach

Relational Circuits

x1x_1 x2x_2 x3x_3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 1
1 0 2
1 1 1
1 1 2
1 2 0
1 2 1
2 0 1
2 0 2
2 2 1
2 2 2

Ordered Relational Circuits

Factorized representation of relation RβŠ†DXR \subseteq D^X:

  • Inputs gates : ⊀\top & βŠ₯\bot
  • Decision gates
  • Cartesian products: Γ—\times-gates

Ordered: decision gates below xix_i only mention xjx_j with j>ij>i.

Direct Access on Relational Circuits

For CC on domain DD, variables x1,…,xnx_1,\dots,x_n, DA possible :

  • Preprocessing: O(|C|log|D|)O(|C| \log |D|)
  • Access time: O(nlog|D|)O(n \log |D|)

Preprocessing

Idea : for each gate vv over xix_i and for each domain value dd

compute the size of the relation where xix_i is set to a value d′≀dd'\leqslant d

Preprocessing

Direct Access 7th solution

Compute the 7th solution β†’\to 111

Direct Access the 13th solution

Compute the 13th solution β†’\to 221

Solving DA for SCQ

SCQ Q(x1,…,xn)Q(x_1,\dots,x_n), Ο€=(x1,…,xn)\pi = (x_1,\dots,x_n).

Preprocessing: OΜƒ(|𝔻|1+show(Q,Ο€))\tilde{O}(|\mathbb{D}|^{1+show(Q,\pi)})

  1. Construct Ο€\pi-ordered circuit CC of size OΜƒ(|𝔻|1+show(Q,Ο€)poly(Q))\tilde{O}(|\mathbb{D}|^{1+show(Q,\pi)} poly(Q))
  2. Preprocess CC in time O(|C|log|𝔻|)O(|C| \log |\mathbb{D}|).

Direct Access : O(log|𝔻|)O(\log |\mathbb{D}|)

  1. Directly on CC
  2. in time O(nlog|D|)O(n\log |D|) !

QQ, nn considered constant here!

DPLL: building circuits

Compilation based on a variation of DPLL :

  1. Q(𝔻)=⨄d∈D[x1=d]Γ—Q[x1=d](𝔻)Q(\mathbb{D}) = \biguplus_{d \in D} [x_1 = d] \times Q[x_1=d](\mathbb{D})
  2. Q(𝔻)=Q1(𝔻)Γ—Q2(𝔻)Q(\mathbb{D}) = Q_1(\mathbb{D}) \times Q_2(\mathbb{D}) if Q=Q1∧Q2Q = Q_1 \wedge Q_2 with var(Q1)∩var(Q2)=βˆ…var(Q_1) \cap var(Q_2) = \emptyset
  3. Top down induction + caching
https://florent.capelli.me/cytoscape/dpll.html

Going further

Other usage of circuits

  1. Extension to βˆƒ\existsSJQ:
    • Last variable in CC can be existentially projected without increase in circuit size
    • Give DA for βˆƒxk,…,xnQ(x1,…,xn)\exists x_{k}, \dots, x_n Q(x_1,\dots,x_n).
  2. Semi-ring Aggregation
    • w:XΓ—Dβ†’(𝕂,βŠ•,βŠ—)w : X \times D \rightarrow (\mathbb{K}, \oplus, \otimes)
    • Compute β¨Ο„βˆˆQ(𝔻)⨂x∈Xw(x,Ο„(x))\bigoplus_{\tau \in Q(\mathbb{D})} \bigotimes_{x \in X} w(x, \tau(x))

Work in progress

  1. Improve preprocessing
    OΜƒ(|𝔻|show(Q,Ο€)+1)(||^{show(Q,)+1})
    doable with a few tweaks in DPLL, joint work with S. Salvati.
  2. Lower bounds: preprocessing in |𝔻|show(Q,Ο€)|\mathbb{D}|^{show(Q,\pi)}
    unavoidable under Zero-clique conjecture (join work with N. Carmeli).
  3. Aggregation
    Q(x1,…,xk,F(xk+1,…,xn))Q(x_1,\dots, x_k, F(x_{k+1}, \dots, x_n)), generalizing work by I. Eldar, N. Carmeli, B. Kimelfeld.