A Simple Algorithm for Worst Case Optimal Join and Sampling

Florent Capelli, Oliver Irwin, Sylvain Salvati

CRIL, Université d’Artois

Journées du GT DAAL 2025

13 May 2025

Joining relations

Joining like a pro

QR(x1,x2)S(x1,x3)T(x2,x3)Q \coloneq R(x_1, x_2) \wedge S(x_1, x_3) \wedge T(x_2, x_3)

RR x1x_1 x2x_2
0 0
0 1
2 1
SS x1x_1 x3x_3
0 0
0 2
2 3
TT x2x_2 x3x_3
0 2
1 0
1 2
  • Devise a query plan: (RS)(R \bowtie S) T\bowtie T
  • Materialize the intermediate joins.
RSR \bowtie S T\bowtie T x1x_1 x2x_2 x3x_3
0 0 0
0 0 2
0 1 0
0 1 2
2 1 3

Joining like a brute

QR(x1,x2)S(x1,x3)T(x2,x3)Q \coloneq R(x_1, x_2) \wedge S(x_1, x_3) \wedge T(x_2, x_3)

RR x1x_1 x2x_2
0 0
0 1
2 1
SS x1x_1 x3x_3
0 0
0 2
2 3
TT x2x_2 x3x_3
0 2
1 0
1 2

Disruptive poll

In theory, is it better to join like:

  1. A pro
  2. A brute

No confidence vote

  1. Go home, you are not qualified to talk about joins after saying dumb things like that.
  2. We are all theorist here, please tell us the whole story

What is wrong with joining like a pro

QR(x1,x2)S(x1,x3)T(x2,x3)Q \coloneq R(x_1, x_2) \wedge S(x_1, x_3) \wedge T(x_2, x_3)

Worst scenario for query plans

Consider 𝔻\mathbb{D} on domain D=D1D2D3D = D_1 \uplus D_2 \uplus D_3 with:

RR x1x_1 x2x_2
0 D2D_2
D1D_1 0
SS x1x_1 x3x_3
0 D3D_3
D1D_1 0
TT x2x_2 x3x_3
0 D3D_3
D2D_2 0

Every query plan will materialize a table of size O(N2)O(N^2) but the answer table will never be of size greater than (2N)1.5(2N)^{1.5}.

And the brute?

Domain D=D1D2D3D = D_1 \uplus D_2 \uplus D_3 with 0D0 \notin D and |D1|=|D2|=|D3|=N|D_1|=|D_2|=|D_3|=N.

RR x1x_1 x2x_2
0 D2D_2
D1D_1 0
SS x1x_1 x3x_3
0 D3D_3
D1D_1 0
TT x2x_2 x3x_3
0 D3D_3
D2D_2 0

If we do the “else” branches efficiently (e.g. by reading values from one table), the algorithm makes O(N)O(N) recursive calls.

Worst-case optimality

Worst-case optimal join

QR(x1,x2)S(x1,x3)T(x2,x3)Q \coloneq R(x_1, x_2) \wedge S(x_1, x_3) \wedge T(x_2, x_3)

Ideal complexity: output Q(𝔻)Q(\mathbb{D}) in time O(f(|Q|)|Q(𝔻)|)O(f(|Q|) \cdot |Q(\mathbb{D})|)

… unlikely to be possible.

Worst case optimal: output Q(𝔻)Q(\mathbb{D}) in time Õ(f(|Q|)N1.5)\tilde{O}(f(|Q|) \cdot N^{1.5}).

NN is the size of the largest input relation and Õ()\tilde{O}(\cdot) ignores polylog factors.

f(|Q|)f(|Q|): data complexity, ie, QQ is considered constant. Ideally, ff is a reasonable polynomial though.

Worst case value

Consider a join query QQ and all databases for QQ with a bound NN on the table size: 𝒟QN={𝔻RQ,|R𝔻|N} \mathcal{D}_{Q}^{\leqslant N}= \{\mathbb{D}\mid \forall R \in Q, |R^\mathbb{D}| \leqslant N\} and let: 𝗐𝖼(Q,N)=𝗌𝗎𝗉𝔻𝒟QN|Q(𝔻)| \mathsf{wc}(Q, N) = \mathsf{sup}_{\mathbb{D}\in\mathcal{D}_{Q}^{\leqslant N}}~|Q(\mathbb{D})|

𝗐𝖼(Q,N)\mathsf{wc}(Q,N) is the worst case: the size of the biggest answer set possible with query QQ and databases where each table are bounded by NN.

Worst case examples

We know how to compute ρ(Q)\rho(Q) such that 𝗐𝖼(Q,N)=Õ(Nρ(Q))\mathsf{wc}(Q,N) = \tilde{O}(N^{\rho(Q)}) but we do not need it!

This is known as the AGM-bound

Worst case optimal join (WCOJ) algorithms

A join algorithm is worst case optimal (wrt 𝒟QN\mathcal{D}_{Q}^{\leqslant N}) if for every QQ, NN \in \mathbb{N} and 𝔻𝒟QN\mathbb{D}\in \mathcal{D}_{Q}^{\leqslant N}, it computes Q(𝔻)Q(\mathbb{D}) in time Õ(f(|Q|)𝗐𝖼(Q,N))\tilde{O}(f(|Q|) \cdot \mathsf{wc}(Q,N))

The DBMS approach is not worst case optimal (triangle example from before).

Existing WCOJ Algorithm

Rich literature:

We prove the worst case optimality of the branch and bound algorithm in an elementary way.

Analysing the brute

Algorithm reminder

Complexity analysis

One recursive call:

  • branch variable xix_i on value d𝖽𝗈𝗆d \in \mathsf{dom}
  • filter/project relations with xix_i
  • Binary search in O(log|R|)O(\log |R|) if RR ordered
    (O(1)O(1) possible using tries).
RR x1x_1 x2x_2
0 0
0 2
1 0
1 1
2 0
2 1

Total complexity: number of recursive calls times Õ(m)\tilde{O}(m) where mm is the number of atoms.

Number of calls: example

  • Nodes: partial assignment τ\tau
  • Here: τ:={x1=0,x2=1}\tau := \{x_1=0, x_2=1\}
  • Not node: partial assignment compatible with every relation
  • τ\tau solution of Q2(𝔻2)Q_2(\mathbb{D}_2): project on x1,x2x_1,x_2.
  • At most: |Q2(𝔻2)||Q_2(\mathbb{D}_2)| such nodes at level 33

QR(x1,x2)S(x1,x3)T(x2,x3)Q \coloneq R(x_1, x_2) \wedge S(x_1, x_3) \wedge T(x_2, x_3)

RR x1x_1 x2x_2
0 0
0 1
2 1
SS x1x_1 x3x_3
0 0
0 2
2 3
TT x2x_2 x3x_3
0 2
1 0
1 2

Q2R2(x1,x2)S2(x1)T2(x2)Q_2 \coloneq R_2(x_1, x_2) \wedge S_2(x_1) \wedge T_2(x_2)

R2R_2 x1x_1 x2x_2
0 0
0 1
2 1
S2S_2 x1x_1
0
2
T2T_2 x2x_2
0
1

Number of calls in general

  • a call = a node = a partial assignment.
  • τ:=x1=d1,,xi=di\tau := x_1=d_1, \dots, x_i=d_i current call, not :
    • No inconsistency.
    • R𝔻[τ]R^\mathbb{D}[\tau] not empty for each RQR \in Q
    • τQi(𝔻)\tau \in Q_i(\mathbb{D}) for Qi=RQx1xiRQ_i = \bigwedge_{R\in Q}\prod_{x_1\dots x_i} R
    • i=1n|Qi(𝔻)|\leq \sum_{i=1}^n |Q_i(\mathbb{D})| such nodes!
  • τ:=x1=d1,,xi+1=di+1\tau := x_1=d_1, \dots, x_{i+1}=d_{i+1} current call is :
    • x1=d1,,xi=dix_1=d_1, \dots, x_{i}=d_i is not \bot.
    • |𝖽𝗈𝗆|i=1n|Qi(𝔻)|\leq |\mathsf{dom}| \cdot \sum_{i=1}^n |Q_i(\mathbb{D})| \bot-nodes!

At most (|𝖽𝗈𝗆|+1)i=1n|Qi(𝔻)|(|\mathsf{dom}|+1) \sum_{i=1}^n |Q_i(\mathbb{D})| calls.

Complexity: Õ(m|𝖽𝗈𝗆|i=1n|Qi(𝔻)|)\tilde{O}(m|\mathsf{dom}|\cdot \sum_{i=1}^{n}|Q_i(\mathbb{D})|).

Toward worst case optimality

|Qi(𝔻)|=|RQx1xiR𝔻|=|RQR𝔻|=|Q(𝔻)|\begin{align*} |Q_i(\mathbb{D})| &= |\bigwedge_{R\in Q}\prod_{x_1\dots x_i} R^\mathbb{D}|\\ &= |\bigwedge_{R\in Q} R^{\mathbb{D}'}| \\ &= |Q(\mathbb{D}')| \end{align*}

where R𝔻=x1xiR𝔻×{0}XR\x1,,xiR^{\mathbb{D}'} = \prod_{x_1\dots x_i} R^\mathbb{D}\times \{0\}^{X_R \setminus x_1, \dots, x_i}

Crucial observation:

  • |R𝔻|=|x1xiR𝔻||R𝔻|N|R^{\mathbb{D}'}| = |\prod_{x_1\dots x_i} R^\mathbb{D}| \leq |R^\mathbb{D}| \leq N
  • Hence 𝔻𝒟QN\mathbb{D}' \in \mathcal{D}_{Q}^{\leqslant N}.
  • |Qi(𝔻)|=|Q(𝔻)|𝗐𝖼(Q,N)|Q_i(\mathbb{D})| = |Q(\mathbb{D}')| \leq \mathsf{wc}(Q,N)

𝔻\mathbb{D}

RR x1x_1 x2x_2
0 0
0 1
2 1
SS x1x_1 x3x_3
0 0
0 2
2 3
TT x2x_2 x3x_3
0 2
1 0
1 2

𝔻2\mathbb{D}_2

R2R_2 x1x_1 x2x_2
0 0
0 1
2 1
S2S_2 x1x_1
0
2
T2T_2 x2x_2
0
1

𝔻\mathbb{D}'

RR' x1x_1 x2x_2
0 0
0 1
2 1
SS' x1x_1 x3x_3
0 0
2 0
TT' x2x_2 x3x_3
0 0
1 0

Branch and bound complexity

|Qi(𝔻)|wc(Q,N) |Q_i(\mathbb{D})| \leq wc(Q,N)

The complexity of the branch and bound algorithm is

Õ(m|𝖽𝗈𝗆|i=1n|Qi(𝔻)|)\tilde{O}(m|\mathsf{dom}|\cdot \sum_{i=1}^n|Q_i(\mathbb{D})|)

Õ(m|𝖽𝗈𝗆|n𝗐𝖼(Q,N))\tilde{O}(m|\mathsf{dom}| \cdot n\mathsf{wc}(Q, N))

Õ(mn|𝖽𝗈𝗆|𝗐𝖼(Q,N))\tilde{O}(mn \cdot |\mathsf{dom}| \cdot \mathsf{wc}(Q, N))

Õ(mn\tilde{O}(mn \cdot|𝖽𝗈𝗆|{|\mathsf{dom}|}𝗐𝖼(Q,N))\cdot \mathsf{wc}(Q, N))

We do not even need to know 𝗐𝖼(Q,N)\mathsf{wc}(Q,N) to prove it!

Make the domain binary!

RR xx yy
1 2
2 1
3 0

R̃b\tilde{R}^b x2x^2 x1x^1 x0x^0 y2y^2 y1y^1 y0y^0
0 0 1 0 1 0
0 1 0 0 0 1
0 1 1 0 0 0

WCOJ finally

We hence compute Q(𝔻)Q(\mathbb{D}) in time Õ(mn𝗐𝖼(Q,N))\tilde{O}(m n \cdot \mathsf{wc}(Q, N))!

More on Binarization

Sampling answers uniformly

Problem statement

Given QQ and 𝔻\mathbb{D}, sample τQ(𝔻)\tau \in Q(\mathbb{D}) with probability 1|Q(𝔻)|\frac{1}{|Q(\mathbb{D})|} or fail if Q(𝔻)=Q(\mathbb{D}) = \emptyset.

Naive algorithm:

Complexity using WCOJ: Õ(𝗐𝖼(Q,N)poly(|Q|))\tilde{O}(\mathsf{wc}(Q,N) poly(|Q|)).

We can do better: (expected) time Õ(𝗐𝖼(Q,N)|Q(𝔻)|+1poly(|Q|))\tilde{O}(\frac{\mathsf{wc}(Q,N)}{|Q(\mathbb{D})|+1} poly(|Q|))

PODS 23: [Deng, Lu, Tao] and [Kim, Ha, Fletcher, Han]

Let’s do a modular proof of this fact!

Revisiting the problem

Sampling answers reduces to sampling -leaves in a tree with (,)-labeled leaves.

Sampling leaves, the easy way

  • (t)\ell(t): number of -leaves below tt is known
  • Recursively sample uniformly a \top-leaf in tit_i with probability (ti)(t)\frac{\ell(t_i)}{\ell(t)}.
  • A leaf in (ti)\ell(t_i) will hence be sampled with probability 1(ti)×(ti)(t)=1(t)\frac{1}{\ell(t_i)} \times \frac{\ell(t_i)}{\ell(t)} = \frac{1}{\ell(t)} Uniform!

In our case, we do not know (t)\ell(t)

Sampling leaves with a nice oracle

  • upb(t)upb(t): upperbound on the number of -leaves below tt is known
  • Recursively sample uniformly a \top-leaf in tit_i with probability upb(ti)upb(t)\frac{upb(t_i)}{upb(t)}.
  • Fail with probability 1iupb(ti)upb(t)1 - \sum_i \frac{upb(t_i)}{upb(t)} or upon encountering .

Only makes sense if iupb(ti)upb(t)\sum_i upb(t_i) \leq upb(t).

Las Vegas uniform sampling algorithm:

  • each leaf is output with probability 1ubp(t)\frac{1}{ubp(t)},
  • fails with proba 1(t)upb(t)1 - \frac{\ell(t)}{upb(t)} where (t)\ell(t) is the number of -leaves under tt.

Repeat until output: O(upb(r)(r))O(\frac{upb(r)}{\ell(r)}) expected calls, where rr is the root.

Upper bound oracles for conjunctive queries

  • Node tt: partial assignment τt:=(x1=d1,,xi=di)\tau_t := (x_1=d_1, \dots, x_i=d_i)
  • Number of leaves below tt: |Q(𝔻)[τt]||Q(\mathbb{D})[\tau_t]|.
  • upb(t)???upb(t) ???: look for worst case bounds!

AGM bound: there exists positive rational numbers (λR)RQ(\lambda_R)_{R \in Q} such that |Q(𝔻)|RQ|R𝔻|λR𝗐𝖼(Q,N)|Q(\mathbb{D})| \leq \prod_{R \in Q}|R^\mathbb{D}|^{\lambda_R} \leq \mathsf{wc}(Q,N)

Define upb(t)=RQ|R𝔻[τt]|λR𝗐𝖼(Q,N)upb(t) = \prod_{R \in Q}|{\color{red}R^\mathbb{D}[\tau_t]}|^{\lambda_R} \leq \mathsf{wc}(Q,N):

  • it is an upper bound on |Q(𝔻)[τt]||Q(\mathbb{D})[\tau_t]|,
  • it is supperadditive: upb(t)d𝖽𝗈𝗆upb(td)upb(t) \geq \sum_{d \in \mathsf{dom}} upb(t_d)
  • value of upbupb at the root of the tree: 𝗐𝖼(Q,N)\mathsf{wc}(Q,N)!

Wrapping up sampling

Given a super-additive function upperbounding the number of -leaves in a tree at each node, we have:

Las Vegas uniform sampling algorithm:

Repeat until output: O(upb(r)(r))O(\frac{upb(r)}{\ell(r)})=𝗐𝖼(Q,N)1+|Q(𝔻)|=\frac{\mathsf{wc}(Q,N)}{1+|Q(\mathbb{D})|} expected calls.

Final complexity: binarize to navigate the tree in Õ(nm)\tilde{O}(nm): Õ(nm𝗐𝖼(Q,N)1+|Q(𝔻)|)\tilde{O}(nm \cdot \frac{\mathsf{wc}(Q,N)}{1+|Q(\mathbb{D})|})

Matches existing results, proof more modular.

Beyond Cardinality Constraints

Worst case and constraints

So far we have considered worst case wrt this class:

Each relation is subject to a cardinality constraint of size NN.

What if we know that our instance has some extra properties (e.g., a functional dependency)

In this case, we say that our algorithm is worst case optimal wrt 𝒞\mathcal{C}.

Finer constraints can help

Q=R(x1,x2)S(x2,x3)Q = R(x_1,x_2) \wedge S(x_2,x_3).

We have: 𝗐𝖼(Q,N)=N2\mathsf{wc}(Q,N) = N^2.

  • Let 𝒞\mathcal{C} be the class of databases where |R|N,|S|N|R| \leq N, |S| \leq N and RR respect functional dependency x2x1x_2 \rightarrow x_1.
  • 𝗐𝖼(Q,𝒞)N\mathsf{wc}(Q,\mathcal{C}) \leq N because each tuple of S𝔻S^\mathbb{D} can be extended to at most one solution.

Is our simple join worst case optimal for this class?

Short answer: yes if x2x_2 is set before x1x_1.

Prefix closed classes

Recall the complexity of our algorithm: Õ(m|𝖽𝗈𝗆|i=1n|Qi(𝔻)|))\tilde{O}(m |\mathsf{dom}| \sum_{i=1}^n |Q_i(\mathbb{D})|)) where Qi=RQx1,,xiRQ_i = \bigwedge_{R \in Q} \prod_{x_1,\dots, x_i} R

A class of database 𝒞\mathcal{C} for QQ is prefix closed for order π=(x1,,xn)\pi = (x_1,\dots,x_n) if for each ii and 𝔻𝒞\mathbb{D}\in \mathcal{C}:

|Qi(𝔻)|𝗐𝖼(𝒞) |Q_i(\mathbb{D})| \leq \mathsf{wc}(\mathcal{C})

𝒟QN\mathcal{D}_{Q}^{\leqslant N} is prefix closed (for any order)!

Our algorithm is (almost) worst case optimal as long as we use an order for which 𝒞\mathcal{C} is prefix closed!

Acyclic functional dependencies

F=(X1Y1,,XkYk)F = (X_1 \rightarrow Y_1, \dots, X_k \rightarrow Y_k) is a set of functional dependencies:

𝒞FN={𝔻𝔻 respects F}𝒟QN\mathcal{C}_F^N = \{\mathbb{D}\mid \mathbb{D}\text{ respects $F$} \} \cap \mathcal{D}_{Q}^{\leqslant N}

is prefix closed for order π\pi (exactly the same proof as for cardinality constraints).

Hence our algorithm is worst case optimal wrt 𝒞FN\mathcal{C}_F^N (as long as we follow π\pi).

We need to show that this functional dependencies transfer in the binarised setting but it is almost immediate.

Degree constraints

A degree constraint is a constraint (X,Y,NY|X)(X,Y,N_{Y|X}) where XYX \subseteq Y. A relation RR verifies the constraint if

max{|YR[τ]|,τ𝖽𝗈𝗆X}NY|X \max \{ |\prod_{Y} R[\tau]|, \tau \in \mathsf{dom}^X\} \leq N_{Y|X}

Acyclic degree constraints

Δ={(X1,Y1,N1),(Xk,Yk,Nk)}\Delta = \{(X_1, Y_1, N_{1}) \dots, (X_k, Y_k, N_k)\} set of degree constraints.

𝒞ΔN={𝔻𝔻 respects Δ}𝒟QN\mathcal{C}_\Delta^N = \{\mathbb{D}\mid \mathbb{D}\text{ respects $\Delta$} \} \cap \mathcal{D}_{Q}^{\leqslant N}

is prefix closed for order π\pi (exactly the same proof as for cardinality constraints).

Hence our algorithm is worst case optimal wrt 𝒞ΔN\mathcal{C}_\Delta^N (as long as we follow π\pi).

We need to show that this functional dependencies transfer in the binarised setting but it is almost immediate.

Bonus: sampling acyclic degree constraints

We can find (λR)(\lambda_R) such that RQ|R𝔻|λRÕ(𝗐𝖼(Q,𝒞ΔN))\prod_{R \in Q} |R^\mathbb{D}|^{\lambda_R} \leq \tilde{O}(\mathsf{wc}(Q, \mathcal{C}_\Delta^N)) for any 𝔻𝒞ΔN\mathbb{D}\in \mathcal{C}_\Delta^N (polymatroid bound).

Define upb(t):=RQ|R𝔻[τt]|λRupb(t) := \prod_{R \in Q} |R^\mathbb{D}[\tau_t]|^{\lambda_R}:

We have sampling with complexity Õ(nm𝗐𝖼(Q,𝒞ΔN)1+|Q(𝔻)|)\tilde{O}(nm \cdot \frac{\mathsf{wc}(Q,\mathcal{C}_\Delta^N)}{1+|Q(\mathbb{D})|})

Conclusion

Future work: