October 18, 2023

Join Queries: $Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k R_i(\vec{z_i})$

where $\vec{z_i}$ tuple over $X = \{x_1,\dots,x_n\}$

**Example**:
$Q(city, country, name, id) = People(id, name, city) \wedge Capital(city, country)$

People | Id | Name | City |
---|---|---|---|

1 | Alice | Paris | |

2 | Bob | Lens | |

3 | Chiara | Roma | |

4 | Djibril | Berlin | |

5 | Émile | Paris | |

6 | Francesca | Roma |

Capital | City | Country |
---|---|---|

Berlin | Germany | |

Paris | France | |

Roma | Italy |

Q | City | Country | Id | Name |
---|---|---|---|---|

Paris | France | 1 | Alice | |

Roma | Italy | 3 | Chiara | |

Berlin | Germany | 4 | Djibril | |

Roma | Italy | 6 | Francesca |

Q | City | Country | Id | Name |
---|---|---|---|---|

Berlin | Germany | 4 | Djibril | |

Paris | France | 1 | Alice | |

Roma | Italy | 3 | Chiara | |

Roma | Italy | 6 | Francesca |

Reorder $Q(\mathbb{D})$ in lexicographical order.

Given
$Q(\vec{x})$
and
$\mathbb{D}$,
simulate **array access** to
$Q(\mathbb{D})$
where
$Q(\mathbb{D})[k]$
is the
$k^{th}$
element of
$Q(\mathbb{D})$
by lexicographical order.

Q | City | Country | Id | Name |
---|---|---|---|---|

Berlin | Germany | 4 | Djibril | |

Paris | France | 1 | Alice | |

Roma | Italy | 3 | Chiara | |

Roma | Italy | 6 | Francesca |

$Q(\mathbb{D})[4]$ is $(Roma, Italy, 6, Francesca)$.

**Preprocessing** expensive:
possibly
$|\mathbb{D}|^{g(|Q|)}$

- Compute $Q(\mathbb{D})$
- Sort the table

**Answering DA queries** Cheap:
$polylog(|\mathbb{D}|)$

- Output $Q(\mathbb{D})[k]$ on input $k$.

Can we have a better preprocessing?

There are classes of queries for which one can decide
$Q(\mathbb{D}) \neq \emptyset$
in **polynomial time** both in
$|\mathbb{D}|$
and
$|Q|$.

- Result generalizes to computing $\#Q(\mathbb{D})$…
- And to
**direct access** - HoF: Bagan, Durand, Olive, Grandjean, Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, Bringmann, Mengel, Schweikardt, Zeevi, Berkholz.

Central class of join queries because of their
**tractability**.

$R_1(x,y,z) \wedge R_2(x,z,u) \wedge R_3(x,y,t) \wedge R_4(y,t) \wedge R_5(y,v)$

Total of $4$ solutions.

One can use these annotations top-down to solve direct access queries
**on compatible orders**

- for example for $(x,y,z,t,u,v)$.
- $Q(\mathbb{D})[3]$ must set $x=2,y=1,z=0$, then proceed down in the tree.

Given a join query
$Q(x_1,\dots,x_n)$,
if
$x_1,\dots,x_n$
is a **compatible order** then we can answer direct access
queries with precomputation time
$O(|\mathbb{D}|poly(|Q|))$
and access time
$O(poly(|Q|)polylog(|\mathbb{D}|))$.

**Compatible order?**

An $\alpha$-leaf in a hypergraph $H=(V,E)$ is $x \in V$ such that there is $e \in E$, $N(x) \subseteq e$.

A hypergraph $H$ is $\alpha$-acyclic iff one can obtain $\emptyset$ by successively removing $\alpha$-leaves in $H$: induces an order on $V$ called an $\alpha$-elimination order

[Brault Baron, 2014], simplification of [Graham-Yu-Özsoyoglu, 1979], also known as “without disruptive trio” in [Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, 2020]

Given a join query $Q(x_1,\dots,x_n)$, if $x_n,\dots,x_1$ is a $\alpha$-elimination order then we can answer direct access queries with precomputation time $O(|\mathbb{D}|poly(|Q|))$ and access time $O(poly(|Q|)polylog(|\mathbb{D}|))$.

[Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, 2020]

Algorithm schema:

- Construct a join tree “compatible” with the $\alpha$-elimination order
- Load data in the join tree, anontate tuples with number of extension.
- Use top-down induction on the annotated join tree to answer DA query $Q(\mathbb{D})[k]$.

**Factorised DB POV**: the trace of Yannakakis Algorithm
**is roughly a decision-DNNF** of size
**linear**.

$x_1$ | $x_2$ | $x_3$ |
---|---|---|

0 | 0 | 0 |

0 | 0 | 1 |

0 | 1 | 0 |

0 | 1 | 1 |

1 | 2 | 0 |

1 | 2 | 1 |

1 | 0 | 2 |

1 | 1 | 2 |

2 | 2 | 2 |

Let $Q$ be a JQ and $x_n, \dots, x_1$ an $\alpha$-elimination order of $Q$.

- $Q(\mathbb{D}) = \biguplus_{d \in D} Q[x_1 = d](\mathbb{D})$
- If $Q = Q_1 \wedge Q_2$ and $var(Q_1) \cap var(Q_2) = \emptyset$ then $Q(\mathbb{D}) = Q_1(\mathbb{D}) \times Q_2(\mathbb{D})$
- Implement this rules recursively +
**cache**already computed queries $Q'$

It produces an ordered circuit computing $Q(\mathbb{D})$ of size $poly(Q) O(|\mathbb{D}|)$

**Precomputation**: Compute for each decision gate
$v$
on
$x$
and value
$d$,
how many solutions of
$v$
we can build by setting
$x$
to
$d' \leq d$.

Find the $7^{th}$ solution.

**Access**: construct the
$k^{th}$
solution one variable at a time.

Find the $13^{th}$ solution.

**Access**: construct the
$k^{th}$
solution one variable at a time.

Given a join query $Q(x_1,\dots,x_n)$, if $x_n,\dots,x_1$ is a $\alpha$-elimination order then we can answer direct access queries with precomputation time $O(|\mathbb{D}|poly(|Q|))$ and access time $O(poly(|Q|)polylog(|\mathbb{D}|))$.

Algorithm schema:

- Construct a “compatible” join tree “compatible”
- Load data in the join tree, anontate tuples.
- Top-down induction to answer DA query $Q(\mathbb{D})[k]$.

- Construct an ordered circuit $C$ for $Q(\mathbb{D})$.
- Anotate the gates with number of solutions.
- Use top-down induction on the circuit to answer DA query $Q(\mathbb{D})[k]$ in time $n \cdot polylog(|\mathbb{D}|)$

Why bother? It looks like the same algorithm on a slightly different data structure.

**It is not**: join tree has an extra underlying “tree” structure, the circuit obtained from this has more structure.There exist relations with

**small**ordered circuit but**big**tree-structured circuits [LICS17].Can we solve more direct access problem with our technique?

Negative Join Queries (NJQ): $Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k \neg R_i(\vec{z_i})$

**Big difference:**- positively encoding $\neg R(\vec{z})$ on domain $D$ need $(D^{|\vec{z}|}-\#R)$ tuples.
- if $Q$ is tractable for every $\mathbb{D}$ then every $Q' \subseteq Q$ is tractable too because we can set $R_i = \emptyset$ in the database, which does not happen in the positive case.

Good candidates for tractability of Negative Join Queries: every $Q' \subseteq Q$ is acyclic.

This is a property known as $\beta$-acyclicity.

**Characterization**:
$H$
is
$\beta$-acyclic
iff there exists a
$\beta$-elimination
order
$x_n, \dots, x_1$,
ie iteratively removing
$\beta$-leaves,
ie,
$x$
such that edges containing
$x$
are ordered by
$\subseteq$:

**Alternatively**: a
$\beta$-elimination
order is an order that is an
$\alpha$-elimination
order for **every subquery**.

Let $Q$ be an NJQ and $x_n, \dots, x_1$ a $\beta$-elimination order for $Q$. Exhaustive DPLL on $Q$, $\mathbb{D}$ and with order $x_1 \dots x_n$ returns an ordered circuit of size $O(poly(|Q|)poly(|\mathbb{D}|))$.

Generalization of [LICS17].

**Corollary**: Direct Access for
$\beta$-acyclic
NJQ with
$O(poly(|\mathbb{D}|))$
preprocessing and access time
$O(polylog(|\mathbb{D}|)poly(|Q|))$
for lexicographical orders based on (reversed)
$\beta$-elimination
orders.

**Side observation**: we know that “join tree” based
techniques fail for
$\beta$-acyclic
NJQ!

Technique generalizes to:

**signed**join queries (mixing positive and negative atoms)**conjunctive**(with $\exists$-quantifiers)*signed*queries:- project $\exists$ directly in the circuits
- as long as one projects a
**suffix**$x_k \dots x_n$

**Non-acyclic***signed*conjunctive queries:- arbitrary elimination order can be associated with a notion of
**“width”** - match (fractional) hypertree width for the positive case
- corresponds to the width of the “worst” subhypergraph of a given order in the negative case

- arbitrary elimination order can be associated with a notion of

- Circuit based tractability of queries is
**powerful**and modular - Deports the heavy lifting to the
**compilation part**, treating the circuit is usually easier than working on annotated join trees. - Future work:
- generalize FAQ/AJAR style queries to JNQ by working directly on the circuit
- Better understanding on the new width measures