Interpretation of $L$ on a database $\mathbb{D}$ gives a program having up to $|Q(\mathbb{D})|$ variables.
Linear Program Solvers are very sensible to the number $n$ of variables.
The size of $Q(\mathbb{D})$ may be orders of magnitude bigger than the size of $\mathbb{D}$.
In the previous example:
$|Q(\mathbb{D})|$ may be of size $|Employee|\times|Project|$.
While $|\mathbb{D}| = |Employee| + |Project|$
Can we do better?
(Combined) Complexity of solving LP(CQ)s
Let $L$ be the following LP(CQ):
maximize
$weight(Q)$
subject to
$0 \leq weight(Q) \leq 1$
Given a query $Q$ and database $\mathbb{D}$, can we decide
whether the optimal value of $L$ is different from 0?
This problem is NP-hard by reduction to the satisfiability of $Q$
Tractable conjunctive queries
We leverage known tractability results on conjunctive queries (Yannakakis' algorithm, fractional hypertree width) to our setting:
Main Theorem: Given a LP(CQ) $L$ on a set of queries of fractional hypertree width at most $k$*
we can build a linear program that only has $O(|\mathbb{D}|^k)$ variables and is
equivalent to the naive interpretation.
*$k$ also depends on the linear program
A tempting approach
Idea: Treat the $weight$ expressions as
independent variables.
minimize
$X_1$
subject to
$X_2 \leq 40$
$X_3 \leq 40$
$X_4 \geq 45$
$X_5 \geq 10$
$X_6 \geq 20$
$X_1 = X_2+X_3$
$X_1 = X_4+X_5+X_6$
Problem:
This linear program is unbounded.
The value of $X_1$ is not idependent from the value of $X_2$ in the original program
A tractable fragment of LP(CQ)
The queries in the $\forall \mathbf{x} : Q. C $ and
$\sum_{\mathbf{x}: Q} S$ quantifiers must be single atoms
Each weight must select answers using variables that are covered by a single bag
of the tree decomposition of the query
Factorized interpretation
minimize
$X_1$
subject to
$X_2 \leq 40$
$X_3 \leq 40$
$X_4 \geq 45$
$X_5 \geq 10$
$X_6 \geq 20$
(e, es)
$X_2 = X_7 + X_8$
$X_3 = X_9 + X_{10}$
(∅, es)
$X_1 = X_7 + X_8 + X_9 + X_{10}$
...
Preliminary practical results
Solving time of GLPK for naive
and factorized interpretation
of the resource delivery problem with respect to table size.
LP on relations and factorized databases
Constant numbers
$c \mid$ $num(E)$
$LP$ expressions
$N S \mid S+S' \mid N \mid$
$weight_{\mathbf{x}: Q'}(Q)$ $\mid$ $\sum_{\mathbf{x}: Q} S$
$LP$ constraints
$S \le S' \mid C \wedge C' \mid true \mid$
$\forall \mathbf{x} : Q. C $
$LP$ programs
maximize $S$ subject to $C$
$Q$ can be any query, or any relation signature with an interpretation $Q(\mathbb{D})$ on a database $\mathbb{D}$
Our tractability result generalizes if $Q(\mathbb{D})$ is given as a factorized database (and a bit more).
Conclusion
Contributions
A language to define linear programs with conjunctive queries