Linear Programs with Relations

Florent Capelli, Nicolas Crosetti, Joachim Niehren, Jan Ramon

Université de Lille, Inria Lille

Overview

  • Introduction of a modelling language allowing the definition of linear programs over the solutions of a conjunctive query.
  • We leverage known tractability results on conjunctive queries (fractional hypertree width, factorized databases) to this setting.

Most results presented here can be found in our ICDT'22 paper and will be part of Nicolas Crosetti's thesis (writing in progress).

Managing employees and projects

Project pid skill time
1 Python 45
2 C 10
3 SQL 20
Employees eid skill
Alice Python
Alice C
Bob SQL
Bob Python

$Q = project(p, s, t) \wedge employee(e, s)$

Managing employees time

Q p s e t Assigned Time
1 Python Alice 45 $\theta_{1}$
1 Python Bob 45 $\theta_{2}$
2 C Alice 10 $\theta_{3}$
3 SQL Bob 20 $\theta_{4}$
  • Minimize total time spent by employees
  • Make sure no employee is overworked
  • Make sure each project is fulfilled
minimize$\theta_{1} + \theta_{2} + \theta_{3} + \theta_{4}$
subject to$\theta_{1} + \theta_{3} \leq 40$
$\theta_{2} + \theta_{4} \leq 40$
$\theta_{1} + \theta_{2} \geq 45$
$\theta_{3} \geq 10$
$\theta_{4} \geq 20$

Modeling the problem

minimize$weight(Q)$
subject to $weight_{e=Alice}(Q) \leq 40$
$weight_{e=Bob}(Q) \leq 40$
$weight_{p=1}(Q) \geq 45$
$weight_{p=2}(Q) \geq 10$
$weight_{p=3}(Q) \geq 20$

Modeling the problem (cont.)

minimize$weight(Q)$
subject to $\forall (e') : \exists s' . Employee(e', s') . weight_{e = e'}(Q) \leq 40$
$\forall (p', s', t') : Project(p', s', t') . weight_{p = p'}(Q) \geq num(t')$

Contribution: LP(CQ)

A language based on standard linear programming constructs.

  • Give weights to select subsets of query answers
  • Universally quantify constraints or sums
  • Retrieve constants from the database


Constant numbers $c \mid$ $num(E)$
$LP(CQ)$ expressions $N S \mid S+S' \mid N \mid$ $weight_{\mathbf{x}: Q'}(Q)$ $\mid$ $\sum_{\mathbf{x}: Q} S$
$LP(CQ)$ constraints $S \le S' \mid C \wedge C' \mid true \mid$ $\forall \mathbf{x} : Q. C $
$LP(CQ)$ programs maximize $S$ subject to $C$

Naive solving algorithm

  1. Enumerate the answers of the query
  2. Interpret the LP(CQ) model as a standard linear program
    1. Unfold the quantifiers
    2. Interpret $weight$ expressions
  3. Use a LP solver to find an optimal solution
    • $\theta_1=25, \theta_2=20, \theta_3=10, \theta_4=20$
    • Objective: $\theta_1+\theta_2+\theta_3+\theta_4=75$

Limitation of the naive algorithm

Let $L$ be an LP(CQ) using a query $Q$.

  • Interpretation of $L$ on a database $\mathbb{D}$ gives a program having up to $|Q(\mathbb{D})|$ variables.
  • Linear Program Solvers are very sensible to the number $n$ of variables.
  • The size of $Q(\mathbb{D})$ may be orders of magnitude bigger than the size of $\mathbb{D}$.

In the previous example:

  • $|Q(\mathbb{D})|$ may be of size $|Employee|\times|Project|$.
  • While $|\mathbb{D}| = |Employee| + |Project|$

Can we do better?

(Combined) Complexity of solving LP(CQ)s

Let $L$ be the following LP(CQ):

maximize$weight(Q)$
subject to $0 \leq weight(Q) \leq 1$

Given a query $Q$ and database $\mathbb{D}$, can we decide whether the optimal value of $L$ is different from 0?

This problem is NP-hard by reduction to the satisfiability of $Q$

Tractable conjunctive queries

We leverage known tractability results on conjunctive queries (Yannakakis' algorithm, fractional hypertree width) to our setting:

Main Theorem: Given a LP(CQ) $L$ on a set of queries of fractional hypertree width at most $k$* we can build a linear program that only has $O(|\mathbb{D}|^k)$ variables and is equivalent to the naive interpretation.

*$k$ also depends on the linear program

A tempting approach

Idea: Treat the $weight$ expressions as independent variables.

minimize$X_1$
subject to $X_2 \leq 40$
$X_3 \leq 40$
$X_4 \geq 45$
$X_5 \geq 10$
$X_6 \geq 20$
$X_1 = X_2+X_3$
$X_1 = X_4+X_5+X_6$

Problem:

  • This linear program is unbounded.
  • The value of $X_1$ is not idependent from the value of $X_2$ in the original program

A tractable fragment of LP(CQ)

  • The queries in the $\forall \mathbf{x} : Q. C $ and $\sum_{\mathbf{x}: Q} S$ quantifiers must be single atoms
  • Each weight must select answers using variables that are covered by a single bag of the tree decomposition of the query

Factorized interpretation

minimize$X_1$
subject to $X_2 \leq 40$
$X_3 \leq 40$
$X_4 \geq 45$
$X_5 \geq 10$
$X_6 \geq 20$
(e, es)$X_2 = X_7 + X_8$
$X_3 = X_9 + X_{10}$
(∅, es)$X_1 = X_7 + X_8 + X_9 + X_{10}$
...

Preliminary practical results

Naive vs Factorized results

Solving time of GLPK for naive and factorized interpretation of the resource delivery problem with respect to table size.

LP on relations and factorized databases

Constant numbers $c \mid$ $num(E)$
$LP$ expressions $N S \mid S+S' \mid N \mid$ $weight_{\mathbf{x}: Q'}(Q)$ $\mid$ $\sum_{\mathbf{x}: Q} S$
$LP$ constraints $S \le S' \mid C \wedge C' \mid true \mid$ $\forall \mathbf{x} : Q. C $
$LP$ programs maximize $S$ subject to $C$
  • $Q$ can be any query, or any relation signature with an interpretation $Q(\mathbb{D})$ on a database $\mathbb{D}$
  • Our tractability result generalizes if $Q(\mathbb{D})$ is given as a factorized database (and a bit more).

Conclusion

Contributions

  • A language to define linear programs with conjunctive queries
  • A tractable fragment for LP(CQ)

Future work

  • Programs with convex objectives/constraints
  • Programs with different data representations
  • Practical tool to solve LP(CQ)s
  • Testing the implementation on real data