Enumeration Complexity

Notation

Let $A(x,y)$ be a set.

Problem $EnumA$ : output $A(x,\_):=\{y \mid A(x,y)\}$ on input $x$ .

$y \in A(x,\_)$ will be called a solution of (the problem) $A$ on input $x$ .

Main Assumption

In this talk, $A$ is an $NP$ -predicate, that is:

$y \in A(x,\_)$ can be tested in polynomial time.
$y$ is of size polynomial in the size of $x$ .

Many natural problems have this property:

Enumerate the answer set of a database query $Q$ on database $\mathbb{D}$
Enumerate the transversals of a hypergraph $\mathcal{H}$

Complexity

How to measure the complexity of an algorithm solving an enumeration problem?

Total Time

Total time needed to output every solution.

There can be exponentially many.

$EnumA$ is in $OutputP$ if it can be solved in time polynomial in:

$|x|$
and $|A(x,\_)|$

Delay

Total time is not always satisfactory:

Process solutions in a stream.
Only peek at the solution set.

Delay: the longest time one has to wait between the output of two solutions.

Holy $DelayP$ Grail

One focus in enumeration complexity has been to design algorithms with polynomial delay.

Why do we care?

Guarantees a resonnable waiting time before next solution.
Gives a $t \times delay$ upper bound on the time needed to output $t$ solutions.

(Unpopular?) Opinion: Reason 1 is rarely useful…

Linear Incremental Time

$IncP_1$

Algorithms such that for every $t$ , after $t\cdot d(n)$ steps, the algorithm has output at least $t$ solutions.

We say that $d(n)$ is the incremental delay.

Promoting $IncP_1$

$DelayP$ vs $IncP_1$

Clearly $DelayP \subseteq IncP_1$ : after $delay \times t$ , at least $t$ solutions output.

For the other way around: $2^n$ delay but incremental delay of $2$ .

if $t \leq 2^n$ , $t$ solutions are output in time $t \leq 2t$
if $t = 2^n+1$ , last solution is output at time $2^{n+1} \leq 2t$

(Naive) Regularization

Given an $IncP_1$ -enumerator $A$ with incremental delay $d$ , one can regularize the delay using a queue to delay the output of solutions every $d$ steps:

step = 0
q = Queue()
while(A is not finished):

    move(A)
    step += 1

    if A outputs x:
        q.add(x)

    if step == d:  
        output(q.pull()) 
        step=0 

output(q) # output what remains in q

$IncP_1 = DelayP$

When the simulation of $A$ reaches step $2^n$ :

pushed $2^n$ solutions in the queue
pulled $2^{n}/2$ of them
$2^n/2$ remains…

Natural notion in data structure

Suppose enumeration algorithm $A$ works as follows:

$A$ maintains a dynamic array $T$
$A$ performs insert/deletion on $T$ between two outputs

Delay of $A$ : worst case complexity of insert/delete

$\rightarrow$ $O(|T|)$ if a resize occurs…

but incremental delay of $A$ : amortized complexity of insert/delete

$\rightarrow$ to analyse the time needed to output $k$ solutions, one can consider each operation on $T$ to be $O(1)$ .

Wrap up on $IncP_1$

Incremental delay better than (worst case) delay.
Better analysis tools using amortized complexity of data structure.

If you do not agree, $IncP_1 = DelayP$ anyway…

… but regularizing the delay is expensive in space.

Geometric Amortization

Main contribution

$IncP_1$ $(POLYSPACE)$ $=$ $DelayP$ $(POLYSPACE)$

Regularization using only polynomial space.

Demo first

Meet Coussinet

http://florent.capelli.me/coussinet/

Detailed Statement

For every $IncP_1$ -enumerator $A(x)$ with $N$ solutions:

incremental delay $d$ ,
space $s$ ,
total time $T$

there exists a $DelayP$ -enumerator with

delay $O(d \log(N))$
space $O(s \log(N))$
total time $O(T)$

Geometric Amortization

Maintain $l+1=\lceil \log(N) \rceil$ simulations $A_0, \dots, A_l$ of $A$
$A_i$ outputs solutions for steps in $[2^id;2^{i+1}d[$ .
$A_{i+1}$ “moves faster” than $A_i$ .

Faster how?

$A_i$ moves by at most $2d$ steps.
If $A_i$ finds a solution in its zone, outputs it and proceed back with $A_l$ .
Otherwise, proceed with $A_{i-1}$ .
Stops when $A_0$ is out of its zone.

Key Lemmas

If $A_0, \dots, A_i$ have output $k$ solutions, $A_{i+1}$ has moved by at least $2dk$ steps.
There are at least $2^i$ solutions in $[0; 2^id]$ .

When $A_0$ is finished, $A_i$ has moved by at least $2^{i+1}d$ steps: it has explored all its zone.

Delay

Between two outputs:

each process $A_i$ moves by at most $2d$ steps,
at most $l = \log(N)$ processes,
step counters are incremented / compared (to check whether $A_i$ is in zone $[2^i \cdot d; 2^{i+1}\cdot d]$

Delay: time needed to simulate $l \times 2d$ steps and to work with counters.

Simulating RAMs and Counters

Simulating RAM: With bounds on incremental delay, space and number of solutions of $A$ : $O(1)$ to simulate one step.
Counters: of size at most $\log(dN)$

Gives a $O(d\cdot\log(dN)^2)$ delay (polynomial).

Gray Code based counters + pointer to its most significant bit: delay of $O(d \cdot \log(N))$ .

Improvements

Previous algorithm assumes knowledge of (at least an upper bound):

$N$ , number of solutions of $A(x)$
$s$ , space used by $A(x)$
$d$ , Incremental delay of $A(x)$

Start with a bounded number $A_0, \dots, A_k$ processes
When $A_k$ enters its zone, copy it into $A_{k+1}$
This approach preserves total time.

Encode memory of process with a dedicated dynamic data structure
Tradeoff: $O(1)$ simulation with space $O(s \cdot n^\varepsilon)$ vs $O(\log \log n)$ simulation with space $O(s)$ .

We do not know exactly but…

Unknown incremental delay I

Goal:

Construct $REG$ , taking as input a black box oracle access to an enumerator $A$ ; the incremental delay $d_A$ is unknown.
$REG(A)$ outputs the same solutions as $A$ with delay $\alpha \cdot d_A$ .
Can $\alpha$ be independent on $A$ (constant)?

It is provably impossible:

Use an adversarial oracle $A$ that outpus $\#A-1$ solutions first and hold on the last solution until $REG(A)$ outputs $\#A-1$ solutions.

Unknown incremental delay II

For every $\varepsilon$ , one can construct $REG$ such that $REG(A)$ has delay $d_A^{1+\varepsilon}$ .

Naive regularization and pull a solution from the queue if no solution has been output since $C_\varepsilon \cdot \tilde{d}$

where

$\tilde{d} = {steps \cdot (Nsol+1)^{-1}}$ approximate locally the incremental delay ( $Nsol$ solutions seen after $steps$ )
$C_\varepsilon = {(1-2^\varepsilon)}^{-1}$

Open Problem: make this tradeoff work with geometric amortization.

Generalizations and Applications

Incremental time

The approach generalizes to collapse the following classes:

Usual $k$ -incremental time: the delay between solution $i$ and $i+1$ is $O(poly(n) i^k)$ .
Relaxed $(k+1)$ -incremental time: for every $i$ , at least $i$ solutions have been output after time $poly(n)i^{k+1}$ .

Change budget given to each process to $2S^k(k+1)d$ where $S$ is the number of solutions output so far.

Trading average delay with worst case delay

$E(x,\_)$ a self reducible problem :

$E(x, \_) = \bigcup_{i} E(x_i, \_)$ with $|x_i| < |x|$
Branch and bound enumeration algorithm AA for EE
- Polynomial delay
- Better average delay when branches have a lot of solutions

Average delay is valid in every branches:

it is an incremental delay,
can be trade for a worst case delay using geometric amortization

Example: enumerating DNF models

Problem:

Input: a DNF $D = \bigvee_{i \leq m}\ S_i$ on variables $x_1,\dots,x_n$
Output: every satisfying assignments of $x_1,\dots,x_n$ of $D$ .

Branch and bound algorithm: extension problem efficiently solvable.

If $D[x_1=0]$ has a solution, recursively enumerate $D[x_1=0]$ .
If $D[x_1=1]$ has a solution, recursively enumerate $D[x_1=1]$ .
Linear delay, in $|D|$ (if implemented correctly).

DNF enumeration in strong polynomial delay

Open question: can we solve it with a delay that is polynomial only in the size of the solutions, ie, $n$ .

Two conjectures:

Strong DNF Enumeration conjecture: impossible with a delay with a $o(m)$ dependency. Refuted using Geometric Amortization
DNF Enumeration conjecture: impossible with a delay $poly(n)$ .

DNF enumeration: $O(n^2 m^\mu)$ delay

Branch and bound algorithm has:

average delay $O(n^2 \cdot m^\mu)$ with $\mu = 1-\log_2(3)<1$ ,
incremental delay $O(n^2 \cdot m^\mu)$ with $\mu = 1-\log_2(3)<1$ ,
Geometric amortization gives worst case delay $O(poly(n) \cdot m^\mu)$

Conclusion

New method to make delay regular without exponential space
Message: incremental delay may be more natural than worst case delay.

Open problem: can we regularize without changing output order?

Enumeration Complexity

Notation

Main Assumption

Complexity

Total Time

Delay

Holy DelayPDelayP Grail

Linear Incremental Time

Promoting IncP1IncP_1

DelayPDelayP vs IncP1IncP_1

(Naive) Regularization

IncP1=DelayPIncP_1 = DelayP

Natural notion in data structure

Related work 1: Goldberg’s Thesis

Related work 2: Any-k Algorithms

Wrap up on IncP1IncP_1

Geometric Amortization

Main contribution

Demo first

Detailed Statement

Geometric Amortization

Faster how?

Key Lemmas

Delay

Simulating RAMs and Counters

Improvements

Unknown incremental delay I

Unknown incremental delay II

Generalizations and Applications

Incremental time

Trading average delay with worst case delay

Example: enumerating DNF models

DNF enumeration in strong polynomial delay

DNF enumeration: O(n2mμ)O(n^2 m^\mu) delay

Conclusion

Holy $DelayP$ Grail

Promoting $IncP_1$

$DelayP$ vs $IncP_1$

$IncP_1 = DelayP$

Wrap up on $IncP_1$

DNF enumeration: $O(n^2 m^\mu)$ delay