Geometric Amortization of Enumeration Algorithms


November 24, 2022

Enumeration Complexity

Notation

Let A(x,y) be a binary predicate, that is, a set of pairs (x,y).

Problem EnumA: output A(x,_) := {y | A(x,y)} on input x.

y ∈ A(x,_) will be called a solution of (the problem) A on input x.

Main Assumption

In this talk, A is an NP-predicate, that is: A(x,y) is decidable in polynomial time, and every solution y has size polynomial in |x|.

Many natural problems have this property:

  • Enumerate the answer set of a database query Q on a database 𝔻
  • Enumerate the transversals of a hypergraph H

Complexity

How to measure the complexity of an algorithm solving an enumeration problem?

Total Time

Total time needed to output every solution.

There can be exponentially many.

EnumA is in OutputP if it can be solved in time polynomial in the size of the input x and the number of solutions |A(x,_)|.

Delay

Total time is not always satisfactory: an OutputP algorithm may compute for a very long time before producing its first solution.

Delay: the longest time one has to wait between the output of two consecutive solutions.

Holy DelayP Grail

One focus in enumeration complexity has been to design algorithms with polynomial delay.

Why do we care?

  1. Guarantees a reasonable waiting time before the next solution.
  2. Gives a t × delay upper bound on the time needed to output t solutions.

(Unpopular?) Opinion: Reason 1 is rarely useful…

Linear Incremental Time

IncP_1

Algorithms such that for every t, after t · d(n) steps, the algorithm has output at least t solutions.

We say that d(n) is the incremental delay.

Promoting IncP_1

DelayP vs IncP_1

Clearly DelayP ⊆ IncP_1: after delay × t steps, at least t solutions have been output.

For the other direction, consider an enumerator with delay 2^n but incremental delay 2:

  • if t ≤ 2^n, t solutions are output in time t ≤ 2t
  • if t = 2^n + 1, the last solution is output at time 2^{n+1} ≤ 2t
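This separating example can be checked with a small Python sketch (toy names, not from the talk): the enumerator outputs one solution per step for 2^n steps, then a last solution at step 2^{n+1}.

```python
def separating_enumerator(n):
    """Toy IncP_1 enumerator: one solution per step for the first 2^n
    steps, then silence, then a last solution at step 2^(n+1)."""
    for t in range(1, 2 ** n + 1):
        yield t, f"sol{t}"            # (output step, solution)
    yield 2 ** (n + 1), "last"

n = 4
times = [t for t, _ in separating_enumerator(n)]
# Worst-case delay: largest gap between two consecutive outputs
delay = max(b - a for a, b in zip(times, times[1:]))
# Incremental delay: max over k of (step of the k-th solution) / k
inc = max(t / k for k, t in enumerate(times, start=1))
print(delay, inc)
```

The worst-case delay is 2^n (the gap before the last solution), while every k-th solution indeed appears within 2k steps.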

(Naive) Regularization

Given an IncP_1-enumerator A with incremental delay d, one can regularize the delay using a queue, delaying outputs so that one solution is released every d steps:

step = 0
q = Queue()
while(A is not finished):
    move(A)               # simulate one step of A
    step += 1
    if A outputs x:
        q.add(x)
    if step == d:
        output(q.pull())  # q is non-empty: after t*d steps, at least t solutions
        step = 0
while(q is not empty):    # output what remains in q
    output(q.pull())
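A runnable Python version of this regularization, tested on a toy bursty enumerator with incremental delay 2 (all names are illustrative):

```python
from collections import deque

def regularize(run, d):
    """Naive regularization of an IncP_1 enumerator with incremental
    delay d: buffer solutions and release one every d simulated steps."""
    q, step = deque(), 0
    for out in run:                  # one iteration = one step of A
        step += 1
        if out is not None:
            q.append(out)
        if step == d:
            yield q.popleft()        # non-empty: t*d steps => t solutions
            step = 0
    while q:                         # flush what remains in the queue
        yield q.popleft()

def bursty(blocks):
    """Toy enumerator with incremental delay 2: bursts of 4 solutions
    followed by 4 silent steps."""
    s = 0
    for _ in range(blocks):
        for _ in range(4):
            s += 1
            yield s
        for _ in range(4):
            yield None

sols = list(regularize(bursty(3), d=2))
print(sols)
```

The regularized enumerator emits exactly one solution every 2 steps, then flushes the queue.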

IncP_1 = DelayP

When the simulation of A reaches step 2^n, the queue may already hold exponentially many pending solutions.

Natural notion in data structures

Suppose an enumeration algorithm A maintains a dynamic table T, performing insertions and deletions in T between consecutive outputs.

Delay of A: worst-case complexity of insert/delete

→ O(|T|) if a resize occurs…

but incremental delay of A: amortized complexity of insert/delete

→ to analyse the time needed to output k solutions, one can consider each operation on T to be O(1).
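A standard illustration of this amortization, assuming the usual capacity-doubling strategy: a single insert may copy the whole table, yet the total copying cost over n inserts stays below n.

```python
class Table:
    """Dynamic table with capacity doubling: one insert may trigger a
    resize that copies the whole table (worst case O(|T|)), but the
    total number of copied elements over n inserts is below n,
    i.e., amortized O(1) per insert."""
    def __init__(self):
        self.cap, self.data, self.copies = 1, [], 0

    def insert(self, x):
        if len(self.data) == self.cap:     # expensive step: resize
            self.cap *= 2
            self.copies += len(self.data)  # cost of moving every element
        self.data.append(x)

t = Table()
for i in range(1024):
    t.insert(i)
print(t.copies)
```

After 1024 inserts the resizes copied 1 + 2 + 4 + … + 512 = 1023 elements in total, so each insert costs O(1) amortized even though one insert costed 512 copies.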

Wrap up on IncP_1

If you do not agree, IncP_1 = DelayP anyway…

… but regularizing the delay is expensive in space.

Geometric Amortization

Main contribution

IncP_1(POLYSPACE) = DelayP(POLYSPACE)

Regularization using only polynomial space.

Demo first

Detailed Statement

For every IncP_1-enumerator A(x) with N solutions, incremental delay d, and space s,

there exists a DelayP-enumerator with delay O(d · log N) and space O(s · log N).

Geometric Amortization

Faster how?

Key Lemmas

  1. If A_0, …, A_i have output k solutions, A_{i+1} has moved by at least 2dk steps.
  2. There are at least 2^i solutions in [0; 2^i d].

When A_0 is finished, A_i has moved by at least 2^{i+1} d steps: it has explored all of its zone.
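A simplified Python sketch of the zone decomposition (the scheduler below is an illustration, not the paper's exact algorithm): copy i of A re-simulates A from step 0 but only outputs solutions produced in its zone [2^i d, 2^{i+1} d), and copies are advanced round-robin with a budget of 2d steps each.

```python
import math

def geometric_sketch(trace, d):
    """Illustrative geometric amortization. `trace[t]` is the solution
    output by the IncP_1 enumerator A at step t, or None. Copy i is in
    charge of the zone of steps [2^i*d, 2^(i+1)*d) (copy 0 starts at 0):
    it re-simulates A from step 0 but only outputs solutions landing in
    its own zone. Copies are visited round-robin, at most 2*d steps per
    visit, and hand over control after one output."""
    T = len(trace)
    l = max(0, math.ceil(math.log2(max(T, d) / d)))   # extra copies
    zones = [(0, 2 * d)] + [(2 ** i * d, 2 ** (i + 1) * d)
                            for i in range(1, l + 1)]
    pos = [0] * (l + 1)          # how far each copy has simulated A
    result = []
    while any(pos[i] < min(zones[i][1], T) for i in range(l + 1)):
        for i in reversed(range(l + 1)):
            lo, hi = zones[i]
            end, budget = min(hi, T), 2 * d
            while budget > 0 and pos[i] < end:
                t, budget = pos[i], budget - 1
                pos[i] += 1
                if trace[t] is not None and t >= lo:
                    result.append(trace[t])   # solution in copy i's zone
                    break                     # pass control to next copy
    return result

# Trace of a toy enumerator: a solution every other step
trace = [t if t % 2 == 0 else None for t in range(32)]
out = geometric_sketch(trace, d=2)
print(sorted(out))
```

Because the zones partition the steps, every solution is output exactly once; the output order is permuted, which is why regularization cannot preserve the original order for free.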

Delay

Between two outputs:

Delay: time needed to simulate l × 2d steps and to work with the counters.

Simulating RAMs and Counters

Gives an O(d · log(dN)^2) delay (polynomial).

  • Gray-code-based counters + a pointer to the most significant bit: delay of O(d · log N).

Improvements

The previous algorithm assumes knowledge of (at least an upper bound on):

  1. N, the number of solutions of A(x)
  2. s, the space used by A(x)
  3. d, the incremental delay of A(x)
  • Start with a bounded number of processes A_0, …, A_k
  • When A_k enters its zone, copy it into A_{k+1}
  • This approach preserves total time.
  • Encode the memory of each process with a dedicated dynamic data structure
  • Tradeoff: O(1) simulation with space O(s · n^ε) vs O(log log n) simulation with space O(s).

We do not know the incremental delay exactly, but…

Unknown incremental delay I

Goal: regularize A with delay O(d_A) without knowing its incremental delay d_A.

It is provably impossible:

  • Use an adversarial oracle A that outputs #A − 1 solutions first and holds back the last solution until REG(A) has output #A − 1 solutions.

Unknown incremental delay II

For every ε, one can construct REG such that REG(A) has delay d_A^{1+ε}.

Naive regularization, but pull a solution from the queue whenever no solution has been output for C_ε · d̃ steps,

where

  • d̃ = steps · (Nsol + 1)^{-1} approximates the incremental delay locally (Nsol solutions seen after steps steps)
  • C_ε = (1 − 2^{-ε})^{-1}
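A sketch of this adaptive regularization (reading the slide's constant as C_ε = (1 − 2^{-ε})^{-1}; all names are illustrative, not from the paper):

```python
from collections import deque

def adaptive_regularize(run, eps):
    """Regularization without knowing the incremental delay: maintain
    the local estimate d_est = steps / (nsol + 1) and release a
    buffered solution whenever C_eps * d_est steps have passed since
    the last release, with C_eps = 1 / (1 - 2**(-eps))."""
    C = 1.0 / (1.0 - 2.0 ** (-eps))
    q = deque()
    steps = nsol = since = 0
    for out in run:                  # one iteration = one step of A
        steps += 1
        since += 1
        if out is not None:
            nsol += 1
            q.append(out)
        d_est = steps / (nsol + 1)   # local incremental-delay estimate
        if q and since >= C * d_est:
            yield q.popleft()
            since = 0
    while q:                         # flush the remaining solutions
        yield q.popleft()

def bursty(blocks):
    """Toy enumerator: bursts of 4 solutions, then 4 silent steps."""
    s = 0
    for _ in range(blocks):
        for _ in range(4):
            s += 1
            yield s
        for _ in range(4):
            yield None

sols = list(adaptive_regularize(bursty(3), eps=1))
print(sols)
```

All solutions still come out, in order, since the queue is FIFO and is flushed at the end; only the pacing depends on the estimate.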

Open Problem: make this tradeoff work with geometric amortization.

Generalizations and Applications

Incremental time

The approach generalizes to collapse the following classes:

Change the budget given to each process to 2S^k(k+1)d, where S is the number of solutions output so far.

Trading average delay with worst case delay

E(x,_) a self-reducible problem:

Average delay is valid in every branch:

  • it is an incremental delay,
  • it can be traded for a worst-case delay using geometric amortization

Example: enumerating DNF models

Problem: given a DNF formula D with n variables and m terms, enumerate its models.

Branch-and-bound algorithm: the extension problem is efficiently solvable.

  • If D[x_1 = 0] has a solution, recursively enumerate D[x_1 = 0].
  • If D[x_1 = 1] has a solution, recursively enumerate D[x_1 = 1].
  • Linear delay, in |D| (if implemented correctly).
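The branch-and-bound scheme above can be sketched in a few lines of Python (a toy implementation; the term representation is an assumption):

```python
def models(terms, variables, assignment=()):
    """Toy branch-and-bound enumerator for the models of a DNF.
    `terms` is a list of conjunctions, each a {variable: bool} dict.
    Branch on the first variable; recurse on a value only if the
    restricted DNF still has a surviving term (the extension check)."""
    if not variables:
        if terms:                       # some term is fully satisfied
            yield assignment
        return
    v, rest = variables[0], variables[1:]
    for b in (False, True):
        # Restrict to v = b: drop falsified terms, erase v from the rest
        restricted = [{u: w for u, w in t.items() if u != v}
                      for t in terms if t.get(v, b) == b]
        if restricted:                  # a model exists in this branch
            yield from models(restricted, rest, assignment + ((v, b),))

# D = (x1 AND NOT x2) OR x3 over variables x1, x2, x3
D = [{"x1": True, "x2": False}, {"x3": True}]
all_models = list(models(D, ["x1", "x2", "x3"]))
print(len(all_models))
```

Since a DNF is satisfiable as soon as one term is not falsified, the extension check is a constant-time-per-term test, and every recursive branch leads to at least one model.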

DNF enumeration in strong polynomial delay

Open question: can we solve it with a delay that is polynomial only in the size of the solutions, i.e., n?

Two conjectures:

DNF enumeration: O(n^2 m^μ) delay

Branch and bound algorithm has:

  1. average delay O(n^2 · m^μ) with μ = log_2(3) − 1 < 1,
  2. incremental delay O(n^2 · m^μ) with the same μ,
  3. geometric amortization gives a worst-case delay O(poly(n) · m^μ)

Conclusion

Open problem: can we regularize without changing output order?