Geometric Amortization of Enumeration Algorithms

Florent Capelli
Université de Lille
Yann Strozecki
Université Paris Saclay

Novembre 24, 2022

WEPA

Enumeration Complexity

Notation

Let A(x, y) be a set.

Problem EnumA: output A(x, _) := {y ∣ A(x, y)} on input x.

y ∈ A(x, _) will be called a solution of (the problem) A on input x.

Main Assumption

In this talk, A is an NP-predicate, that is:

  • y ∈ A(x, _) can be tested in polynomial time.
  • y is of size polynomial in the size of x.

Many natural problems have this property:

  • Enumerate the answer set of a database query Q on database 𝔻
  • Enumerate the transversals of a hypergraph

Complexity

How to measure the complexity of an algorithm solving an enumeration problem?

Total Time

Total time needed to output every solution.

There can be exponentially many.

EnumA is in OutputP if it can be solved in time polynomial in:

  • |x|
  • and |A(x, _)|

Delay

Total time is not always satisfactory:

  • Process solutions in a stream.
  • Only peek at the solution set.

Delay: the longest time one has to wait between the output of two solutions.

Example:

Enumerate (0 + 1)n:

  • Method 1: Generates every words of length k inductively up to length n and output them.
  • Method 2: Start from 0n, output it and take next word (using Gray Code for example).

Both have polynomial total time but delay in Method 1 is exponential while Method 2 has constant delay.

Holy DelayP Grail

One focus in enumeration complexity has been to design algorithms with polynomial delay.

Why do we care?

  1. Guarantees a resonnable waiting time before next solution.
  2. Gives a t × delay upper bound on the time needed to output t solutions.

(Unpopular?) Opinion: Reason 1 is rarely useful…

Linear Incremental Time

IncP1

Algorithms such that for every t, after t ⋅ d(n) steps, the algorithm has output at least t solutions.

We say that d(n) is the incremental delay.

Promoting IncP1

DelayP vs IncP1

Clearly DelayP ⊆ IncP1: after delay × t, at least t solutions output.

For the other way around: 2n delay but incremental delay of 2.

  • if t ≤ 2n, t solutions are output in time t ≤ 2t
  • if t = 2n + 1, last solution is output at time 2n + 1 ≤ 2t

(Naive) Regularization

Given an IncP1-enumerator A with incremental delay d, one can regularize the delay using a queue to delay the output of solutions every d steps:

step = 0
q = Queue()
while(A is not finished):
    move(A)
    step += 1
    if A outputs x:
        q.add(x)
    if step == d:  
        output(q.pull()) 
        step=0 
output(q) # output what remains in q

IncP1 = DelayP

When the simulation of A reaches step 2n:

  • pushed 2n solutions in the queue
  • pulled 2n/2 of them
  • 2n/2 remains…

Natural notion in data structure

Suppose enumeration algorithm A works as follows:

  • A maintains a dynamic array T
  • A performs insert/deletion on T between two outputs

Delay of A: worst case complexity of insert/delete

O(|T|) if a resize occurs…

but incremental delay of A: amortized complexity of insert/delete

to analyse the time needed to output k solutions, one can consider each operation on T to be O(1).

Wrap up on IncP1

  • Incremental delay better than (worst case) delay.
  • Better analysis tools using amortized complexity of data structure.

If you do not agree, IncP1 = DelayP anyway…

… but regularizing the delay is expensive in space.

Geometric Amortization

Main contribution

IncP1(POLYSPACE) = DelayP(POLYSPACE)

Regularization using only polynomial space.

Demo first

Detailed Statement

For every IncP1-enumerator A(x) with N solutions:

  • incremental delay d,
  • space s,
  • total time T

there exists a DelayP-enumerator with

  • delay O(dlog (N))
  • space O(slog (N))
  • total time O(T)

Geometric Amortization

  • Maintain l + 1 = ⌈log (N)⌉ simulations A0, …, Al of A
  • Ai outputs solutions for steps in [2id; 2i + 1d[.
  • Ai + 1 “moves faster” than Ai.

Faster how?

  • Ai moves by at most 2d steps.
  • If Ai finds a solution in its zone, outputs it and proceed back with Al.
  • Otherwise, proceed with Ai − 1.
  • Stops when A0 is out of its zone.

Key Lemmas

  1. If A0, …, Ai have output k solutions, Ai + 1 has moved by at least 2dk steps.
  2. There are at least 2i solutions in [0; 2id].

When A0 is finished, Ai has moved by at least 2i + 1d steps: it has explored all its zone.

Delay

Between two outputs:

  • each process Ai moves by at most 2d steps,
  • at most l = log (N) processes,
  • step counters are incremented / compared (to check whether Ai is in zone [2i ⋅ d; 2i + 1 ⋅ d]

Delay: time needed to simulate l × 2d steps and to work with counters.

Simulating RAMs and Counters

  • Simulating RAM: With bounds on incremental delay, space and number of solutions of A: O(1) to simulate one step.

  • Counters: of size at most log (dN)

Gives a O(d ⋅ log (dN)2) delay (polynomial).

  • Gray Code based counters + pointer to its most significant bit: delay of O(d ⋅ log (N)).

Improvements

Previous algorithm assumes knowledge of (at least an upper bound):

  1. N, number of solutions of A(x)
  2. s, space used by A(x)
  3. d, Incremental delay of A(x)
  • Start with a bounded number A0, …, Ak processes
  • When Ak enters its zone, copy it into Ak + 1
  • This approach preserves total time.
  • Encode memory of process with a dedicated dynamic data structure
  • Tradeoff: O(1) simulation with space O(s ⋅ nε) vs O(log log n) simulation with space O(s).

We do not know exactly but

Unknown incremental delay I

Goal:

  • Construct REG, taking as input a black box oracle access to an enumerator A; the incremental delay dA is unknown.
  • REG(A) outputs the same solutions as A with delay α ⋅ dA.
  • Can α be independent on A (constant)?

It is provably impossible:

  • Use an adversarial oracle A that outpus #A − 1 solutions first and hold on the last solution until REG(A) outputs #A − 1 solutions.

Unknown incremental delay II

For every ε, one can construct REG such that REG(A) has delay dA1 + ε.

Naive regularization and pull a solution from the queue if no solution has been output since Cε ⋅ 

where

  •  = steps ⋅ (Nsol + 1) − 1 approximate locally the incremental delay (Nsol solutions seen after steps)
  • Cε = (1 − 2ε) − 1

Open Problem: make this tradeoff work with geometric amortization.

Generalizations and Applications

Incremental time

The approach generalizes to collapse the following classes:

  • Usual k-incremental time: the delay between solution i and i + 1 is O(poly(n)ik).
  • Relaxed (k + 1)-incremental time: for every i, at least i solutions have been output after time poly(n)ik + 1.

Change budget given to each process to 2Sk(k + 1)d where S is the number of solutions output so far.

Trading average delay with worst case delay

E(x, _) a self reducible problem :

  • E(x, _) = ⋃iE(xi, _) with |xi| < |x|
  • Branch and bound enumeration algorithm A for E
    • Polynomial delay
    • Better average delay when branches have a lot of solutions

Average delay is valid in every branches:

  • it is an incremental delay,
  • can be trade for a worst case delay using geometric amortization

Example: enumerating DNF models

Problem:

  • Input: a DNF D = ⋁i ≤ m Si on variables x1, …, xn
  • Output: every satisfying assignments of x1, …, xn of D.

Branch and bound algorithm: extension problem efficiently solvable.

  • If D[x1 = 0] has a solution, recursively enumerate D[x1 = 0].
  • If D[x1 = 1] has a solution, recursively enumerate D[x1 = 1].
  • Linear delay, in |D| (if implemented correctly).

DNF enumeration in strong polynomial delay

Open question: can we solve it with a delay that is polynomial only in the size of the solutions, ie, n.

Two conjectures:

  • Strong DNF Enumeration conjecture: impossible with a delay with a o(m) dependency. Refuted using Geometric Amortization
  • DNF Enumeration conjecture: impossible with a delay poly(n).

DNF enumeration: O(n2mμ) delay

Branch and bound algorithm has:

  1. average delay O(n2 ⋅ mμ) with μ = 1 − log2(3) < 1,
  2. incremental delay O(n2 ⋅ mμ) with μ = 1 − log2(3) < 1,
  3. Geometric amortization gives worst case delay O(poly(n) ⋅ mμ)

Conclusion

  • New method to make delay regular without exponential space
  • Message: incremental delay may be more natural than worst case delay.

Open problem: can we regularize without changing output order?