Geometric Amortization of Enumeration Algorithms

Florent Capelli

Université de Lille

Yann Strozecki

Université Paris Saclay

November 24, 2022

WEPA

Enumeration Complexity

Notation

Let A(x, y) be a set.

Problem EnumA: output A(x, _) := {y ∣ A(x, y)} on input x.

y ∈ A(x, _) will be called a solution of (the problem) A on input x.

Main Assumption

In this talk, A is an NP-predicate, that is:

y ∈ A(x, _) can be tested in polynomial time.
y is of size polynomial in the size of x.

Many natural problems have this property:

Enumerate the answer set of a database query Q on database 𝔻
Enumerate the transversals of a hypergraph ℋ

Complexity

How to measure the complexity of an algorithm solving an enumeration problem?

Total Time

Total time needed to output every solution.

There can be exponentially many.

EnumA is in OutputP if it can be solved in time polynomial in:

|x|
and |A(x, _)|

Delay

Total time is not always satisfactory:

Process solutions in a stream.
Only peek at the solution set.

Delay: the longest time one has to wait between the output of two solutions.

Example:

Enumerate (0 + 1)ⁿ:

Method 1: Generate every word of length k inductively, up to length n, and only then output the words of length n.
Method 2: Start from 0ⁿ, output it, and move to the next word (using a Gray code, for example).

Both have polynomial total time but delay in Method 1 is exponential while Method 2 has constant delay.
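A minimal Python sketch of the two methods, for illustration only. The function names are hypothetical, and the Gray-code version uses the standard reflected binary code `i ^ (i >> 1)`; formatting each word costs O(n), so this shows the output order rather than a true constant-delay implementation.

```python
def words_by_length(n):
    """Method 1: build all words of length 1, 2, ..., n, then output them.

    Nothing is output before the whole (exponential) list has been built.
    """
    words = [""]
    for _ in range(n):
        words = [w + bit for w in words for bit in "01"]
    yield from words


def gray_code_words(n):
    """Method 2: start from 0^n and flip one bit to get the next word."""
    for i in range(2 ** n):
        gray = i ^ (i >> 1)          # reflected binary Gray code of i
        yield format(gray, f"0{n}b")  # zero-padded binary word of length n


if __name__ == "__main__":
    print(list(gray_code_words(3)))
```

Consecutive words produced by `gray_code_words` differ in exactly one position, which is what makes a small delay possible.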

Holy DelayP Grail

One focus in enumeration complexity has been to design algorithms with polynomial delay.

Why do we care?

Guarantees a reasonable waiting time before the next solution.
Gives a t × delay upper bound on the time needed to output t solutions.

(Unpopular?) Opinion: Reason 1 is rarely useful…

Linear Incremental Time

IncP₁

Algorithms such that for every t, after t ⋅ d(n) steps, the algorithm has output at least t solutions.

We say that d(n) is the incremental delay.

Promoting IncP₁

DelayP vs IncP₁

Clearly DelayP ⊆ IncP₁: after delay × t, at least t solutions output.

For the other way around: 2ⁿ delay but incremental delay of 2.

if t ≤ 2ⁿ, t solutions are output in time t ≤ 2t
if t = 2ⁿ + 1, the last solution is output at time 2^{n+1} ≤ 2t
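The gap between the two measures can be checked numerically. The sketch below assumes an enumerator is summarized by the list of time steps at which it outputs its solutions; both helper names are hypothetical.

```python
def worst_case_delay(times):
    """Longest wait between two consecutive outputs (or before the first one)."""
    previous = [0] + times[:-1]
    return max(t - p for t, p in zip(times, previous))


def incremental_delay(times):
    """Smallest integer d such that the t-th solution is output by step t * d."""
    # -(-a // b) is the integer ceiling of a / b.
    return max(-(-time // rank) for rank, time in enumerate(times, start=1))


n = 3
# 2^n solutions at steps 1..2^n, then one last solution at step 2^(n+1).
times = list(range(1, 2 ** n + 1)) + [2 ** (n + 1)]

if __name__ == "__main__":
    print(worst_case_delay(times), incremental_delay(times))
```

On this instance the worst-case delay is 2ⁿ while the incremental delay stays at 2.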

(Naive) Regularization

Given an IncP₁-enumerator A with incremental delay d, one can regularize the delay using a queue to delay the output of solutions every d steps:

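step = 0
q = Queue()
while A is not finished:
    move(A)            # simulate one step of A
    step += 1

    if A outputs x:    # buffer the solution instead of outputting it
        q.add(x)

    if step == d:      # release one buffered solution every d steps
        output(q.pull())
        step = 0

while q is not empty:  # output what remains in q
    output(q.pull())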

IncP₁ = DelayP

When the simulation of A reaches step 2ⁿ:

pushed 2ⁿ solutions in the queue
pulled 2ⁿ/2 of them
2ⁿ/2 remain…
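A runnable sketch of the naive regularization, run on the example with 2ⁿ + 1 solutions and incremental delay 2. The modelling is an assumption: the enumerator is a Python generator that yields once per simulated step, `None` meaning "no output at this step". The names `example_enumerator` and `regularize` are hypothetical.

```python
from collections import deque


def example_enumerator(n):
    """2^n solutions at steps 1..2^n, then a last one at step 2^(n+1)."""
    for i in range(2 ** n):
        yield i
    for _ in range(2 ** n - 1):
        yield None               # a step without output
    yield 2 ** n


def regularize(enumerator, d):
    """Return (outputs, largest queue size observed after a full step)."""
    queue = deque()
    outputs = []
    step = 0
    max_queue = 0
    for item in enumerator:      # one iteration = one simulated step of A
        step += 1
        if item is not None:
            queue.append(item)
        if step == d:            # release one solution every d steps
            outputs.append(queue.popleft())
            step = 0
        max_queue = max(max_queue, len(queue))
    outputs.extend(queue)        # output what remains in the queue
    return outputs, max_queue


if __name__ == "__main__":
    outputs, max_queue = regularize(example_enumerator(4), d=2)
    print(len(outputs), max_queue)
```

The outputs are now spaced by d simulated steps, but the queue holds 2ⁿ/2 solutions at its peak, which is the exponential space cost noted above.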

Natural notion in data structure

Suppose enumeration algorithm A works as follows:

A maintains a dynamic array T
A performs insert/deletion on T between two outputs

Delay of A: worst case complexity of insert/delete

→ O(|T|) if a resize occurs…

but incremental delay of A: amortized complexity of insert/delete

→ to analyse the time needed to output k solutions, one can consider each operation on T to be O(1).
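A small sketch of the amortization argument, restricted to insertions for brevity. The class below is a hypothetical doubling array that reports the number of elementary writes performed by each insertion.

```python
class DynamicArray:
    """Array with capacity doubling; append returns its cost in writes."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None]

    def append(self, value):
        cost = 1                                  # the write of the new value
        if self.size == self.capacity:            # resize: copy everything
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):
                new_data[i] = self.data[i]
            self.data = new_data
            cost += self.size
        self.data[self.size] = value
        self.size += 1
        return cost


if __name__ == "__main__":
    array = DynamicArray()
    costs = [array.append(i) for i in range(1000)]
    print(max(costs), sum(costs))
```

A single insertion can cost O(|T|), which is what the delay measures, while the total cost of k insertions stays below 3k, which is what the incremental delay measures.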

Wrap up on IncP₁

Incremental delay better than (worst case) delay.
Better analysis tools using amortized complexity of data structure.

If you do not agree, IncP₁ = DelayP anyway…

… but regularizing the delay is expensive in space.

Geometric Amortization

Main contribution

IncP₁(POLYSPACE) = DelayP(POLYSPACE)

Regularization using only polynomial space.

Demo first

Meet Coussinet

http://florent.capelli.me/coussinet/

Detailed Statement

For every IncP₁-enumerator A(x) with N solutions:

incremental delay d,
space s,
total time T

there exists a DelayP-enumerator with

delay O(d ⋅ log(N))
space O(s ⋅ log(N))
total time O(T)

Geometric Amortization

Maintain l + 1 = ⌈log(N)⌉ simulations A₀, …, A_l of A
A_i outputs solutions for steps in [2ⁱd; 2^{i+1}d[.
A_{i+1} “moves faster” than A_i.

Faster how?

A_i moves by at most 2d steps.
If A_i finds a solution in its zone, it outputs it and the procedure restarts from A_l.
Otherwise, proceed with A_{i−1}.
Stop when A₀ is out of its zone.
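The scheduling rules above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the enumerator is a Python generator yielding once per simulated step (`None` for no output), d and N are known, steps are counted from 1 so that the zone of A_i is taken as ]2ⁱd; 2^{i+1}d] (with A₀ starting at step 1), and all names are hypothetical.

```python
def example_enumerator(n):
    """2^n solutions at steps 1..2^n, a last one at step 2^(n+1); d = 2."""
    for i in range(2 ** n):
        yield i
    for _ in range(2 ** n - 1):
        yield None
    yield 2 ** n


def geometric_amortization(make_enumerator, d, n_solutions):
    """Yield (solution, simulated steps since the previous output)."""
    num = max(1, (n_solutions - 1).bit_length())   # ceil(log2 N) processes
    procs = [make_enumerator() for _ in range(num)]
    steps = [0] * num                               # steps done by each A_i

    def in_zone(i, s):
        low = 0 if i == 0 else (2 ** i) * d
        return low < s <= (2 ** (i + 1)) * d

    gap = 0
    j = num - 1                                     # start with the fastest
    while j >= 0:
        found = False
        for _ in range(2 * d):                      # budget of 2d steps
            sol = next(procs[j], None)              # a finished A_j idles
            steps[j] += 1
            gap += 1
            if sol is not None and in_zone(j, steps[j]):
                yield sol, gap
                gap = 0
                found = True
                break
        j = num - 1 if found else j - 1             # restart from A_l or go down


if __name__ == "__main__":
    for solution, gap in geometric_amortization(lambda: example_enumerator(3), 2, 9):
        print(solution, gap)
```

Only ⌈log(N)⌉ copies of the enumerator are kept in memory and no solution is ever queued; the price is that solutions may come out in a different order than in A.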

Key Lemmas

If A₀, …, A_i have output k solutions, A_{i+1} has moved by at least 2dk steps.
There are at least 2ⁱ solutions in [0; 2ⁱd].

When A₀ is finished, A_i has moved by at least 2^{i+1}d steps: it has explored all its zone.
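A sketch of how the two lemmas combine, written as an induction step. It assumes that A₀, …, A_i have already explored their zones and that N ≥ 2^{i+1}; the notation steps(·) is introduced here for illustration.

```latex
\begin{align*}
\#\{\text{solutions output by } A_0,\dots,A_i\}
  &\ge 2^{i+1}
  && \text{(second lemma on } [0; 2^{i+1}d]\text{)}\\
\mathrm{steps}(A_{i+1})
  &\ge 2d \cdot 2^{i+1} = 2^{i+2}d
  && \text{(first lemma with } k = 2^{i+1}\text{)}
\end{align*}
```

So A_{i+1} has reached the end of its zone [2^{i+1}d; 2^{i+2}d[, and the induction carries on to the next process.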

Delay

Between two outputs:

each process A_i moves by at most 2d steps,
at most l = log(N) processes,
step counters are incremented / compared (to check whether A_i is in zone [2ⁱ ⋅ d; 2^{i+1} ⋅ d[).

Delay: time needed to simulate l × 2d steps and to work with counters.

Simulating RAMs and Counters

Simulating RAM: With bounds on incremental delay, space and number of solutions of A: O(1) to simulate one step.
Counters: of size at most log (dN)

Gives an O(d ⋅ log(dN)²) delay (polynomial).

Gray Code based counters + pointer to its most significant bit: delay of O(d ⋅ log (N)).

Improvements

The previous algorithm assumes knowledge of (at least an upper bound on):

N, number of solutions of A(x)
s, space used by A(x)
d, Incremental delay of A(x)

Start with a bounded number of processes A₀, …, A_k
When A_k enters its zone, copy it into A_{k+1}
This approach preserves total time.

Encode memory of process with a dedicated dynamic data structure
Tradeoff: O(1) simulation with space O(s ⋅ n^ε) vs O(log log n) simulation with space O(s).

We do not know exactly but…

Unknown incremental delay I

Goal:

Construct REG, taking as input a black box oracle access to an enumerator A; the incremental delay d_A is unknown.
REG(A) outputs the same solutions as A with delay α ⋅ d_A.
Can α be independent of A (constant)?

It is provably impossible:

Use an adversarial oracle A that outputs #A − 1 solutions first and holds back the last solution until REG(A) has output #A − 1 solutions.

Unknown incremental delay II

For every ε, one can construct REG such that REG(A) has delay d_A^{1+ε}.

Use naive regularization, and pull a solution from the queue whenever no solution has been output for C_ε ⋅ d̃ steps

where

d̃ = steps ⋅ (Nsol + 1)^{−1} locally approximates the incremental delay (Nsol solutions seen after steps)
C_ε = (1 − 2^ε)^{−1}

Open Problem: make this tradeoff work with geometric amortization.

Generalizations and Applications

Incremental time

The approach generalizes to collapse the following classes:

Usual k-incremental time: the delay between solution i and i + 1 is O(poly(n) ⋅ i^k).
Relaxed (k + 1)-incremental time: for every i, at least i solutions have been output after time poly(n) ⋅ i^{k+1}.

Change budget given to each process to 2S^k(k + 1)d where S is the number of solutions output so far.

Trading average delay with worst case delay

E(x, _) a self-reducible problem:

E(x, _) = ⋃_i E(x_i, _) with |x_i| < |x|
Branch and bound enumeration algorithm A for E
- Polynomial delay
- Better average delay when branches have a lot of solutions

The average delay is valid in every branch:

it is an incremental delay,
it can be traded for a worst case delay using geometric amortization

Example: enumerating DNF models

Problem:

Input: a DNF D = ⋁_{i ≤ m} S_i on variables x₁, …, x_n
Output: every satisfying assignment of D over x₁, …, x_n.

Branch and bound algorithm: extension problem efficiently solvable.

If D[x₁ = 0] has a solution, recursively enumerate D[x₁ = 0].
If D[x₁ = 1] has a solution, recursively enumerate D[x₁ = 1].
Linear delay, in |D| (if implemented correctly).
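A minimal sketch of this branch and bound scheme. The encoding is an assumption: a term S_i is a dict mapping a 0-based variable index to its required Boolean value, and the function name is hypothetical. The extension test used is that a partial assignment extends to a model exactly when some term is still consistent with it. This simple version re-filters the terms at every level, so it does not achieve the linear delay of a careful implementation.

```python
from itertools import product


def dnf_models(terms, n):
    """Yield every model of the DNF as a tuple of n Booleans, without repeats."""

    def rec(partial, alive):
        if len(partial) == n:
            yield tuple(partial)
            return
        var = len(partial)                        # next variable to branch on
        for value in (False, True):
            # Terms that do not contradict var = value stay alive.
            still = [t for t in alive if t.get(var, value) == value]
            if still:                             # the branch has a solution
                yield from rec(partial + [value], still)

    if terms:                                     # an empty DNF has no model
        yield from rec([], list(terms))


if __name__ == "__main__":
    # (x1 and not x2) or x3, with 0-based indices.
    dnf = [{0: True, 1: False}, {2: True}]
    for model in dnf_models(dnf, 3):
        print(model)
```

Every recursive call is made only on a branch known to contain a model, so the search never backtracks from a dead end.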

DNF enumeration in strong polynomial delay

Open question: can we solve it with a delay that is polynomial only in the size of the solutions, i.e., n?

Two conjectures:

Strong DNF Enumeration conjecture: impossible with a delay with a o(m) dependency. Refuted using Geometric Amortization
DNF Enumeration conjecture: impossible with a delay poly(n).

DNF enumeration: O(n²m^μ) delay

Branch and bound algorithm has:

average delay O(n² ⋅ m^μ) with μ = 1 − log₃(2) < 1,
incremental delay O(n² ⋅ m^μ) with μ = 1 − log₃(2) < 1,
Geometric amortization gives worst case delay O(poly(n) ⋅ m^μ)

Conclusion

New method to make delay regular without exponential space
Message: incremental delay may be more natural than worst case delay.

Open problem: can we regularize without changing output order?
