# Geometric Amortization of Enumeration Algorithms

November 24, 2022

WEPA

# Enumeration Complexity

## Notation

Let A(x, y) be a binary predicate.

Problem EnumA: output A(x, _) := {y ∣ A(x, y)} on input x.

y ∈ A(x, _) will be called a solution of (the problem) A on input x.

## Main Assumption

In this talk, A is an NP-predicate, that is:

• y ∈ A(x, _) can be tested in polynomial time.
• y is of size polynomial in the size of x.

Many natural problems have this property:

• Enumerate the answer set of a database query Q on database 𝔻
• Enumerate the transversals of a hypergraph

## Complexity

How to measure the complexity of an algorithm solving an enumeration problem?

## Total Time

Total time needed to output every solution.

There can be exponentially many.

EnumA is in OutputP if it can be solved in time polynomial in:

• |x|
• and |A(x, _)|

## Delay

Total time is not always satisfactory:

• Process solutions in a stream.
• Only peek at the solution set.

Delay: the longest time one has to wait between the output of two solutions.

## Example

Enumerate (0 + 1)^n:

• Method 1: Generate every word of length k inductively, for k up to n, and output those of length n.
• Method 2: Start from 0^n, output it and move to the next word (using a Gray code, for example).

Both have polynomial total time, but the delay of Method 1 is exponential while Method 2 has constant delay.
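Method 2 can be sketched as a short generator (a minimal illustration; the function name and encoding are chosen for this example), using the standard reflected Gray code so that consecutive words differ in exactly one bit:

```python
def gray_words(n):
    """Method 2 as a generator (illustrative sketch).

    Enumerates all words of {0,1}^n in reflected Gray code order:
    consecutive words differ in exactly one bit, so the in-place
    update between two outputs takes constant time.
    """
    word = [0] * n              # start from 0^n
    yield tuple(word)
    for k in range(1, 2 ** n):
        # The bit flipped between the (k-1)-th and k-th Gray code is
        # the position of the lowest set bit of k.
        bit = (k & -k).bit_length() - 1
        word[bit] ^= 1
        yield tuple(word)
```

Only one bit is updated between two outputs, so the delay is constant (not counting the time to write each word out).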

## Holy DelayP Grail

One focus in enumeration complexity has been to design algorithms with polynomial delay.

Why do we care?

1. Guarantees a reasonable waiting time before the next solution.
2. Gives a t × delay upper bound on the time needed to output t solutions.

(Unpopular?) Opinion: Reason 1 is rarely useful…

## Linear Incremental Time

IncP1

Algorithms such that for every t, after t ⋅ d(n) steps, the algorithm has output at least t solutions.

We say that d(n) is the incremental delay.

# Promoting IncP1

## DelayP vs IncP1

Clearly DelayP ⊆ IncP1: after delay × t steps, at least t solutions have been output.

For the other direction, consider an enumerator with worst case delay 2^n but incremental delay 2:

• if t ≤ 2^n, t solutions are output in time t ≤ 2t,
• if t = 2^n + 1, the last solution is output at time 2^{n+1} ≤ 2t.

## (Naive) Regularization

Given an IncP1-enumerator A with incremental delay d, one can regularize the delay using a queue to buffer solutions, outputting one every d steps:

```
step = 0
q = Queue()
while A is not finished:
    move(A)               # simulate one step of A
    step += 1
    if A outputs x:
        q.push(x)
    if step == d:
        output(q.pull())  # incremental delay guarantees q is not empty
        step = 0
while q is not empty:
    output(q.pull())      # output what remains in q
```
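This pseudocode can be made concrete. A minimal runnable sketch, modelling the enumerator A as an iterable of steps where each element is a solution or None (an encoding assumed only for this illustration):

```python
from collections import deque

def regularize(steps, d):
    """Naive regularization of an IncP1-enumerator (runnable sketch).

    steps: iterable over the steps of an enumerator A; each element is
           a solution, or None for a step that outputs nothing.
    d: incremental delay of A.
    Yields the same solutions with worst case delay O(d), at the price
    of a queue that may grow very large.
    """
    q = deque()
    step = 0
    for s in steps:
        step += 1
        if s is not None:
            q.append(s)          # delay the output of the solution
        if step == d:
            if q:                # non-empty whenever d is a valid incremental delay
                yield q.popleft()
            step = 0
    while q:                     # output what remains in the queue
        yield q.popleft()
```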

## IncP1 = DelayP

When the simulation of A reaches step 2^n, it may have:

• pushed 2^n solutions into the queue,
• pulled only 2^n/2 of them,
• so 2^n/2 remain: the queue uses exponential space.

## Natural notion in data structure

Suppose an enumeration algorithm A works as follows:

• A maintains a dynamic array T
• A performs insert/deletion on T between two outputs

Delay of A: worst case complexity of insert/delete

O(|T|) if a resize occurs…

but incremental delay of A: amortized complexity of insert/delete

To analyse the time needed to output k solutions, one can consider each operation on T to cost O(1).
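As an illustration, a minimal doubling-array sketch (the class name and the copy counter are invented for this example), showing that k inserts trigger O(k) copy work in total even though a single insert can cost O(|T|):

```python
class DynArray:
    """Minimal doubling array (names invented for this illustration)."""

    def __init__(self):
        self.cap, self.size = 1, 0
        self.data = [None]
        self.copies = 0          # total elements copied by resizes

    def insert(self, x):
        if self.size == self.cap:
            # Resize: worst case O(|T|) for this single insert.
            self.cap *= 2
            new = [None] * self.cap
            new[:self.size] = self.data
            self.data = new
            self.copies += self.size
        self.data[self.size] = x
        self.size += 1

# Total copy work over k inserts is 1 + 2 + 4 + ... < 2k:
# amortized O(1) per insert despite the O(|T|) worst case.
T = DynArray()
for i in range(1000):
    T.insert(i)
```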

## Wrap up on IncP1

• Incremental delay better than (worst case) delay.
• Better analysis tools using amortized complexity of data structure.

If you do not agree, IncP1 = DelayP anyway…

… but regularizing the delay is expensive in space.

# Geometric Amortization

## Main contribution

IncP1(POLYSPACE) = DelayP(POLYSPACE)

Regularization using only polynomial space.

## Detailed Statement

For every IncP1-enumerator A(x) with N solutions:

• incremental delay d,
• space s,
• total time T

there exists a DelayP-enumerator with

• delay O(d ⋅ log N)
• space O(s ⋅ log N)
• total time O(T)

## Geometric Amortization

• Maintain l + 1 = ⌈log N⌉ simulations A_0, …, A_l of A.
• A_i outputs the solutions found at steps in [2^i⋅d; 2^{i+1}⋅d[.
• A_{i+1} “moves faster” than A_i.

## Faster how?

• A_i moves by at most 2d steps.
• If A_i finds a solution in its zone, it outputs it and control goes back to A_l.
• Otherwise, control proceeds with A_{i−1}.
• Stop when A_0 moves out of its zone.
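This process loop can be turned into a small runnable sketch. Here the black-box enumerator A is replaced by its step trace (trace[t] is the solution output at step t, or None), an encoding assumed only for this illustration, and A_0's zone is taken to start at step 0:

```python
import math

def geometric_amortization(trace, d):
    """Regularize an enumerator via geometric amortization (sketch).

    trace: run of an enumerator A; trace[t] is the solution output at
           step t, or None (stands in for black-box simulation of A).
    d: incremental delay of A (after t*d steps, >= t solutions output).
    Yields every solution; between two outputs, each of the l + 1
    processes moves by at most 2d steps.
    """
    N = sum(1 for s in trace if s is not None)
    if N == 0:
        return
    l = max(1, math.ceil(math.log2(N)))
    # Process A_i explores the zone [2^i * d, 2^(i+1) * d) (A_0 from 0).
    zone = [(0 if i == 0 else (2 ** i) * d, (2 ** (i + 1)) * d)
            for i in range(l + 1)]
    cursor = [0] * (l + 1)      # next step of each simulated copy of A
    i = l
    while True:
        lo, hi = zone[i]
        found = None
        budget = 2 * d          # A_i moves by at most 2d steps
        while budget > 0 and cursor[i] < min(hi, len(trace)):
            pos, s = cursor[i], trace[cursor[i]]
            cursor[i] += 1
            budget -= 1
            if s is not None and pos >= lo:
                found = s       # a solution inside A_i's zone
                break
        if found is not None:
            yield found
            i = l               # proceed back with the fastest process
        elif i == 0:
            return              # A_0 moved out of its zone: done
        else:
            i -= 1              # otherwise proceed with A_{i-1}
```

Note that the output order may differ from A's original order.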

## Key Lemmas

1. If A_0, …, A_i have output k solutions, then A_{i+1} has moved by at least 2dk steps.
2. There are at least 2^i solutions in [0; 2^i⋅d].

When A_0 is finished, A_i has moved by at least 2^{i+1}⋅d steps: it has explored its whole zone.

## Delay

Between two outputs:

• each process A_i moves by at most 2d steps,
• there are at most l = log N processes,
• step counters are incremented and compared (to check whether A_i is in its zone [2^i⋅d; 2^{i+1}⋅d[).

Delay: the time needed to simulate l × 2d steps plus the time to maintain the counters.

## Simulating RAMs and Counters

• Simulating the RAM: with bounds on the incremental delay, the space and the number of solutions of A, one step is simulated in time O(1).

• Counters: of size at most log(dN).

This gives a delay of O(d ⋅ log^2(dN)) (polynomial).

• Gray code based counters, with a pointer to their most significant bit, improve this to a delay of O(d ⋅ log N).

## Improvements

The previous algorithm assumes knowledge of (at least an upper bound on):

1. N, the number of solutions of A(x)
2. s, the space used by A(x)
3. d, the incremental delay of A(x)

Dropping the bound on N:

• When A_k enters its zone, copy it into A_{k+1}.
• This approach preserves the total time.

Dropping the bound on s:

• Encode the memory of each process with a dedicated dynamic data structure.
• Tradeoff: O(1) simulation with space O(s ⋅ n^ε) vs O(log log n) simulation with space O(s).

As for d, we do not know it exactly, but…

## Unknown incremental delay I

Goal:

• Construct REG, taking as input black-box oracle access to an enumerator A whose incremental delay d_A is unknown.
• REG(A) outputs the same solutions as A with delay α ⋅ d_A.
• Can α be independent of A (a constant)?

This is provably impossible:

• Use an adversarial oracle A that outputs #A − 1 solutions first, then holds back the last solution until REG(A) has output #A − 1 solutions.

## Unknown incremental delay II

For every ε > 0, one can construct REG such that REG(A) has delay d_A^{1+ε}.

Idea: naive regularization, but pull a solution from the queue whenever no solution has been output for C_ε ⋅ d̂ steps, where:

• d̂ = steps ⋅ (Nsol + 1)^{−1} locally approximates the incremental delay (Nsol solutions seen after steps steps of A),
• C_ε = (1 − 2ε)^{−1}.
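A runnable sketch of this adaptive regularization, again modelling the enumerator as an iterable of steps (each element a solution or None; an encoding assumed for this example) and taking the constant C_ε from the slide; it only illustrates the mechanism, not the d_A^{1+ε} delay bound:

```python
from collections import deque

def adaptive_regularize(steps, eps):
    """Regularization with unknown incremental delay (hedged sketch).

    steps: iterable over the steps of an enumerator A; each element is
           a solution or None (encoding assumed for this illustration).
    eps: parameter in (0, 1/2); C_eps = (1 - 2*eps)^-1 as on the slide.
    Pulls from the queue whenever no solution has been output for
    C_eps * d_hat steps, with d_hat = steps_so_far / (Nsol + 1).
    """
    c_eps = 1.0 / (1.0 - 2.0 * eps)
    q = deque()
    nsteps, nsol, since_last = 0, 0, 0
    for s in steps:
        nsteps += 1
        since_last += 1
        if s is not None:
            q.append(s)
            nsol += 1
        d_hat = nsteps / (nsol + 1)   # local incremental delay estimate
        if q and since_last >= c_eps * d_hat:
            yield q.popleft()
            since_last = 0
    while q:                          # flush what remains in the queue
        yield q.popleft()
```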

Open Problem: make this tradeoff work with geometric amortization.

# Generalizations and Applications

## Incremental time

The approach generalizes to collapse the following classes:

• Usual k-incremental time: the delay between solutions i and i + 1 is O(poly(n) ⋅ i^k).
• Relaxed (k + 1)-incremental time: for every i, at least i solutions have been output after time poly(n) ⋅ i^{k+1}.

Change the budget given to each process to 2 ⋅ S^k ⋅ (k + 1) ⋅ d, where S is the number of solutions output so far.

## Trading average delay with worst case delay

E(x, _) a self-reducible problem:

• E(x, _) = ⋃_i E(x_i, _) with |x_i| < |x|
• Branch and bound enumeration algorithm A for E
• Polynomial delay
• Better average delay when branches have many solutions

The average delay bound is valid in every branch:

• hence it is an incremental delay,
• and can be traded for a worst case delay using geometric amortization.

## Example: enumerating DNF models

Problem:

• Input: a DNF D = ⋁_{i ≤ m} S_i over variables x_1, …, x_n
• Output: every satisfying assignment of D.

Branch and bound algorithm: extension problem efficiently solvable.

• If D[x_1 = 0] has a solution, recursively enumerate D[x_1 = 0].
• If D[x_1 = 1] has a solution, recursively enumerate D[x_1 = 1].
• Delay linear in |D| (if implemented correctly).
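This branch and bound scheme can be sketched as follows; a minimal illustration assuming a hypothetical encoding of the DNF as a list of dicts mapping variable indices to Boolean values, and branching on variables in index order:

```python
def enum_dnf_models(terms, n):
    """Branch and bound enumeration of DNF models (a sketch).

    terms: the DNF as a list of terms, each a dict {var_index: bool}
           (hypothetical encoding chosen for this illustration).
    n: number of variables x_0, ..., x_(n-1).
    Yields every total assignment satisfying at least one term.
    """
    def extendable(partial):
        # Extension problem for DNF, solvable by scanning the terms:
        # some term must be consistent with the partial assignment.
        return any(all(v >= len(partial) or partial[v] == b
                       for v, b in t.items())
                   for t in terms)

    def rec(partial):
        if len(partial) == n:
            yield tuple(partial)
            return
        for b in (False, True):
            partial.append(b)        # try next variable = b
            if extendable(partial):  # recurse only into satisfiable branches
                yield from rec(partial)
            partial.pop()

    if extendable([]):               # D has at least one model
        yield from rec([])
```

Every recursive call is made only when the branch contains a model, so each leaf of the search tree outputs a solution.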

## DNF enumeration in strong polynomial delay

Open question: can we solve it with a delay that is polynomial only in the size of the solutions, i.e., in n?

Two conjectures:

• Strong DNF Enumeration conjecture: impossible with a delay whose dependency on m is sublinear (o(m)). Refuted using geometric amortization.
• DNF Enumeration conjecture: impossible with delay poly(n).

## DNF enumeration: O(n^2 ⋅ m^μ) delay

The branch and bound algorithm has:

1. average delay O(n^2 ⋅ m^μ) with μ = 1 − log_3(2) < 1,
2. hence incremental delay O(n^2 ⋅ m^μ),
3. and geometric amortization gives worst case delay O(poly(n) ⋅ m^μ).

## Conclusion

• New method to make delay regular without exponential space
• Message: incremental delay may be more natural than worst case delay.

Open problem: can we regularize without changing output order?