#SAT
A tutorial
Florent Capelli
CRIL, Université d’Artois
JITA 2025
October 16, 2025
The puzzling SAT puzzle
- You have a bunch of switches:
- Each switch turn on/off different lights.
- You want to turn at least one light on in each line:
This problem is known as the SAT problem.
SAT as Propositional Logic
- Each switch is a Boolean variable
- true is switch on
- false is switch off
- Each cell is a literal: either
or
- Each row is a clause, ie a disjunction of literal: “at
least one”
- The game = a conjunction of clauses, aka a CNF
Formula
|
|
|
|
|
|
+
|
|
+
|
-
|
|
|
+
|
-
|
-
|
+
|
|
|
-
|
+
|
|
+
|
|
|
+
|
|
|
+
|
|
SAT is Hard
- Obvious method for SAT: try every combination of switches.
- Takes
tries for
switches. Does not scale.
- Surely there is a better algorithm?
- in practice yes (stay tunned)
- in theory, not sure… every known algorithm needs
steps.
Strong Exponential Time Hypothesis: we believe SAT cannot be solved
in time
for
in the worst case.
SAT as a “Programming Language”
A reason for the hardness of SAT: its
expressivity.
- We can encode many things with SAT.
- Sudoku example:
-
is true if entry
is equal to
- :
entry
cannot be both set to
and
- :
entry
has at least one value.
-
for
:
entries
and
cannot be both set at value
(because they are on the same line)
- A solution to a Sudoku = a solution to the CNF formula.
We can “efficiently” encode many types of problem in a CNF
formula.
Modeling reasoning
- Sudoku: specify constraints of the system and let
the solver figures out an answer.
- Many systems are described through
constraints.
- Configuration problems:
- Define a set of “possible” products with options.
- Physical, industrial, marketing constraints.
- E.g.: bikes, computers, software (compilation options).
- “Color gold only possible if fancy frame and fancy bell are
choosen”:
- We can use a more expressive language (“Constraint Satisfaction
Problem”), see e.g. xcsp, which is often
“compiled” into SAT before solving.
Modeling computation
We can actually encode any non-deterministic Turing machine:
|
0
|
0
|
1
|
0
|
0
|
1
|
0 1
|
0
|
0
|
0
|
0
|
0
|
0
|
0
|
1
|
0
|
0
|
|
|
|
|
|
|
↥
|
↥
|
↥
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1
|
2
|
3
|
4
|
5
|
6
|
7
|
8
|
9
|
10
|
11
|
12
|
13
|
14
|
15
|
16
|
17
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
- :
at step
,
the head is a position
in state
.
- :
the position 7 at time
contains a
.
- Transitions
-
NP-completeness
This shows that SAT is NP-complete! (Cook, Levin,
1973)
- For any NP problem (“resonable”), we can write a formula that is SAT
if and only if the problem has a solution.
- Most people believes that this cannot be done in polynomial
time.
- Why should we try to solve such a hard problem?
- Efficient algorithms for SAT in practice = efficient algorithm for
many other problems
- Very simple structure: allows for low level optimization.
A Branching algorithm
Adding Unit Propagation
DPLL
- If every clause are satisfied, return
.
- If one clause is refuted, return
.
- UNIT PROPAGATION: If there is a clause with one
variable not set, pick it and satisfy it, e.g
,
pick
- PURE LITERAL ELIMINATION: If there is a variable
appearing only positively (resp negatively), pick
and
(resp.
).
- Otherwise, pick a variable
and
using a good heuristic
- Try to recursively find a solution to
.
- If no solution, backtrack and try
.
Problem: Backtracks are dealt with independently, some insights are
lost.
CDCL Solvers
Conflict Driven Clause Learning: try to learn
why.
- Set
:
Unit Propagation of
: conflict found.
- We know that, independently of the value of
,
cannot be part of a solution.
- Add clause
to catch it sooner.
- If we later set
,
we have UP on
!
- Speed up later branches.
- Reduce overhead of recursion.
CDCL-solvers (+good heuristics, +good data structures, +“agressive
restarts”):
- really efficient on many instances
- bruteforce at its finest
- naturally unravel hidden structures in the encoding
Counting the number of solutions
- With 1: {1,2}, {1,2,3}, {1,2,3,4}, {1,2,4}, {1,4}, {1,3,4}
- Without 1 and with 2: {2,3,4}
- Without 1,2 and with 3: {3,4}
This problem is known as #SAT: given a CNF
,
return
,
its number of solutions.
Why counting?
Modeling constraints: counting enables better
“reasoning”.
- Configuration problem:
- Find some options compatible with a lot of product
- Measure “probability” of an option (number of product having
it)
- Counting (with weights) = probabilistic reasoning
- Encode other reasoning tasks such as Bayesian inference
How hard is it?
- Counting is way harder than finding a solution
- Intution: we have to explore the entire search
space
- Complexity theory: #P-complete problem.
- Toda’s Theorem: one oracle call to #P allows to solve the
whole polynomial hierarchy.
Counting by enumeration
Counting by enumeration
- Run DPLL with a counter.
- When a model is found: add
to the counter where
are unassigned variables.
- Keep backtracking until every branch are explored.
- Use unit propagation:
if there is a clause
.
- Pure literal cannot be eliminated:
has solutions with
!
Improvements possible: sometimes, we are redoing expensive
computation.
Caching in Exhaustive DPLL
- Cache precomputed values
- Example:
-
has 3 solutions
-
,
also has
solutions!
- 6 solutions in total.
Exploiting SAT solver efficiency
Exhaustive DPLL for
- Spend time maintaining caching in dead branches
- Call to SAT solver at each level:
- cut conflict branches
- CDCL : learn some conflict that can be used
- E.g., SAT call learns
in the first call.
Connected components
Observation:
with
then
because solutions of
and
can be recombined independently to form a solution of
.
- Add connected components detection in exhaustive DPLL.
- Reduces the number of operations.
Example with connected components
|
|
|
|
|
|
|
5 solutions
|
|
5 solution
|
|
4 solutions
|
Exhaustive DPLL: summary
- Run SAT solver to prune unsatisfiable branches. Exploit learnt
clauses.
-
when
and
have disjoint variables.
-
otherwise.
- When
is computed, cache its value in case
is encountered again.
- Many tools based on this idea: d4, SharpSAT, SharpSAT-TD, etc.
Choosing variables
- Complexity of exhaustive DPLL depends on the variable picked for
branching
- For SAT solver:
- try to quickly converge to conflicts to learn
- try to find one solution
- Many heuristics for this: VSIDS, Bohm, Maximum occurrence etc.
- Very different for #SAT solvers: aim at factorizing the
search space.
- Find small separator: treewidth guided
heuristics.