Ease and Toil: Analyzing Sudoku
February 18, 2008
Look at any current magazine, newspaper, computer game package or handheld gaming
device and you likely fifind sudoku, the latest puzzle game sweeping the nation. Sudoku
is a number-based logic puzzle in which the numbers 1 through 9 are arranged in a 9 × 9
matrix, subject to the constraint that there are no repeated numbers in any row, column,
or designated 3 × 3 square.
In addition to being entertaining, sudoku promises valuable insight into computer sci
ence and mathematical modeling. In particular, since sudoku solving is an NP-Complete
problem, algorithms to generate and solve sudoku puzzles may offer new approaches to a
whole class of computational problems . Moreover, we can further explore mathematical
modeling techniques through generating puzzles since sudoku construction is essentially
an optimization problem.
The purpose of this paper is to propose an algorithm that may be used to construct
unique sudoku puzzles with four different levels of diffificulty. We attempted to minimize
the complexity of the algorithm while still maintaining separate diffificulty levels and guar
anteeing unique solutions.
In order to accomplish our objectives, we developed metrics with which to analyze the
diffificulty of a given puzzle. By applying our metrics to published control puzzles with spe
cifific diffificulty levels we were able to develop classifification functions for specifific diffificulty
ratings. We then used the functions we developed to ensure that our algorithm gener
ated puzzles with diffificulty levels analogous to those currently published. We also sought
out to measure and reduce the computational complexity of the generation and metric
measurement algorithms.
Finally, we worked to analyze and reduce the complexity involved in generating puzzles
while maintaining the ability to choose the diffificulty of the puzzles generated. To do so,
we implemented a profifiler and performed statistical hypothesis testing to streamline the
algorithm .数学中国
www.
madio.net
本内容由645617861l同学与工作人员共同整理,恭喜同学成功完成数学中国第一期威客项目
数学中国:www.madio.net,最专业的数学建模平台
备战美赛,美赛资料下载:http://www.madio.net/forum-108-1.html
美赛讨论加入群:2017美赛官方群287336869
Page 2 of 62 MCM 2008 Team #3780
Contents
1 Introduction 3
1.1 Statement of Problem . . . . . . . . . 3
1.2 Relevance of Sudoku . . . . . . . . . 3
1.3 Goals . . . . . . . . . . . . . . . . . . 3
1.4 Rules of Sudoku . . . . . . . . . . . . 3
1.5 Terminology and Notation . . . . . . 3
1.6 Indexing . . . . . . . . . . . . . . . . 4
1.7 Formal Rules of Sudoku . . . . . . . 5
1.8 Example Puzzles . . . . . . . . . . . 5
2 Background 5
2.1 Common Solving Techniques . . . . 5
2.1.1 Naked Pair . . . . . . . . . . 5
2.1.2 Naked Triplet . . . . . . . . . 5
2.1.3 Hidden Pair . . . . . . . . . . 6
2.1.4 Hidden Triplet . . . . . . . . . 6
2.1.5 Multi-Line . . . . . . . . . . . 6
2.2 Previous Works . . . . . . . . . . . . 7
2.2.1 SudokuExplainer . . . . . . . 7
2.2.2 QQWing . . . . . . . . . . . . 7
2.2.3 GNOME Sudoku . . . . . . . 7
3 Metric Design 10
3.1 Overview . . . . . . . . . . . . . . . . 10
3.2 Assumptions . . . . . . . . . . . . . . 10
3.3 Mathematical Basis for WNEF . . . 10
3.3.1 Complexity . . . . . . . . . . . 10
4 Metric Calibration and Testing 11
4.1 Control Puzzle Sources . . . . . . . . 11
4.2 Testing Method . . . . . . . . . . . . 12
4.2.1 Defifining Diffificulty Ranges . . 12
4.2.2 Information Collection . . . . 12
4.2.3 Statistical Analysis of Con
trol Puzzles . . . . . . . . . . . 12
4.3 Choice of Weight Function. . . . . . . 12
5 Generator Algorithm 12
5.1 Overview . . . . . . . . . . . . . . . . 12
5.2 Detailed Description . . . . . . . . . 14
5.2.1 Completed Puzzle Generation 14
5.2.2 Cell Removal . . . . . . . . . . 14
5.2.3 Uniqueness Testing . . . . . . 15
5.3 Pseudocode . . . . . . . . . . . . . . . 15
5.3.1 Completed Board Generation 15
5.3.2 Random Masking . . . . . . . 16
5.3.3 Tuned Masking . . . . . . . . 17
5.3.4 Uniqueness Testing . . . . . . 17
5.4 Complexity Analysis . . . . . . . . . 18
5.4.1 Parameterization . . . . . . . 18
5.4.2 Complexity of Completed
Puzzle Generation . . . . . . . 18
5.4.3 Complexity of Uniqueness
Testing and Random Filling . 18
5.4.4 Profifiling Method . . . . . . . 18
5.4.5 WNEF vs Running Time . . . 19
5.5 Testing . . . . . . . . . . . . . . . . . 19
5.5.1 WNEF as a Function of De
sign Choices . . . . . . . . . . 19
5.5.2 Hypothesis Testing . . . . . . 19
6 Strengths and Weaknesses 19
7 Conclusions 21
References 21
1 Source Code 23
2 Screenshots of Puzzle Generator 62
List of Figures
1 Demonstration of indexing schemes. 6
2 Puzzle generated by WebSudoku
(ranked as “Easy”). . . . . . . . . . . 6
3 Top 1465 Number 77. . . . . . . . . . 7
4 An example of a hand-made Nikoli
puzzle. . . . . . . . . . . . . . . . . . 7
5 Example of the Naked Pair rule. . . 8
6 Example of the Naked Triplet rule. . 8
7 Example of the Hidden Pair rule. . . 8
8 Example of the Hidden Triplet rule. 9
9 Example of the Multi-Line rule. . . . 9
10 Examples of choice histograms. . . . 11
11 WNEF for control puzzles by diffifi-
culty. . . . . . . . . . . . . . . . . . . 13
12 WNEF correlations for various
weighting functions. . . . . . . . . . . 13
13 Running time as a function of the
obtained WNEF. . . . . . . . . . . . . 20
14 WNEF as a function of allowed fail
ures. . . . . . . . . . . . . . . . . . . . 20
15 Screenshots of puzzle generator. . . . 62数学中国
www.
madio.net
本内容由645617861l同学与工作人员共同整理,恭喜同学成功完成数学中国第一期威客项目
数学中国:www.madio.net,最专业的数学建模平台
备战美赛,美赛资料下载:http://www.madio.net/forum-108-1.html
美赛讨论加入群:2017美赛官方群287336869
Page 3 of 62 MCM 2008 Team #3780
1 Introduction
1.1 Statement of Problem
We set out to design an algorithm that would con
struct unique sudoku puzzles of various diffificul
ties as well as to develop metrics by which to mea
sure the diffificulty of a given puzzle. In particular,
our algorithm must admit at least four levels of
diffificulty while minimizing its level of complex
ity.
1.2 Relevance of Sudoku
We feel that this problem is relevant and of inter
est, since the game of sudoku is inherently math
ematical, and offers rich opportunities to explore
mathematical techniques. Indeed, the problem is
NP-Complete [3], and yet manages to be some
what accessible to casual analysis. Moreover,
by developing techniques for use with a problem
over which we have such complete control, we
may expand into other and more practical prob
lems. In fact, sudoku is essentially an exercise
in compression, and so techniques for generat
ing diffificult puzzle instances lead directly to real
izations about information content and entropy.
We, however, shall restrict our focus directly to
the problem at hand, and be content to leave
these reasons, along with sudoku’s entertainment
value, as our motivation for exploring the game.
1.3 Goals
Our goal is to create an algorithm that will pro
duce sudoku puzzles. In doing so, and to meet the
conditions of the proposed problem (section 1.1),
we aim to create an algorithm with the following
properties:
• Will only create valid puzzle instances (no
contradictions, and admitting a unique so
lution).
• Can generate puzzles at any of four differ
ent diffificulty levels (easy, medium, hard and
evil
1
).
• Produces puzzles in a reasonable amount of
time, regardless of the chosen diffificulty.
Such a set of goals could easily lead to a project
of an unmanageable scope. Thus, we explicitly do
not aim for any of the following properties:
• Attempt to “force” a particular solving
method upon players.
• To be the best available algorithm for the
task of making exceedingly diffificult puzzles.
• Impose symmetry requirements .
1.4 Rules of Sudoku
The game of sudoku is played upon a 3 × 3 grid
of blocks, each of which is a 3 × 3 grid of cells.
Each cell can have a value of 1 through 9, sub
ject to a simple constraint, or may be empty.
The object of the game is to, given a partially-
fifilled out grid called a puzzle, use logical infer
ence to place values in all of the empty cells such
that the constraints are upheld. It is fully pos
sible to create a puzzle which has no solution
(it contradicts itself, forcing the player to violate
a constraint), or which has multiple solutions.
We shall impose the additional requirement upon
puzzles that they admit exactly one solution each.
When properly fifilled out, no row, column or
block may have two cells with the same value.
This simple constraint is what allows all of the
inference to work. Some examples of puzzles and
their solutions may be found in Section 1.8. For
more details and a complete tutorial, please see
[1].
1.5 Terminology and Notation
It is diffificult to discuss our solution to the pro
posed problem without understanding some com
mon terminology. Moreover, since we will apply
more mathematical formalism here than in most
documents dealing with sudoku, it will be helpful
to introduce notational conventions.
Assignment A tuple (x, X) of a value and a cell.
If a puzzle contains an assignment (x, X),
we say that X has the value x, that X maps
to x, or that X 7→ x.
1
This term was chosen for traditional reasons, as many sources prefer to use references to immorality to measure diffifi-
culty.数学中国
www.
madio.net
本内容由645617861l同学与工作人员共同整理,恭喜同学成功完成数学中国第一期威客项目
数学中国:www.madio.net,最专业的数学建模平台
备战美赛,美赛资料下载:http://www.madio.net/forum-108-1.html
Page 4 of 62 MCM 2008 Team #3780
Candidates A set of those values which may
be assigned to a square. As more informa
tion is taken into account, the set is reduced
until only one candidate remains, at which
point it becomes the value of the cell. We
denote the set of candidates for some cell X
by X?.
Cell A single square within a sudoku puzzle,
which may have one of the integer values
from 1 to 9. We denote cells using upper
case italic serif letters: X, Y , Z.
Block One of the nine 3 × 3 squares within the
puzzle. The boundaries of these blocks are
denoted by thicker lines on the puzzle’s grid.
It is important to note that in sudoku, no
two blocks overlap (share common cells).
There are variants of sudoku, such as hy
persudoku in which this occurs, but we will
focus our attention on the traditional rules.
Grouping A set of cells in the same row, col
umn or block. We represent groupings with
uppercase boldface serif letters: X, Y, Z.
We refer unambiguously to the row group
ings Ri, the column groupings Cj and the
block groupings Bc, following the indexing
scheme in section 1.6. The set of all group
ings will be denoted G.
Metric We call a function m : P → R (assigning
a real number to each valid puzzle) a metric
if it provides information about the relative
diffificulty of the puzzle.
Puzzle A 9 × 9 matrix of cells, with at least one
empty and at least one fifilled cell. For our
purposes, we impose the additional require
ment that all puzzles have exactly one so
lution. We denote puzzles by boldface cap
ital serif letters: P, Q, R. Since this no
tation conflflicts with that for groupings, we
will always denote that a variable is a puz
zle. Moreover, we refer to cells belonging to
a puzzle: X ∈ P. Finally, in the rare in
stance that we wish to denote the set of all
valid puzzles, we shall do so with a double
struck P: P.
Representative The upper-left cell in each
block is that block’s representative. For ex
ample, the cell in the 5
th
row and 5
th
col
umn has as its representative the cell at the
fourth row and column.
Restrictions In some cases, it is more straight
forward to discuss which values a cell can
not be assigned to than to discuss the set of
candidates. Thus, the restrictions set X! for
a cell X is defifined as V\X?.
Rule An algorithm which accepts a puzzle P
and produces either a puzzle P
0
represent
ing strictly more information (more restric
tions have been added via logical inference
or cells have been fifilled in) or some value
that indicates that the rule failed to ad
vance the puzzle towards a solution.
Solution A set of assignments to all cells in a
puzzle such that all groupings have exactly
one cell assigned to each value.
Value A symbol that may be assigned to a
cell. For our purposes, all sudoku puzzles
use the traditional numeric value set V =
{1, 2, 3, 4, 5, 6, 7, 8, 9}. This can be confusing
at times, since we will be discussing other
numbers, but we choose to do so for the sake
of convention. A value is denoted by a lower
case sans serif letter: x, y, z.
1.6 Indexing
Defifine the following indicies using the terminol
ogy above (section 1.5). As a convention, all indi
cies will start with zero for the fifirst cell or block.
c : block number
k : cell number within a block
i : row number
j : column number
i
0
: representative row number
j
0
: representative column number数学中国
www.
madio.net
本内容由645617861l同学与工作人员共同整理,恭喜同学成功完成数学中国第一期威客项目
数学中国:www.madio.net,最专业的数学建模平台
备战美赛,美赛资料下载:http://www.madio.net/forum-108-1.html
Page 5 of 62 MCM 2008 Team #3780
These indicies are related by the following func
tions:
c (i, j) =
j
3
+
i
3
· 3
i(c, k) = 3
j
c
3k
+
k
3 j (c, k) = (c mod 3) · 3 + (k mod 3)
i
0
(c) = 3
j
c
3k j
0
(c) = (c mod 3) · 3
i
0
(i) = 3
i
3
j
0
(j) = 3
j
3
Figure 1 demonstrates how the rows, columns
and blocks are indexed, as well as the idea of a
block representative. In the third sudoku grid,
the representatives for each block are denoted
with an “r”.
1.7 Formal Rules of Sudoku
We may now formally state the rules of sudoku
that restrict allowable assignments using the no
tation developed thus far:
(∀G ∈ G ∀X ∈ G) X 7→ v ⇒ @Y ∈ G : Y 7→ v
Applying this sort of formalism to the rules of su
doku will allow us to make strong claims about
solving techniques later, and so it is useful intro
duce this notation for the rules.
1.8 Example Puzzles
The rules alone do not explain what a sudoku
puzzle looks like, and so we have included a few
examples of well-crafted sudoku puzzles. Figure
6 shows a puzzle ranked as “Easy” by WebSudoku
[4].
By contrast, Figures 7 and 7 show the results
of two different approaches to generating diffificult
puzzles: the fifirst one was computer generated as
part of an experiment in minimal sudoku puz
zles, whereas the second was hand-made by the
authors at Nikoli, the company most famously
associated with sudoku. It is interesting that
two such completely different approaches result
in very similar looking puzzles.
2 Background
2.1 Common Solving Techniques
As with any activity, several sets of techniques
have emerged to help solve sudoku puzzles. We
collect some here so that we may refer to them in
our own development. In all of the techniques be
low, we assume that the puzzle being solved has
a single unique solution. These techniques and
examples are adapted from [10] and [2].
2.1.1 Naked Pair
If, in a single row, column or block grouping A,
there are two cells X and Y each having the same
pair of candidates X? = Y ? = {p, q} , then those
candidates may be eliminated from all other cells
in A. To see that we can do this, assume for the
sake of contradiction that there exists some cell
Z ∈ A such that Z 7→ p, then X 67→ p, which im
plies that X 7→ q. This in turn means that Y 67→ q,
but we have from Z 7→ p that Y 67→ p, leaving
Y ? = ∅. Since the puzzle has a solution, this is a
contradiction, and Z 67→ p.
As an example of this arrangement is shown
in fifigure 5. The cells marked X and Y sat
isfy X? = Y ? = {2, 8}, and so we can remove
both 2 and 8 from all other cells in R8. That is,
∀Z ∈ (R8\ {X, Y }) : 2, 8 ∈/ Z?.
2.1.2 Naked Triplet
This rule is analogous to the Naked Pair rule (sec
tion 2.1.1), but instead it involves three cells in
stead of two. Let A be some grouping (row, col
umn or block), and let X, Y, Z ∈ A such that
the candidates for X, Y and Z are drawn from
{t, u, v}. Then, by exhaustion, there is a one-to
one set of assignments from {X, Y, Z} to {t, u, v}.
Therefore, no other cell in A may map to a value
in {t, u, v}.
An example of this is given in Figure 6. Here,
we have marked the cells {X, Y, Z} accordingly
and consider only block 8. In this puzzle, X? =
{3, 7}, Y ? = {1, 3, 7} and Z? = {1, 3}. Therefore,
we must assign 1, 3 and 7 to these cells, and may
remove them from the candidates for those cells
marked with an asterisk.