Sudoku: Puzzle Generation

2020, Aug 13
Link to project here.

Aside from a solving the puzzles, I also wanted the to supply puzzles. It turns out that generating graded puzzles (as in grading them Easy, Medium or Hard) is actually not a straight forward process as explained by Gareth Rees and Andrew Stuart in their very informative articles. Thus, I just decided to make a way to randomly generate a puzzle without worrying about the difficulty grading as ultimately, what's hard for a computer is different for a human. The upside to this is I will be able to generate puzzles on the fly rather than having to pre-generate and store them in a database.

Generating Puzzles


When creating a puzzle for Sudoku, you actually need to start with a solution and then work your way back. So really, I need a random solution generator.

I can't place random values into random positions on the puzzle as that would be incredibly inefficient, and I can't start from a blank puzzle because DLX is a deterministic solver and it will always come up with the same solutions in the same order. Instead, I need to start with what I call a seed state, which is similar to what the two experts did.

Seed Generation

To generate a seed state, I just make an empty state, pick from a list of valid value without replacement and then place them into a random location in the puzzle. This ensures that there will be no clashing values. I had originally written this in Java using a different format to represent the Sudoku state, but thankfully I could reuse most of the code as is. I also found a functional way of converting a 1D array to a 2D array.

const mkSeed = size => {
  const gridStr = fnGrid.VALID_SYMBOLS
    .slice(0, size)
    .padEnd(size*size, fnGrid.EMPTY_SYMBOL2)

  return fnArr.shuffle(Immutable.List(gridStr.split("")))
    // Reduce a 1D array to 2D array
    .reduce((rows, value, index) => (index % size == 0 ? rows.push([value]) 
      : rows[rows.length-1].push(value)) && rows, [])
    .map(row => row.join(""))
    .reduce((str, row) => str + "\n" + row)
}

Solving the Seed

After a seed state is generated, it has to be solved. I found that this method of seed generation does not actually guarantee a solvable state. For example, the state for size 4 puzzle "0100020030000000" is technically valid while still not being solvable. So the solution generated needs to just keep trying until it finds a solvable state. Fortunately, chains of these tend to be relatively rare and does not significantly impact the puzzle generation time.

const mkSolution = size => {
  while (true) {
    const grid = fnGrid.importString2(generator.mkSeed(size))
    const solved = dlx.solve(true)(grid)

    if (solved.get("isComplete")) return solved
  }
}

Reducing the Solution to a Puzzle

Once a solution is found, we just need to remove cells until we are satisfied with it. I've allowed the setting of number of empty cells as a primitive method of controlling difficulty. Ideally a Sudoku puzzle has only one solution, but I've also allowed for setting the maximum number of solutions because it could be useful for testing other solvers.

const mkPuzzle: (size, maxSolutions, maxEmptyCells, stepLimit=1000) => {
  const indices = fnArr.range(size)
  const positions = fnArr.shuffle(indices.flatMap(i => indices.map(j => [i, j]))).toJS()

  let best = generator.mkSolution(size)
  let nEmptyCells = 0

  for (let i=0; i<positions.length; i++) {
    const sol = best.withMutations(mutable => {
      mutable.setIn(["matrix", positions[i][0], positions[i][1]], fnGrid.EMPTY_SYMBOL)
      mutable.set("isComplete", false)
    })

    const data = dlx.search(sol, maxSolutions+1, stepLimit)
    if (data.get("state").solutions.length>0 && data.get("state").solutions.length <= maxSolutions) {
      best = sol
      nEmptyCells += 1
      if (nEmptyCells>=maxEmptyCells) return best
    }
  }

  return best
}

Once a shuffled list of positions on the board is generated, I just iterate through it, removing the number at each position and checking if the resulting puzzle still meets the given criteria.

The implementation above is very inefficient and slower than I was expecting. Most of the computer's time is actually spent recreating the Dancing Links Matrix (DLM) on each iteration even though they always start out the same. I initially tried to modify dancing-links.js to create a resettable DLM, however that seemed to cause some strange and undesirable behavior. So the task of making it more efficient will have to be handled by future me.