I like puzzles, and I like the puzzle of writing code to generate puzzles. Binairos are something I’d seen recently, and seemed like a simple enough starting point.

Binairos are grid puzzles with a series of X’s and O’s, and blanks. Your goal is to fill in the empty spaces. Let’s see how I’d go about writing a program that generates these puzzles.

### The Rules

This is a Binairo, a type of structured constraint logic puzzle.

To complete the grid, you need to follow these rules:

- No more than two X’s or two O’s can be in sequence in a row (horizontal) or column (vertical)
- There are an equal number of X’s and O’s in each row and column (so 4 of each in this grid)

The original version of this game adds one more rule: no rows or columns can be identical. I dropped this rule as I find it’d hard for players to track easily on larger grids, leading to more work than logical deduction. However, to solve puzzles printed in books I made a variant of my program that adds the rule. It’s my goal to write flexible code for a variety of grid puzzles.

My code produced the, and I verified it is solvable with no need to guess — we’ll get back to relevance of this later.

### Generation Approach

Generating these puzzles, the entire class of them, including things like Sudoku, is harder than solving them. Where hard refers to the number of calculations our code will do.

The approach I took has two steps:

- Create a filled in board
- Remove letters to produce a puzzle

#### Filling

Before we can fill a board, we need to represent the board. I did this as a flat vector in memory, which I used as a two-dimensional matrix. I mapped the X’s and O’s to numbers, where `X=1`

and `O=2`

. I represent an empty cell with a `0`

.

One of the original versions of this game, Takuzu, uses 1’s and 0’s. I feel as though this would be confusing to have

`0=1`

and`1=2`

, thus I’ll stick with X’s and O’s here, to be clear about what value-space we’re in.

I start with an empty board: I fill the vector with zeros.

From here I simply need to fill in the board with all combinations of numbers and check if the result is valid, stopping when I find one that is.

There are, however, 2^64 possible 8×8 boards, so enumerating them all won’t work. I instead immediately started with a tree-trimming algorithm. I can describe this in linear terms, the same way I store the values in memory. We can enumerate the boards by creating longer and longer lists of numbers.

For example, with just an X in the top-left we have the string `1`

, since the rest of the board is blank. We can then put a X or O in the next cell, meaning our sequence is `1 1`

or `1 0`

. Then, if the third cell is an X, we have either `1 1 1`

or `1 0 1`

. Each additional cell extends the sequence.

We can visualize this as a tree like below.

What we want is a way to trim entire branches from the search. For example, the sequence `1 1 1`

isn’t valid, since it has three X’s in a row. Thus, no board that starts with that sequence can be valid. There’s no point in enumerating further.

This decides how we write the validator. It needs to work on an incomplete board, thus it can’t tell me if something is valid, only if it is invalid. This is a typical trait of constraint logic puzzles. If an arrangement is invalid, all extensions of that arrangement are also invalid.

Given this invalid check, I wrote a basic brute-forcing algorithm, but cut off entire branches of the tree the moment I reached an invalid state. I could get the below board with only 4911 iterations, rather than 2^64.

But you notice a problem with this board? It’s producing a rather uninteresting symmetry. Also, each time I run the algorithm, it produces the same result.

There are two ways to get randomized results instead:

- at each cell, randomize the order in which the values are checked, instead of always checking 1 then 2
- randomize the order of the cells

Now I already had reason to believe that I’d need to control which cells I fill with my code. So, I went for the second approach. Instead of iterating the cells top-left through to bottom-right, I randomly walked through the board. In code I do this by adding all the cell positions to a vector, shuffling the vector, then iterating over the vector.

The rest of the logic stays the same. Indeed, I could simulate my early code for this article by simply removing the shuffling.

This result has less symmetry — though given it’s only X’s and O’s, we’re bound to still see patterns.

Not only does this give me a different result, it gives me a basis to produce endlessly different boards. All I need to do is change the random seed, thus changing the iteration order.

Curiously, the above board only required 159 iterations. I got lucky with the seed value and could trim large portions off the search-space quickly. I changed the seed and got a new board. This time it took 1.4 million iterations. One more change and up to 11 million. Trying a few more and I hit a tough one, at 250 million iterations!

This is the nature of randomization and tree-pruning algorithms. While 250 million is still far better than 2^64, it’s quite slow. What happens if I try to generate a larger board? I sometimes got lucky with 10×10, or even 12×12, but more often than not my computer fans spun up to max to deal with endless CPU load. This approach doesn’t scale, but it works for common board sizes.

### Validator

When I say something took 159 iterations, what am I measuring? In complexity theory, it’s helpful to focus on the most expensive operation. For example, when sorting a list, we focus on moving an item in the list, as that is the dominant cost. For board generation, validating the partial board is by far the most expensive part of the code. Therefore, my iteration counts are precisely the number of times I called the validator.

The rules of Binairo are simple, but there’s no avoiding loops of loops in the validator. This is what my code looks like.

bool is_invalid() const { auto check_linear = [&]<bool t>() { auto major = t ? height: width; auto minor = t ? width: height; for( int y=0; y < major; ++y ) { int recent_1 = 0; int recent_2 = 0; int total_1 = 0; int total_2 = 0; for( int x=0; x < minor; ++x ) { auto index = t ? cell_index(x,y) : cell_index(y, x); auto val = cells[index]; if( val == 1 ) { if( ++recent_1 > 2) { return true; } recent_2 = 0; total_1++; } else if( val == 2 ) { if( ++recent_2 > 2) { return true; } recent_1 = 0; total_2++; } else { recent_1 = 0; recent_2 = 0; } } if( total_1 > minor/2 || total_2 > minor/2 ) { return true; } } return false; }; return check_linear.operator()<true>() || check_linear.operator()<false>(); }

As the rules about rows are the same as those for columns, I didn’t want to write nearly the same code twice. I abstracted into a major/minor axis approach. If `t=true`

then I’m checking if the rows are valid, and when `t=false`

checking if the columns are valid. Look at the line where I set index, `auto index = t ? cell_index(x,y) : cell_index(y, x);`

; I’m transposing the values.

I’ve used a C++ template for this to tell that compiler that the conditional assignment to `index`

isn’t really a conditional. Branching on memory access within a deep loop is bad for performance. Knowing that C++ can express this is part of why I’m writing this code in the language.

The rest of the code checks the rules of the game. The recent values are checking the number of times the same value has appeared in sequence. If it reaches 3, then the board is invalid, as either an X or an O came three times ina row. Note that 0 values, an empty cell, reset the count. We’re checking if the board is definitely invalid, strictly according to the rules of the game. It’s quite possible that this board can’t be solved, yet is still presently valid.

These are tight loops that access little memory, yet they are still loops. They take time to process, and I’ll be executing these billions of times.

The number of operations in this function is linear with the size of the board. In larger boards this function is slower. However, in larger boards, the number of iterations is still the dominant problem. From a 8×8 to a 16×16 board, the validator takes four times as long. Yet the number of possible boards increases from 2^64 to 2^256.

#### Emptying

Now that I have a board, I need to produce a puzzle from it. I do this by attempting to remove cells from the board and see if it’s still solvable, and has a unique solution. If the solution isn’t unique, it’s an indicator that constraint deduction can’t be used to solve it. That is, there is at least one opportunity where a player could use either value and get an answer, or cannot determine which value to use.

Like the filling, I’m going to iterate over every cell in the board and make an attempt:

- Remove the cell at the current position
- Count number of solutions
- If exactly 1, then keep the cell cleared
- Otherwise, restore the cell and continue

As you can see, this leads to a bias in the emptied cells. The chance of a cell being removable is higher the earlier it comes in the check sequence. As blanks appear, the ability to remove further cells drops.

I fixed this the same way as the initial filling patterns: create a vector of all positions, randomize it, then follow the random sequence.

##### Counting solutions

How do I count the number of solutions? This is very similar to filling the board, indeed I reuse my fill algorithm, but with some template tweaks. Unlike filling, I always allow back-tracking to find other boards.

I don’t need to count all solutions, all I need to do is find out if there are 0, 1, or 2+ solutions. The moment a second solution is found, I can stop searching.

Though it makes no difference to this version of the algorithm, removing a cell will never make the game unsolvable. If you start with a valid board, it’s not possible to remove a cell and create a board that has no solutions. I used this to improve future versions of my code.

This counting is slow. The more empty spaces there are on the board, the longer it takes. Since I’m using the same fill code, I’m basically trying to brute-force find solutions for every cell I remove. It does not scale to large grids. Though I got lucky and found some 14×14 boards with needing only tens of minutes to execute.

In practice, the boards won’t get much bigger than 14×14, though I have seen some 20×20 boards. For the human player, as the board size increases, it becomes more of a test of working memory space than a logical deduction puzzle. But hey, I enjoy the challenge and want my code to make 50×50 puzzles!

##### It’s too hard though

The biggest problem with this approach is that while the resulting boards have solutions, a player won’t be able to find those solutions without guessing and back-tracking. The removal algorithm does not care about the difficulty of removing the cells.

We played some of the generated puzzles, and some worked, but often we got stuck. A section of the board would remain blank and we had no way to progress.

This led me to an alternate way to remove cells, one that considered the depth of thought, rather than just having a unique solution. I inverted the logic. For any given cell, I replace the value with the other value: I replace X with O, and O with X. Then I checked that this value could only result in invalid boards: using the fill algorithm, from this starting point, validate that no boards could be found.

Other than being slightly faster, this doesn’t get me closer to my goal. It still results in the same boards. However, this approach gives me a lever to pull: depth.

The fill algorithm is recursive. For each cell it tests it increases the stack depth. Instead of searching for an entirely filled board, I instead restrict it to a fixed depth size. If all paths to this depth were invalid, then I knew I could safely remove the cell. It meant that a player would only have to think a fixed number of moves ahead to eliminate the alternate value.

Take a pause, and try to consider why I think a fixed number of moves ahead is the same as a player using logical deduction. This is probably controversial, and I’ve not fully convinced myself. But when looking at the techniques one uses to solve this game, or similar games, including Sudoku, I reduced the techniques to a number of lookahead moves. This lined up roughly with what was the considered difficulty of each of those — the more advanced techniques used more lookaheads. I don’t know if there is a clear cut-off between deduction and lookahead.

I ran this new version and produced boards that were always easy to solve. Way too easy to solve. I didn’t remove enough cells from the board. The problem was the locality of the missing bits. If there are a lot of empty cells, then the fill algorithm may not fill in cells near the one I removed. Thus I couldn’t tell if it was invalid.

So instead, with a wondrous level of hand-waving logic, I reckoned that if the wrong value simply had more invalid boards than the right value, then it was enough to remove the cell. I’m kind of surprised this worked. I thought about it several more times to convince myself it made sense, but I still have my doubts.

But it worked. I could produce Binairos that were always solvable, though easy. I’ll need something more.

### Moving on

I was happy to get this far. Rather than play on the algorithm anymore, I built a nice interface to the program, with command-line arguments to save partial states, and load, so I could focus my testing. I also built a solve interface that would use the same code to solve puzzles I found in books. These required adding the aforementioned unique rows and columns constraint, otherwise I couldn’t find unique solutions — in most cases I couldn’t find any solution at all as the search space was too large.

My approach has two fundamental problems:

- It doesn’t scale, and is already slow on some 10×10 boards
- It mostly produces easy puzzles

But it is generic. I can apply it to any grid-like game. The rules of the game are encoded only in the validator: plug-in new rules, and new games come out. I made my code work with a range of values, not simply X and O. The code loops from min…max, where those are 1 and 2 now, but I can extend the range.

Future me has already worked on the fundamental problems, and it required breaking the generic. I need game-specific rules. While I produce better puzzles, I still need to tweak the code. Once I’m satisfied it’s good, I’ll write a followup with the refined approach.