We recently released the Berkeley Crossword Solver (BCS), the current state of the art in solving American-style crossword puzzles. BCS combines neural question answering and probabilistic inference to achieve near-perfect performance on most American-style crosswords, such as the one below:
Figure 1: Example of an American-style crossword puzzle
An earlier version of the BCS, in conjunction with Dr.Fill, was the first computer program to outscore all human competitors in the world’s top crossword tournament. The most recent version is the current top-performing system on crossword puzzles from The New York Times, achieving 99.7% letter accuracy (see the technical paper, web demo, and code release).
Crosswords are challenging for humans and computers alike. Many clues are vague or underspecified and cannot be answered until the constraints from crossing answers are taken into account. While some clues resemble factoid question answering, others require relational reasoning or an understanding of difficult wordplay.
Here are a handful of example clues from our dataset (answers at the bottom of this post):
- They are given at the HAAS School in Berkeley (4)
- Winter hrs. in Berkeley (3)
- Domain ender that UC Berkeley was one of the first schools to adopt (3)
- Angeleno in Berkeley, for example (8)
The BCS solves crosswords in two stages. First, it generates a probability distribution over possible answers to each clue using a question answering (QA) model; second, it runs probabilistic inference, combined with local search and a generative language model, to resolve conflicts between proposed intersecting answers.
Figure 2: Architecture diagram of the Berkeley Crossword Solver
The BCS’s question answering model is based on DPR (Karpukhin et al., 2020), a bi-encoder model typically used to retrieve passages relevant to a given question. Rather than passages, however, our approach maps both questions and answers into a shared embedding space and finds answers directly. Compared to the previous state-of-the-art method for answering crossword clues, this approach obtained a 13.4% absolute improvement in top-1000 QA accuracy. We conducted a manual error analysis and found that our QA model typically performed well on questions involving knowledge, commonsense reasoning, and definitions, but often struggled to understand wordplay or theme-related clues.
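The retrieval step of a bi-encoder can be sketched in a few lines. This is only an illustration, not the BCS implementation: the three-dimensional vectors are hand-made stand-ins for the outputs of trained clue and answer encoders, and the function name `top_k_answers`, the length filter, and the softmax over the retained scores are all assumptions made for this example.

```python
import math

def top_k_answers(clue_vec, answer_vecs, length, k=3):
    """Rank candidate answers of the required length by dot product with the clue.

    Returns (answer, probability) pairs; probabilities come from a softmax
    over the k retained scores (an assumption of this sketch).
    """
    scored = []
    for answer, vec in answer_vecs.items():
        if len(answer) != length:  # the answer must fit the slot
            continue
        score = sum(c * a for c, a in zip(clue_vec, vec))
        scored.append((score, answer))
    scored.sort(reverse=True)
    top = scored[:k]
    if not top:
        return []
    best = top[0][0]
    exps = [math.exp(score - best) for score, _ in top]  # numerically stable softmax
    total = sum(exps)
    return [(answer, e / total) for (_, answer), e in zip(top, exps)]

# Hypothetical embeddings; a real system would produce these with trained encoders.
ANSWER_VECS = {
    "PST": [0.9, 0.1, 0.0],
    "EDU": [0.0, 0.8, 0.2],
    "EST": [0.6, 0.1, 0.1],
    "MBAS": [0.1, 0.2, 0.9],
}
clue_vec = [1.0, 0.0, 0.1]  # pretend encoding of a 3-letter time-zone clue

ranked = top_k_answers(clue_vec, ANSWER_VECS, length=3)
for answer, prob in ranked:
    print(f"{answer}: {prob:.3f}")
```

Because clues and answers live in the same space, every answer seen in training can be scored directly, with no intermediate passage retrieval.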
After running the QA model on each clue, the BCS runs loopy belief propagation to iteratively update the answer probabilities in the grid. This allows information from high-confidence predictions to propagate to more challenging clues. After belief propagation converges, the BCS obtains an initial puzzle solution by greedily taking the most likely answer at each position.
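To make the message passing concrete, here is a simplified sketch of belief propagation over a crossword grid. It is not the BCS code: the slot and cell representation, the function name `loopy_bp`, and the `smooth` constant (which keeps letters outside the candidate lists from being ruled out entirely) are assumptions for illustration, and the greedy step here picks the most likely answer per slot.

```python
from collections import defaultdict

def loopy_bp(slots, priors, iters=10, smooth=1e-3):
    """slots: {slot_id: [cell, ...]}, priors: {slot_id: {word: prob}}.

    Returns {slot_id: {word: marginal}} after `iters` rounds of message passing.
    """
    # Map each cell to the (slot, letter index) pairs that pass through it.
    crossings = defaultdict(list)
    for s, cells in slots.items():
        for i, cell in enumerate(cells):
            crossings[cell].append((s, i))

    msgs = {}  # (slot, cell) -> letter distribution the slot sends to that cell

    def incoming(s, cell, letter):
        # Evidence for `letter` at `cell` from every other slot crossing it.
        prod = 1.0
        for other, _ in crossings[cell]:
            if other != s and (other, cell) in msgs:
                prod *= msgs[(other, cell)].get(letter, 0.0) + smooth
        return prod

    def word_weight(s, word, skip=None):
        # Prior times crossing evidence, optionally leaving one cell out.
        w = priors[s][word]
        for j, cell in enumerate(slots[s]):
            if j != skip:
                w *= incoming(s, cell, word[j])
        return w

    for _ in range(iters):
        new_msgs = {}
        for s, cells in slots.items():
            for i, cell in enumerate(cells):
                out = defaultdict(float)
                for word in priors[s]:
                    out[word[i]] += word_weight(s, word, skip=i)
                z = sum(out.values())
                new_msgs[(s, cell)] = {ch: v / z for ch, v in out.items()}
        msgs = new_msgs

    marginals = {}
    for s in slots:
        scores = {word: word_weight(s, word) for word in priors[s]}
        z = sum(scores.values())
        marginals[s] = {word: v / z for word, v in scores.items()}
    return marginals

# Toy grid: 1-Across and 1-Down share the top-left cell.
slots = {"1A": [(0, 0), (0, 1), (0, 2)], "1D": [(0, 0), (1, 0), (2, 0)]}
priors = {"1A": {"EST": 0.6, "PST": 0.4}, "1D": {"PIE": 0.9, "EAR": 0.1}}

marginals = loopy_bp(slots, priors)
greedy = {s: max(m, key=m.get) for s, m in marginals.items()}
print(greedy)
```

In this toy grid the QA prior slightly prefers EST for 1-Across, but the confident PIE at 1-Down pushes the shared cell toward P, so the greedy fill selects PST.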
The BCS then refines this solution using a local search that tries to replace low-confidence characters in the grid. The search uses a guided proposal distribution: the characters that had the lowest marginal probabilities during belief propagation are iteratively replaced until a locally optimal solution is found. We score these alternative characters using a character-level language model (ByT5, Xue et al., 2022), which handles novel answers better than our closed-book QA model.
Figure 3: Example of changes made by our local search procedure
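The local search loop can be sketched as a simple hill climb over the least confident cells. This is a minimal illustration under stated assumptions: `score_answer` is a hypothetical stand-in for the character-level language model (here a tiny lookup table, not ByT5), and `local_search`, `grid_score`, and `n_cells` are names invented for this example.

```python
from string import ascii_uppercase

def grid_score(grid, slots, score_answer):
    """Sum of answer scores over every slot in the grid."""
    return sum(score_answer("".join(grid[c] for c in cells)) for cells in slots.values())

def local_search(grid, cell_conf, slots, score_answer, n_cells=3):
    """Re-letter the n_cells least confident cells until no single change helps."""
    grid = dict(grid)
    best = grid_score(grid, slots, score_answer)
    targets = sorted(cell_conf, key=cell_conf.get)[:n_cells]
    improved = True
    while improved:  # stop at a local optimum
        improved = False
        for cell in targets:
            for ch in ascii_uppercase:
                if ch == grid[cell]:
                    continue
                trial = {**grid, cell: ch}
                score = grid_score(trial, slots, score_answer)
                if score > best:  # keep only strict improvements
                    grid, best, improved = trial, score, True
    return grid

# Toy scorer: a lookup table standing in for a language model's score.
KNOWN = {"PST": 1.0, "EST": 0.8, "PIE": 1.0}
def score_answer(answer):
    return KNOWN.get(answer, 0.0)

slots = {"1A": [(0, 0), (0, 1), (0, 2)], "1D": [(0, 0), (1, 0), (2, 0)]}
grid = {(0, 0): "E", (0, 1): "S", (0, 2): "T", (1, 0): "I", (2, 0): "E"}
cell_conf = {(0, 0): 0.5, (0, 1): 0.95, (0, 2): 0.95, (1, 0): 0.95, (2, 0): 0.95}

fixed = local_search(grid, cell_conf, slots, score_answer, n_cells=1)
across = "".join(fixed[c] for c in slots["1A"])
down = "".join(fixed[c] for c in slots["1D"])
print(across, down)
```

Scoring whole answers after each character swap means a change is kept only when it helps across both the across and down entries through that cell.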
We evaluated the BCS on puzzles from five major crossword publishers, including The New York Times. Our system averages 99.7% letter accuracy, which jumps to 99.9% if you ignore puzzles involving rare themes. It solves 81.7% of puzzles without a single error, which is a 24.8% improvement over the previous state-of-the-art system.
Figure 4: Results compared to the previous state-of-the-art system, Dr.Fill
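For readers who want to reproduce the two reported metrics, the following sketch shows one way to compute them. It assumes each puzzle is given as a pair of equal-length strings (predicted fill and gold fill, read row by row) and that letter accuracy is averaged per puzzle; both the representation and the averaging choice are assumptions, and the function name `evaluate` is hypothetical.

```python
def evaluate(puzzles):
    """puzzles: list of (predicted_fill, gold_fill) string pairs.

    Returns (mean per-puzzle letter accuracy, fraction of puzzles solved perfectly).
    """
    letter_accs = []
    perfect = 0
    for pred, gold in puzzles:
        correct = sum(p == g for p, g in zip(pred, gold))
        letter_accs.append(correct / len(gold))
        perfect += pred == gold
    return sum(letter_accs) / len(puzzles), perfect / len(puzzles)

# Two toy "puzzles" of five letters each; the second has one wrong letter.
puzzles = [("PSTIE", "PSTIE"), ("ESTIE", "PSTIE")]
letter_acc, perfect_rate = evaluate(puzzles)
print(letter_acc, perfect_rate)
```

The gap between the two metrics explains how a system can reach very high letter accuracy while still leaving a noticeable share of puzzles with at least one error.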
The American Crossword Puzzle Tournament (ACPT) is the largest and longest-running crossword tournament and is organized by Will Shortz, the crossword editor of The New York Times. Two prior approaches to computer crossword solving gained mainstream attention and competed in the ACPT: Proverb and Dr.Fill. Proverb is a 1998 system that ranked 213th out of 252 competitors in the tournament. Dr.Fill first competed at ACPT 2012, where it placed 141st out of 650 competitors. We teamed up with Dr.Fill’s creator, Matt Ginsberg, and combined an early version of our QA system with Dr.Fill’s search procedure to outscore all 1033 human competitors at ACPT 2021. Our joint submission solved all seven puzzles in under a minute each, missing just three letters across two puzzles.
Figure 5: Results from the 2021 American Crossword Puzzle Tournament (ACPT)
We are excited about the challenges that remain in crosswords, including handling difficult themes and more complex wordplay. To encourage future work, we are releasing a dataset of 6.4 million clue-answer pairs, a demo of the Berkeley Crossword Solver, and our code at http://berkeleycrosswordsolver.com.
Answers to clues: MBAS, PST, EDU, INSTATER