We consider the problem of perception-based constraint solving, where part of the problem specification is provided indirectly through an image provided by a user. As a pedagogical example, we use the complete image of a Sudoku grid. While the rules of the puzzle are assumed to be known, the image must be interpreted by a neural network to extract the values in the grid. In this paper, we investigate (1) a hybrid modeling approach combining machine learning and constraint solving for joint inference, knowing that blank cells need to be both predicted as being blank and filled-in to obtain a full solution; (2) the effect of classifier calibration on joint inference; and (3) how to deal with cases where the constraints of the reasoning system are not satisfied. More specifically, in the case of handwritten user errors in the image, a naive approach fails to obtain a feasible solution even if the interpretation is correct. Our framework identifies human mistakes by using a constraint solver and helps the user to correct these mistakes. We evaluate the performance of the proposed techniques on images taken through the Sudoku Assistant Android app, among other datasets. Our experiments show that (1) joint inference can correct classifier mistakes, (2) overall calibration improves the solution quality on all datasets, and (3) estimating and discriminating between user-written and original visual input while reasoning makes for a more robust system, even in the presence of user errors.
Keywords: Constraint programming; Joint inference; Machine learning; Visual sudoku.
© The Author(s) 2024.