Teaching LLMs String Matching, Backtracking, and Error Recovery to Deduce Bases and Truth Tables for the Combinatorially Exploding Bit Manipulation Puzzles

Aditya Prasad; Prabhat Agnihotri; Prateek Agnihotri; Sanchit Jain; Shubham Jain

arxiv: 2606.23672 · v2 · pith:NPSBR5CXnew · submitted 2026-06-22 · 💻 cs.AI

Teaching LLMs String Matching, Backtracking, and Error Recovery to Deduce Bases and Truth Tables for the Combinatorially Exploding Bit Manipulation Puzzles

Prateek Agnihotri , Sanchit Jain , Prabhat Agnihotri , Aditya Prasad , Shubham Jain This is my paper

Pith reviewed 2026-07-01 06:34 UTC · model grok-4.3

classification 💻 cs.AI

keywords bit manipulation puzzlesstring similaritybacktracking searchtruth table deductionLLM reasoningerror recoverybase selection

0 comments

The pith

String similarity via minimal bit flips lets LLMs isolate primitive bases and deduce truth tables for hidden bit rules without simulating operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that reframing bit-manipulation rule discovery as a base-selection problem, solved by measuring minimal bit flips between input and output strings, allows LLMs to identify the primitive transformations and build the corresponding truth tables. A reader would care because this bypasses the combinatorial explosion of enumerating shifts, rotations, and gates that normally causes hallucinations when LLMs attempt direct boolean simulation. The method pairs the similarity search with a backtracking DFS that detects logical collisions across examples and recovers by retracting failed candidates. Special bit-level tokenization and dynamic masking during supervised fine-tuning further let the model internalize hypothesis, evaluation, and correction steps.

Core claim

By treating the hidden rule as a selection among candidate bases (primitive string transformations) whose effects are compared to observed input-output pairs via Hamming distance, the approach deduces the minimal set of bases and assembles the truth table that reproduces all given examples; backtracking DFS then ensures consistency by retracting any base that produces collisions on later examples.

What carries the argument

Base-selection task that isolates primitive transformations by minimal bit-flip distance and assembles them into a truth table without enumerating boolean operations.

If this is right

The method reaches over 96 percent validation accuracy on the contest's bit-manipulation puzzles.
Backtracking on detected logical collisions supplies autonomous error recovery without external supervision.
Bit-tokenization plus dynamic masking trains the model to generate, evaluate, and retract hypotheses in a single forward pass.
The same string-similarity reduction applies whenever the search space of arithmetic or logical operations grows combinatorially.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same minimal-distance base selection could be tested on non-bit string-rewriting tasks such as simple program synthesis or cellular-automaton rule inference.
If the assumption holds, explicit logical simulators become unnecessary for many pattern-based reasoning benchmarks, shifting emphasis to distance metrics and search control.
The backtracking component might be replaced by learned policies that predict collision likelihood directly from partial base sets.

Load-bearing premise

Measuring similarity only by the number of bit flips between strings is enough to identify which primitive transformations generate the hidden rule.

What would settle it

A collection of bit-manipulation puzzles in which two distinct sets of bases produce identical minimal-flip distances on all training examples but diverge on held-out inputs, causing the method to select the wrong base set.

Figures

Figures reproduced from arXiv: 2606.23672 by Aditya Prasad, Prabhat Agnihotri, Prateek Agnihotri, Sanchit Jain, Shubham Jain.

**Figure 1.** Figure 1: Spatial mapping of input bits to form the bases for target output Bit 6 (red) across the three base classes: Right Shifts (1), Circular Shifts (2), and Left Shifts (3). The curved arrows trace how each individual offset (e.g., R1, C2, or L3) retrieves its value from a specific position in the input string or from padded zeros (dashed). Our approach begins by deconstructing the sequence-to-sequence translat… view at source ↗

**Figure 2.** Figure 2: Isolating output-influencing bases by comparing rows with different target outputs. When two nearly identical 22-base arrays yield different outputs (0 vs. 1), the differing bases must be responsible for the state change. flipped bases, or a combination of them. This trace acts as an explicit logical OR constraint. To ensure a discovered logical rule is universally true, it must satisfy every single Flip T… view at source ↗

**Figure 3.** Figure 3: End-to-End Flowchart of the Bit Manipulation Solver. The algorithm isolates logical constraints via string matching, iteratively searches for valid bases via backtracking, and ultimately synthesizes the final truth table to solve the puzzle. Algorithm 1: End-to-End Bit Manipulation Solver Input: Puzzle Examples, Unseen Target String Output: 8-bit Target Prediction Phase 1: Feature Extraction & String Matc… view at source ↗

**Figure 4.** Figure 4: Visualizing the Phase 2 Backtracking Search using the traces from Box 2. The algorithm proposes a candidate subset of bases and tests it globally. On Branch 1, an empirical truth table reveals a logical contradiction (inputs 1,0 yielding conflicting outputs), forcing the algorithm to deterministically backtrack to Branch 2. Box 3: The Backtracking Execution Following the traces from Box 2, here is how the … view at source ↗

read the original abstract

This paper presents our algorithmic innovations for the NVIDIA Nemotron Model Reasoning Challenge, focusing on Bit Manipulation Puzzles. In this task, the objective is to discover a hidden logical rule transforming input binary strings to outputs, then apply it to unseen inputs. Large Language Models (LLMs) notoriously struggle here; traditional methods force them to simulate complex boolean logic and arithmetic, leading to hallucinations. Furthermore, the search space of bitwise operations (combinations of shifts, rotations, and logic gates) suffers from a severe combinatorial explosion. To overcome this computational intractability, we present a novel approach that abandons arithmetic logic entirely in favor of string similarity, structured search, and autonomous error recovery. Our core contributions are: 1. Bases and Truth Table Formulation: We reframe logic-gate deduction into a base-selection task, leveraging string similarity (minimal bit flips) to isolate primitive transformations ("bases") and deduce truth tables without complex arithmetic. 2. Backtracking DFS and Error Recovery: We formalize a search process that tests candidate bases, detects logical collisions across examples, and backtracks upon failure to perform robust error recovery. 3. Bit Tokenization and Interactive Reasoning SFT: We force the tokenizer to encode binary strings as individual single-bit tokens. We use dynamic masking to simulate external oracle feedback, training the model to hypothesize, self-evaluate, and backtrack natively. Evaluated on bit manipulation puzzles, our approach achieved over 96% validation accuracy. This represents the highest performance in this category, driving our 7th Place overall finish in the contest.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a contest engineering paper that swaps boolean simulation for bit-flip base selection plus backtracking and gets 96% on the puzzles, but the uniqueness assumption is untested and the writeup is thin on evidence.

read the letter

The main takeaway is that the authors reframed bit manipulation puzzles as a base-selection problem solved by minimal bit-flip distance, then added backtracking DFS for collision detection and error recovery, plus bit-level tokenization and dynamic-masking SFT to train the model to hypothesize and backtrack. That combination produced over 96% validation accuracy and 7th place in the Nemotron challenge.

What they did well is recognize the combinatorial blowup in bitwise operations and sidestep it with a string-similarity search that the LLM can learn to manage. The dynamic masking to simulate oracle feedback during fine-tuning is a practical detail that could be reused in other search-augmented reasoning setups.

The soft spots are in the missing checks. The approach assumes that the true bases are the unique or correctly recoverable minimizers of flip distance across examples, yet nothing in the abstract shows they tested whether other operation sets could produce identical distances on the given cases while diverging on new inputs. Without ablations, error breakdowns, or even basic tables, it is impossible to tell how often that assumption held or whether the 96% is robust. The stress-test concern about non-unique bases lands because the paper cites the combinatorial explosion but does not address collisions.

This is for readers who work on LLM contest entries or practical tweaks for constrained puzzle tasks. It does not claim new theory and will not generalize far, but the implementation choices are concrete enough that someone replicating the setup might borrow the tokenization or masking trick.

I would send it to peer review if the full version supplies the experimental details and addresses the uniqueness question; otherwise it stays a narrow case study.

Referee Report

2 major / 1 minor

Summary. The paper presents algorithmic innovations for the NVIDIA Nemotron Model Reasoning Challenge on Bit Manipulation Puzzles. It reframes deduction of hidden logical rules as a base-selection task that uses string similarity (minimal bit flips) to isolate primitive transformations ('bases') and deduce truth tables without simulating boolean operations. The method adds backtracking DFS with logical-collision detection for error recovery, plus bit-level tokenization and interactive-reasoning SFT that trains the LLM to hypothesize, self-evaluate, and backtrack. The approach is reported to reach over 96% validation accuracy—the highest in its category—and 7th place overall.

Significance. If the empirical result holds under the stated assumptions, the work would demonstrate a practical route around the combinatorial explosion of bitwise-operation search by substituting distance-based base selection and autonomous backtracking for direct arithmetic simulation. The contest placement supplies external corroboration that the pipeline can be deployed successfully on the target task distribution.

major comments (2)

[Abstract] Abstract: the central claim that 'minimal bit flips' isolate the true primitive transformations without recovering the underlying boolean operations is load-bearing for the 96% accuracy result, yet the manuscript supplies neither a formal argument nor any concrete example showing that the distance metric yields a unique (or correctly recoverable) minimizer; the possibility that distinct operation sets induce identical flip distances on the observed examples but diverge on unseen inputs is not addressed.
[Abstract] Abstract: the reported 'over 96% validation accuracy' is presented with no data tables, ablation studies, error analysis, or description of the validation split, so it is impossible to determine whether the accuracy actually supports the claim that the string-similarity-plus-backtracking method overcomes the combinatorial explosion and LLM hallucinations.

minor comments (1)

[Abstract] Abstract: the phrase 'dynamic masking to simulate external oracle feedback' is introduced without any description of the masking schedule, loss formulation, or how oracle signals are generated during training.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and describe the revisions planned for the next version.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'minimal bit flips' isolate the true primitive transformations without recovering the underlying boolean operations is load-bearing for the 96% accuracy result, yet the manuscript supplies neither a formal argument nor any concrete example showing that the distance metric yields a unique (or correctly recoverable) minimizer; the possibility that distinct operation sets induce identical flip distances on the observed examples but diverge on unseen inputs is not addressed.

Authors: We agree that the manuscript lacks both a formal uniqueness argument and a worked example for the minimal bit-flip distance. The approach is empirical: in the contest distribution the true bases are the minimal-distance matches, and backtracking resolves collisions by enforcing consistency across examples. We will add a concrete numerical example (with input/output pairs, distance calculations, and backtrack steps) to Section 3. A general formal proof of uniqueness does not hold for arbitrary operation sets, so we will instead state the empirical conditions under which the metric succeeds on the target task. revision: partial
Referee: [Abstract] Abstract: the reported 'over 96% validation accuracy' is presented with no data tables, ablation studies, error analysis, or description of the validation split, so it is impossible to determine whether the accuracy actually supports the claim that the string-similarity-plus-backtracking method overcomes the combinatorial explosion and LLM hallucinations.

Authors: We acknowledge that the current presentation of results is insufficiently detailed. The full manuscript contains a validation-split description, but we will expand the evaluation section with: (1) a results table broken down by puzzle type, (2) ablation tables isolating backtracking and bit-level tokenization, (3) error analysis of the remaining failure cases, and (4) explicit confirmation that the 96% figure is measured on held-out contest examples. These additions will directly link the reported accuracy to the claimed mitigation of combinatorial search and hallucinations. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method with independent validation

full rationale

The paper presents an algorithmic approach using string similarity via minimal bit flips for base selection, backtracking search, and LLM fine-tuning with bit tokenization. It reports an empirical validation accuracy of over 96% on contest puzzles. No derivation chain, equations, or self-citations are shown that reduce the claimed result to a fitted parameter, self-defined quantity, or prior author result by construction. The central performance number is an external benchmark outcome, not a statistical artifact of the method's inputs. The approach is self-contained against the contest data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are described in sufficient detail to populate the ledger.

pith-pipeline@v0.9.1-grok · 5852 in / 1054 out tokens · 21496 ms · 2026-07-01T06:34:23.422640+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

14 extracted references

[1]

Semaan, Jean-Francois Puget, Christof Henkel, Yi Dong, Addison Howard, Ashley Oldacre, Ryan Holbrook, Chris Alexiuk, and Rebecca Kao

Jamil C. Semaan, Jean-Francois Puget, Christof Henkel, Yi Dong, Addison Howard, Ashley Oldacre, Ryan Holbrook, Chris Alexiuk, and Rebecca Kao. NVIDIA Nemotron Model Reasoning Challenge, 2026. https://kaggle.com/competitions/ nvidia-nemotron-model-reasoning-challenge. Kaggle

2026
[2]

NVIDIA Nemotron Model Reasoning Chal- lenge — Writeup, 2026

Tong Hui Kang. NVIDIA Nemotron Model Reasoning Chal- lenge — Writeup, 2026. https://www.kaggle.com/competitions/ nvidia-nemotron-model-reasoning-challenge/discussion/689915 . Kaggle Discussion. 15 Appendix A: Full Chain-of-Thought (CoT) Traces This appendix provides two complete, unedited examples of the synthetic Chain-of- Thought (CoT) traces used to tr...

2026
[3]

1 10 01 10 1 -> 00 000 00 0
[4]

1 00 10 11 1 -> 00 000 00 0
[5]

1 10 00 11 1 -> 00 000 00 0
[6]

1 10 10 11 1 -> 00 000 01 0
[7]

1 11 10 01 1 -> 00 000 11 0
[8]

1 11 01 11 1 -> 00 000 10 1
[9]

Target O =0

1 01 00 10 0 -> 00 000 10 0 Row Cr ea ti on Trace : Input : 11 00 110 1 -> Target Output : 00 00 00 00 Bit 7 ( MSB ) : O ri gi nal bit x =1. Target O =0. R [1 -7]: Shift input right by 1 -7 , extract Bit 7. Result : 0000000 C [1 -7]: Rotate input left by 1 -7 , extract Bit 7. Result : 1001101 L [1 -7]: Shift input left by 1 -7 , extract Bit 7. Result : 10...
[11]

** Greedy B r a n c h i n g :** For r e m a i n i n g traces , i t e r a t i v e l y select the feature with the highest i n t e r s e c t i o n f r e q u e n c y
[12]

If i d e n t i c a l inputs yield d i f f e r e n t outputs ( a c o l l i s i o n ) , the set is invalid

** C o l l i s i o n V e r i f i c a t i o n & B a c k t r a c k i n g :** When all traces are covered , ge ne ra te a Truth Table across all 64 rows . If i d e n t i c a l inputs yield d i f f e r e n t outputs ( a c o l l i s i o n ) , the set is invalid . We b a c k t r a c k and try the next c a n d i d a t e . < deduction > Bases : {} 17 U n c o v e ...
[13]

** M a n d a t o r y Bases :** Any feature in a 1 - bit flip trace is an ab sol ut e c o n s t r a i n t and is i m m e d i a t e l y locked
[14]

** Greedy B r a n c h i n g :** For r e m a i n i n g traces , i t e r a t i v e l y select the feature with the highest i n t e r s e c t i o n f r e q u e n c y . 19
[15]

If i d e n t i c a l inputs yield d i f f e r e n t outputs ( a c o l l i s i o n ) , the set is invalid

** C o l l i s i o n V e r i f i c a t i o n & B a c k t r a c k i n g :** When all traces are covered , ge ne ra te a Truth Table across all 64 rows . If i d e n t i c a l inputs yield d i f f e r e n t outputs ( a c o l l i s i o n ) , the set is invalid . We b a c k t r a c k and try the next c a n d i d a t e . < deduction > Bases : {} U n c o v e r e...

[1] [1]

Semaan, Jean-Francois Puget, Christof Henkel, Yi Dong, Addison Howard, Ashley Oldacre, Ryan Holbrook, Chris Alexiuk, and Rebecca Kao

Jamil C. Semaan, Jean-Francois Puget, Christof Henkel, Yi Dong, Addison Howard, Ashley Oldacre, Ryan Holbrook, Chris Alexiuk, and Rebecca Kao. NVIDIA Nemotron Model Reasoning Challenge, 2026. https://kaggle.com/competitions/ nvidia-nemotron-model-reasoning-challenge. Kaggle

2026

[2] [2]

NVIDIA Nemotron Model Reasoning Chal- lenge — Writeup, 2026

Tong Hui Kang. NVIDIA Nemotron Model Reasoning Chal- lenge — Writeup, 2026. https://www.kaggle.com/competitions/ nvidia-nemotron-model-reasoning-challenge/discussion/689915 . Kaggle Discussion. 15 Appendix A: Full Chain-of-Thought (CoT) Traces This appendix provides two complete, unedited examples of the synthetic Chain-of- Thought (CoT) traces used to tr...

2026

[3] [3]

1 10 01 10 1 -> 00 000 00 0

[4] [4]

1 00 10 11 1 -> 00 000 00 0

[5] [5]

1 10 00 11 1 -> 00 000 00 0

[6] [6]

1 10 10 11 1 -> 00 000 01 0

[7] [7]

1 11 10 01 1 -> 00 000 11 0

[8] [8]

1 11 01 11 1 -> 00 000 10 1

[9] [9]

Target O =0

1 01 00 10 0 -> 00 000 10 0 Row Cr ea ti on Trace : Input : 11 00 110 1 -> Target Output : 00 00 00 00 Bit 7 ( MSB ) : O ri gi nal bit x =1. Target O =0. R [1 -7]: Shift input right by 1 -7 , extract Bit 7. Result : 0000000 C [1 -7]: Rotate input left by 1 -7 , extract Bit 7. Result : 1001101 L [1 -7]: Shift input left by 1 -7 , extract Bit 7. Result : 10...

[10] [11]

** Greedy B r a n c h i n g :** For r e m a i n i n g traces , i t e r a t i v e l y select the feature with the highest i n t e r s e c t i o n f r e q u e n c y

[11] [12]

If i d e n t i c a l inputs yield d i f f e r e n t outputs ( a c o l l i s i o n ) , the set is invalid

** C o l l i s i o n V e r i f i c a t i o n & B a c k t r a c k i n g :** When all traces are covered , ge ne ra te a Truth Table across all 64 rows . If i d e n t i c a l inputs yield d i f f e r e n t outputs ( a c o l l i s i o n ) , the set is invalid . We b a c k t r a c k and try the next c a n d i d a t e . < deduction > Bases : {} 17 U n c o v e ...

[12] [13]

** M a n d a t o r y Bases :** Any feature in a 1 - bit flip trace is an ab sol ut e c o n s t r a i n t and is i m m e d i a t e l y locked

[13] [14]

** Greedy B r a n c h i n g :** For r e m a i n i n g traces , i t e r a t i v e l y select the feature with the highest i n t e r s e c t i o n f r e q u e n c y . 19

[14] [15]

If i d e n t i c a l inputs yield d i f f e r e n t outputs ( a c o l l i s i o n ) , the set is invalid

** C o l l i s i o n V e r i f i c a t i o n & B a c k t r a c k i n g :** When all traces are covered , ge ne ra te a Truth Table across all 64 rows . If i d e n t i c a l inputs yield d i f f e r e n t outputs ( a c o l l i s i o n ) , the set is invalid . We b a c k t r a c k and try the next c a n d i d a t e . < deduction > Bases : {} U n c o v e r e...