Tao's Equational Proof Challenge Accepted (Technical Report)
Pith reviewed 2026-05-21 01:34 UTC · model grok-4.3
The pith
New minimization tool reduces 62-step equational proof to 20 rewrite steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a combination of brute force and heuristics, and exploiting both a superposition prover and an equational prover, the tool reduces the 62-step proof to 20 steps, each corresponding to a rewrite. In an empirical evaluation, it also performs well on 1431 equational problems originating from the same project, reducing in particular a 151-step proof to only 10 steps.
What carries the argument
The proof minimization tool that combines brute-force search and heuristics to exploit two equational provers and generate shorter rewrite-based proofs.
If this is right
- The 62-step proof can be replaced by a much shorter sequence of 20 rewrites.
- Many equational problems admit significantly shorter proofs than those initially found by provers.
- Rewrite steps provide a simple and verifiable form for equational derivations.
- The approach scales to large numbers of similar problems.
Where Pith is reading between the lines
- This suggests that automated proof search often produces proofs with redundant steps that can be pruned.
- Shorter proofs could facilitate better human understanding of automated mathematical discoveries.
- The technique might generalize to minimization in other proof systems.
Load-bearing premise
The heuristics and brute-force search in the tool are assumed to be sufficient to discover near-minimal proofs without systematic omission of shorter alternatives.
What would settle it
An independent search or manual construction that produces a proof with fewer than 20 steps for the original equational statement would show that further reduction is possible.
read the original abstract
In the context of the Equational Theories Project, Terence Tao posed the challenge of finding alternatives to a complicated 62-step proof found by the Vampire superposition prover. We introduce a proof minimization tool called Krympa. Using a combination of brute force and heuristics, and exploiting both Vampire and the Twee equational prover, the tool reduces the 62-step proof to 20 steps, each corresponding to a rewrite. In an empirical evaluation, it also performs well on 1431 equational problems originating from the same project, reducing in particular a 151-step proof to only 10 steps.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Krympa, a proof-minimization tool that combines brute-force search and heuristics with the Vampire superposition prover and the Twee equational prover. It claims to reduce a 62-step Vampire proof for Terence Tao's equational challenge to a valid 20-step proof consisting entirely of rewrite steps, and reports analogous length reductions across 1431 equational problems drawn from the Equational Theories Project, including one reduction from 151 steps to 10 steps.
Significance. If the reported reductions are sound and independently verifiable, the work would demonstrate a practical approach to shortening equational proofs that could aid insight and automation in formal mathematics. The scale of the empirical evaluation on 1431 instances is a positive feature, as is the explicit reuse of established tools (Vampire and Twee) rather than a wholly new prover.
major comments (2)
- [Description of the Tao challenge result] The central claim that Krympa produces a valid 20-step rewrite proof for the Tao challenge is not accompanied by the explicit sequence of 20 rewrite steps or by a machine-checkable certificate (e.g., a Twee or Vampire proof object) that each step is a sound equational replacement. Without this artifact, the reported length reduction cannot be distinguished from an internal counting or validity error in the minimization pipeline.
- [Empirical evaluation section] The empirical evaluation reports successful reductions on 1431 problems but supplies neither error bars, a breakdown of verification method for each minimized proof, nor any independent replay data. This omission makes it impossible to assess the reliability of the claimed minimality or soundness across the test suite.
minor comments (2)
- [Tool description] The description of the heuristics and brute-force search strategy inside Krympa is high-level; a pseudocode outline or parameter settings would improve reproducibility.
- [Results presentation] Table or figure captions that list the original and minimized proof lengths for the 1431 instances would make the scale of the evaluation easier to assess at a glance.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for major revision. We address each major comment below, indicating where revisions will be made to strengthen verifiability while preserving the core contributions of the work.
read point-by-point responses
-
Referee: The central claim that Krympa produces a valid 20-step rewrite proof for the Tao challenge is not accompanied by the explicit sequence of 20 rewrite steps or by a machine-checkable certificate (e.g., a Twee or Vampire proof object) that each step is a sound equational replacement. Without this artifact, the reported length reduction cannot be distinguished from an internal counting or validity error in the minimization pipeline.
Authors: We agree that the absence of the explicit 20-step sequence and certificate limits independent verification. The minimization pipeline invokes Twee to generate and validate each rewrite step, inheriting soundness from Twee's equational reasoning. In the revised manuscript we will append the complete sequence of 20 rewrite steps for the Tao challenge together with the corresponding Twee proof object (or a link to a supplementary archive containing it). This addition will allow direct replay and confirmation that the reduction from the original 62-step Vampire proof is both valid and minimal. revision: yes
-
Referee: The empirical evaluation reports successful reductions on 1431 problems but supplies neither error bars, a breakdown of verification method for each minimized proof, nor any independent replay data. This omission makes it impossible to assess the reliability of the claimed minimality or soundness across the test suite.
Authors: We acknowledge that the current presentation does not detail the verification procedure or supply replay artifacts. All minimized proofs were obtained by invoking Twee on the output of Krympa's search; Twee then confirms that each step is a sound equational replacement and that the final proof derives the goal. In revision we will add a dedicated subsection describing this replay-based verification, report aggregate statistics (including reduction ratios and success rates), and note that the full set of minimized proofs and Twee certificates will be deposited in a public repository. Individual replay data for every one of the 1431 instances cannot be reproduced in the main text for space reasons, but the repository will enable independent checking. revision: partial
Circularity Check
Empirical tool outputs on external problems are self-contained with no circular reduction
full rationale
The manuscript describes an algorithmic tool (Krympa) that invokes Vampire and Twee for search and rewriting, then reports the lengths of the resulting proof objects on externally supplied equational problems. The 62-to-20-step reduction and the 151-to-10-step reduction are direct computational outputs rather than quantities derived by definition or by fitting parameters to the same data. No equation, uniqueness theorem, or ansatz is shown to be justified solely by a self-citation that itself depends on the target result. The derivation chain therefore consists of independent algorithmic steps whose correctness can be checked by re-running the tool or inspecting the produced rewrite sequences.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Vampire and Twee produce sound equational proofs
invented entities (1)
-
Krympa
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Amjad, H.: Compressing propositional refutations. In: Merz, S., Nipkow, T. (eds.) AVoCS 2006. Electronic Notes in Theoretical Computer Science, vol. 185, pp. 3–15. Elsevier (2006)
work page 2006
-
[2]
Bachmair, L., Dershowitz, N., Plaisted, D.A.: Completion without failure. In: Aït-Kaci, H., Nivat, M. (eds.) Rewriting Techniques, pp. 1–30. Academic Press (1989)
work page 1989
-
[3]
Bachmair, L., Ganzinger, H.: Strict basic superposition. In: Kirchner, C., Kirchner, H. (eds.) CADE 1998. LNCS, vol. 1421, pp. 160–174. Springer (1998)
work page 1998
-
[4]
Bártek, F., Bhayat, A., Coutelier, R., Hajdú, M., Hetzenberger, M., Hozzová, P., Kovács, L., Rath, J., Rawson, M., Reger, G., Suda, M., Schoisswohl, J., Voronkov, A.: The Vampire diary. In: Piskac, R., Rakamaric, Z. (eds.) CAV 2025. LNCS, vol. 15933, pp. 57–71. Springer (2025) Tao’s Equational Proof Challenge Accepted (Technical Report) 19
work page 2025
-
[5]
Blanchette, J.C., Böhme, S., Fleury, M., Smolka, S.J., Steckermeier, A.: Semi-intelligible Isar proofs from machine-generated proofs. J. Automated Reas. 56, 155–200 (2016)
work page 2016
-
[6]
Bolan, M., Breitner, J., Brox, J., Carlini, N., Carneiro, M., van Doorn, F., Dvorak, M., Goens, A., Hill, A., Husum, H., Mejia, H.I., Kocsis, Z.A., Floch, B.L., Bar-on, A.L., Luccioli, L., McNeil, D., Meiburg, A., Monticone, P., Nielsen, P., Osazuwa, E.O., Paolini, G., Petracci, M., Reinke, B., Renshaw, D., Rossel, M., Roux, C., Scanvic, J., Srinivas, S.,...
work page 2025
-
[7]
Buch, A., Hillenbrand, T.: WALDMEISTER: Development of a high performance completion-based theorem prover. Tech. rep. (1996)
work page 1996
-
[8]
In: Bertot, Y., Kutsia, T., Norrish, M
Clune, J., Qian, Y., Bentkamp, A., Avigad, J.: Duper: A proof-producing superposition theorem prover for dependent type theory. In: Bertot, Y., Kutsia, T., Norrish, M. (eds.) ITP 2024. LIPIcs, vol. 309, pp. 1–20. Leibniz-Zentrum für Informatik (2024)
work page 2024
-
[9]
Cotton, S.: Two techniques for minimizing resolution proofs. In: Strichman, O., Szeider, S. (eds.) SAT 2010. LNCS, vol. 6175, pp. 306–312. Springer (2010)
work page 2010
-
[10]
Denzinger, J., Schulz, S.: Recording, analyzing and presenting distributed deduction processes. In: Hong, H. (ed.) PASCO 1994. Lecture Notes Series in Computing, vol. 5, pp. 114–123. World Scientific (1994)
work page 1994
-
[11]
Machine Learning for Theorem Proving: A (Semi)-Autoformalization Challenge (650→448)
Floch, B.L.: Zulip post in “Machine Learning for Theorem Proving: A (Semi)-Autoformalization Challenge (650→448)”. Zulip (2025), https://leanprover.zulipchat.com/#narrow/channel/219941-Machine- Learning-for-Theorem-Proving/topic/A.20.28semi.29- autoformalization.20challenge.3A.20650.3D.3E448/near/518970125
-
[12]
Machine Learning for Theorem Proving: A (Semi)-Autoformalization Challenge (650→448)
Floch, B.L.: Zulip post in “Machine Learning for Theorem Proving: A (Semi)-Autoformalization Challenge (650→448)”. Zulip (2025), https://leanprover.zulipchat.com/#narrow/channel/219941-Machine- Learning-for-Theorem-Proving/topic/A.20.28semi.29- autoformalization.20challenge.3A.20650.3D.3E448/near/568313777
-
[13]
In: Benzmüller, C., Heule, M., Schmidt, R
Geoff Sutcliffe: Stepping stones in the TPTP World. In: Benzmüller, C., Heule, M., Schmidt, R. (eds.) IJCAR 2024. pp. 30–50. LNCS (2024)
work page 2024
- [14]
-
[15]
In: Barrett, C.W., Waldmann, U
Hajdu, M., Coutelier, R., Kovács, L., Voronkov, A.: Term ordering diagrams. In: Barrett, C.W., Waldmann, U. (eds.) CADE-30. LNCS, Springer, to appear
- [16]
-
[17]
In: Gramlich, B., Miller, D., Sattler, U
Järvisalo, M., Heule, M., Biere, A.: Inprocessing rules. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR 2012. LNCS, vol. 7364, pp. 355–370. Springer (2012)
work page 2012
-
[18]
Machine Learning for Theorem Proving: A (Semi)-Autoformalization Challenge (650→448)
Kinyon, M.: Zulip post in “Machine Learning for Theorem Proving: A (Semi)-Autoformalization Challenge (650→448)”. Zulip (2025), https://leanprover.zulipchat.com/#narrow/channel/219941-Machine- Learning-for-Theorem-Proving/topic/A.20.28semi.29- autoformalization.20challenge.3A.20650.3D.3E448/near/518961204
-
[19]
Knuth, D.E., Bendix, P.B.: Simple word problems in universal algebras. In: Leech, J. (ed.) Computational Problems in Abstract Algebra, pp. 263–297. Pergamon Press (1970) 20 L. Kondylidou et al
work page 1970
-
[20]
Zenodo (2026),https://doi.org/10.5281/zenodo.18624123
Kondylidou, L., Blanchette, J., Heule, M.: Tao’s equational proof challenge accepted. Zenodo (2026),https://doi.org/10.5281/zenodo.18624123
-
[21]
Limperg, J., From, A.H.: Aesop: White-box best-first proof search for Lean. In: CPP 2023. pp. 253–266. ACM (2023)
work page 2023
-
[22]
McCune, W.: Prover9 and Mace4 (2005–2010)
work page 2005
-
[23]
In: Biere, A., Lutz, C., Negri, S
Morrison, K., de Moura, L.:grind: An SMT-inspired tactic for Lean 4 (short paper—system description). In: Biere, A., Lutz, C., Negri, S. (eds.) IJCAR 2026. LNCS, Springer, to appear
work page 2026
-
[24]
In: Bertot, Y., Kutsia, T., Norrish, M
de Moura, L., Ullrich, S.: The Lean 4 theorem prover and programming language. In: Bertot, Y., Kutsia, T., Norrish, M. (eds.) CADE-28. LNCS, vol. 12699, pp. 625–635. Springer (2021)
work page 2021
-
[25]
Norman, C., Avigad, J.: Canonical for automated theorem proving in Lean. In: ITP 2025. LIPIcs, vol. 352, pp. 1–20. Leibniz-Zentrum für Informatik (2025)
work page 2025
-
[26]
Smallbone, N.: Twee: An equational theorem prover. In: Platzer, A., Sutcliffe, G. (eds.) CADE-28. LNCS, vol. 12699, pp. 602–613. Springer (2021)
work page 2021
-
[27]
Stachniak, Z.: Minimization of resolution proof systems. Fundam. Informaticae 14(1), 129–146 (1991)
work page 1991
-
[28]
AI Communications38, 3–20 (2025)
Sutcliffe, G.: The 12th IJCAR Automated Theorem Proving System Competition—CASC-J12. AI Communications38, 3–20 (2025)
work page 2025
-
[29]
Sutcliffe, G., Chang, C., McGuinness, D., Lebo, T., Ding, L., da Silva, P.P.: Combining proofs to form different proofs. In: Fontaine, P., Stump, A. (eds.) PxTP 2011. pp. 60–73. LNCS (2011)
work page 2011
-
[30]
Machine Learning for Theorem Proving: A (Semi)-Autoformalization Challenge (650→448)
Tao, T.: Zulip thread “Machine Learning for Theorem Proving: A (Semi)-Autoformalization Challenge (650→448)”. Zulip (2025), https://leanprover-community.github.io/archive/stream/219941-Machine- Learning-for-Theorem-Proving/
work page 2025
-
[31]
Vyskočil, J., Stanovský, D., Urban, J.: Automated proof compression by invention of new definitions. In: Clarke, E.M., Voronkov, A. (eds.) LPAR-16. LNCS, vol. 6355, pp. 447–462. Springer (2010)
work page 2010
- [32]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.