Recognition: 2 theorem links
· Lean TheoremExplorable Theorems: Making Written Theorems Explorable by Grounding Them in Formal Representations
Pith reviewed 2026-05-13 19:25 UTC · model grok-4.3
The pith
Grounding written proofs in executable Lean code lets readers step through them and test examples interactively.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Explorable theorems are created by translating a written theorem and proof into Lean via LLMs, then linking each natural-language step to its formal counterpart; readers can advance step by step, supply their own examples, and inspect the exact intermediate states produced by executing the Lean code, with the result that participants in a proof-comprehension study gave more accurate and detailed answers.
What carries the argument
Explorable theorems, the linked pair of written proof and its Lean formalization that supports step-level execution and example testing.
If this is right
- Readers can actively test whether a claimed theorem holds for chosen inputs rather than accepting it on trust.
- Proof dependencies become visible because each step can be examined in isolation with its exact formal state.
- The same written proof can support multiple different examples without rewriting the text.
- Comprehension questions about proofs receive more detailed answers when users have access to the executable version.
Where Pith is reading between the lines
- The approach could be applied to other formal languages or proof assistants beyond Lean if translation quality is sufficient.
- Integration with online math platforms might allow collaborative exploration of theorems by multiple readers at once.
- For very long or complex proofs the cost of accurate LLM translation may limit practical use unless automated verification catches errors.
- The technique might generalize to non-mathematical technical explanations that can be grounded in executable specifications.
Load-bearing premise
LLM translations from natural-language proofs to correct Lean code are reliable enough to support exploration without introducing misleading states.
What would settle it
A follow-up study in which the Lean translations contain errors and participants using the system perform no better, or worse, on comprehension questions than readers given only the original text.
Figures
read the original abstract
LLM-generated explanations can make technical content more accessible, but there is a ceiling on what they can support interactively. Because LLM outputs are static text, they cannot be executed or stepped through. We argue that grounding explanations in a formalized representation enables interactive affordances beyond what static text supports. We instantiate this idea for mathematical proof comprehension with explorable theorems, a system that uses LLMs to translate a theorem and its written proof into Lean, a programming language for machine-checked proofs, and links the written proof with the Lean code. Readers can work through the proof at a step-level granularity, test custom examples or counterexamples, and trace the logical dependencies bridging each step. Each worked-out step is produced by executing the Lean proof on that example and extracting its intermediate state. A user study ($n = 16$) shows potential advantages of this approach: in a proof-reading task, participants who had access to the provided explorability features gave better, more correct, and more detailed answers to comprehension questions, demonstrating a stronger overall understanding of the underlying mathematics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes explorable theorems, a system that uses LLMs to translate written mathematical theorems and proofs into Lean code and links the natural-language text to the formal representation. This enables interactive affordances including step-level proof execution, testing of custom examples or counterexamples, and tracing of logical dependencies. A user study (n=16) reports that participants with access to these features produced better, more correct, and more detailed answers to comprehension questions than those without.
Significance. If the central claims hold after addressing methodological gaps, the work offers a concrete bridge between static written mathematics and executable formal artifacts. It demonstrates a practical way to augment LLM-generated explanations with verifiable interactivity, which could improve proof comprehension in education and research settings while preserving the original written form.
major comments (2)
- [User Study] The user study (n=16) is presented as evidence of comprehension gains, yet the abstract and methods description supply no information on randomization, pre-registration, time-on-task controls, or statistical reporting (e.g., error bars or effect sizes). This leaves open the possibility that observed differences arise from study-design artifacts rather than the explorability features themselves.
- [System Architecture] The system relies on LLM-generated Lean translations to support step execution and counterexample testing, but the manuscript provides no manual verification or accuracy metrics for these translations. Without such checks, mismatches between the written proof and the formal artifact would invalidate the claimed interactive affordances.
minor comments (1)
- [Abstract] The abstract would benefit from naming the specific theorems or proof domains used in the evaluation to allow readers to assess generalizability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight important gaps in methodological reporting and system verification that we will address in revision. Below we respond point by point.
read point-by-point responses
-
Referee: [User Study] The user study (n=16) is presented as evidence of comprehension gains, yet the abstract and methods description supply no information on randomization, pre-registration, time-on-task controls, or statistical reporting (e.g., error bars or effect sizes). This leaves open the possibility that observed differences arise from study-design artifacts rather than the explorability features themselves.
Authors: We agree that the current reporting is insufficient. The 16 participants were randomly assigned to conditions. The study was not pre-registered because it was exploratory. We recorded but did not previously report time-on-task; average completion times will now be included. In the revision we will expand the Methods section with these details, add error bars to all figures, report effect sizes (Cohen's d), and explicitly discuss the small-sample limitation. These changes will allow readers to better evaluate whether the observed gains in answer quality, correctness, and detail are attributable to the explorability features. revision: yes
-
Referee: [System Architecture] The system relies on LLM-generated Lean translations to support step execution and counterexample testing, but the manuscript provides no manual verification or accuracy metrics for these translations. Without such checks, mismatches between the written proof and the formal artifact would invalidate the claimed interactive affordances.
Authors: We accept this criticism. While the paper does not describe it, the Lean translations for all theorems and proofs used in the evaluation were manually inspected by the authors to confirm step-level correspondence and working interactivity. No quantitative accuracy metrics across a broader corpus were collected. In the revised manuscript we will add a dedicated verification subsection that reports the manual checks performed, notes any discrepancies encountered, and provides basic accuracy observations for the presented examples. We will also discuss how the linking mechanism allows readers to detect and correct mismatches themselves. revision: yes
Circularity Check
No significant circularity; claims rest on system description and empirical user study
full rationale
The paper presents a system for translating written theorems and proofs into Lean via LLMs, linking them for interactive step-level execution, counterexample testing, and dependency tracing, then evaluates the approach via a between-subjects user study (n=16) measuring comprehension gains. No load-bearing derivation chain exists that reduces by construction to its own inputs, fitted parameters, or self-citations. The central claim of improved understanding is supported directly by the reported study outcomes rather than any self-definitional equivalence, renamed empirical pattern, or uniqueness theorem imported from prior author work. The paper is self-contained against external benchmarks such as the described interactive affordances and measured participant performance.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption LLMs can translate natural language mathematical proofs into correct Lean code at a level sufficient for interactive use
- domain assumption Interactive execution and example testing improve human comprehension of proofs compared to static text
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
uses LLMs to translate a theorem and its written proof into Lean... Readers can work through the proof at a step-level granularity, test custom examples or counterexamples, and trace the logical dependencies
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat induction and embed_injective unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
dependency recovery... diff of consecutive proof states... directed acyclic graph whose nodes are the facts introduced
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Lara Alcock, Mark Hodds, Somali Roy, and Matthew Inglis. 2015. Investigating and improving undergraduate proof comprehension.Notices of the AMS62, 7 (2015), 742–752
work page 2015
-
[2]
Lara Alcock and Nicola Wilkinson. 2011. e-Proofs: Design of a resource to support proof comprehension in mathematics.Educational Designer1, 4 (2011), 1–20
work page 2011
-
[3]
Paul Ayres. 2001. Systematic mathematical errors and cognitive load.Contempo- rary educational psychology26, 2 (2001), 227–248
work page 2001
-
[4]
Zhangir Azerbayev, Bartosz Piotrowski, and Jeremy Avigad. 2022. ProofNet: A benchmark for autoformalizing and formally proving undergraduate-level mathematics problems. InSecond MATH-AI Workshop
work page 2022
-
[5]
Elizabeth L Bjork, Robert A Bjork, et al. 2011. Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning.Psychology and the real world: Essays illustrating fundamental contributions to society2, 59-68 (2011), 56–64
work page 2011
-
[6]
2016.Qualitative HCI research: Going behind the scenes
Ann Blandford, Dominic Furniss, and Stephann Makri. 2016.Qualitative HCI research: Going behind the scenes. Morgan & Claypool Publishers
work page 2016
-
[7]
Neil Chulpongsatorn, Mille Skovhus Lunding, Nishan Soni, and Ryo Suzuki. 2023. Augmented Math: Authoring AR-Based Explorable Explanations by Augmenting Static Math Textbooks. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23). ACM, 1–16
work page 2023
-
[8]
Will Crichton. 2021. A New Medium for Communicating Research on Program- ming Languages. InProceedings of the 1st Workshop on Human Aspects of Types and Reasoning Assistants
work page 2021
-
[9]
Ben Davies, Lara Alcock, and Ian Jones. 2020. Comparative judgement, proof summaries and proof comprehension.Educational Studies in Mathematics105, 2 (2020), 181–197
work page 2020
-
[10]
Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. 2015. The Lean Theorem Prover (System Description). InProceedings of the 25th International Conference on Automated Deduction (CADE-25) (Lecture Notes in Computer Science, Vol. 9195). Springer, 378–388. https://doi.org/10.1007/ 978-3-319-21401-6_26
work page 2015
-
[11]
Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. 2015. The Lean theorem prover (system description). InInternational Conference on Automated Deduction (CADE). Springer, 378–388
work page 2015
-
[12]
Anton N. Dragunov and Jonathan L. Herlocker. 2003. Designing Intelligent and Dynamic Interfaces for Communicating Mathematics. InProceedings of the 8th International Conference on Intelligent User Interfaces. 154–161. https://doi.org/ 10.1145/604045.604085
-
[13]
Eddie M Gray and David O Tall. 1994. Duality, ambiguity, and flexibility: A “proceptual” view of simple arithmetic.Journal for research in Mathematics Education25, 2 (1994), 116–140
work page 1994
-
[14]
Ziwei Gu, Joyce Zhou, Ning-Er Lei, Jonathan K Kummerfeld, Mahmood Jasim, Narges Mahyar, and Elena L Glassman. 2025. AbstractExplorer: Leveraging Structure-Mapping Theory to Enhance Comparative Close Reading at Scale. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. 1–25
work page 2025
-
[15]
Aditya Gunturu, Yi Wen, Nandi Zhang, Jarin Thundathil, Rubaiat Habib Kazi, and Ryo Suzuki. 2024. Augmented Physics: Creating Interactive and Embedded Physics Simulations from Static Textbook Diagrams. InProceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST ’24). ACM, 1–12
work page 2024
-
[16]
Guershon Harel and Larry Sowder. 2007. Toward comprehensive perspectives on the learning and teaching of proof.Second handbook of research on mathematics teaching and learning2 (2007), 805–842
work page 2007
-
[17]
John Harrison, Josef Urban, and Freek Wiedijk. 2014. History of interactive theorem proving.Computational Logic9 (2014), 135–214
work page 2014
-
[18]
Sandra G Hart and Lowell E Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. InAdvances in psy- chology. Vol. 52. Elsevier, 139–183
work page 1988
-
[19]
Andrew Head, Amber Xie, and Marti A. Hearst. 2022. Math Augmentation: How Authors Enhance the Readability of Formulas using Novel Visual Design Practices. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3501932
-
[20]
Ralf Herbrich, Tom Minka, and Thore Graepel. 2006. TrueSkill: A Bayesian Skill Rating System. InAdvances in Neural Information Processing Systems, Vol. 19
work page 2006
-
[21]
Thomas Hubert, Roshan Mehta, Laurent Sartran, et al . 2025. Olympiad-level formal mathematical reasoning with reinforcement learning.Nature(2025). https://doi.org/10.1038/s41586-025-09833-y
-
[22]
Albert Qiaochu Jiang, Sean Welleck, Jue Ping Zhou, Timothée Lacroix, Jiacheng Liu, Guillaume Lample, Samuel Lewis, Yuhuai Wu, Xiaokai Hou, Christian Hutter, et al. 2023. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs. InInternational Conference on Learning Representations (ICLR)
work page 2023
-
[23]
Dongwei Jiang, Marcio Fonseca, and Shay B Cohen. 2024. Leanreasoner: Boosting complex logical reasoning with lean. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 7497–7510
work page 2024
-
[24]
Philip N Johnson-Laird. 2005. Mental models and thought.The Cambridge handbook of thinking and reasoning(2005), 185–208
work page 2005
-
[25]
Hee-jeong Kim. 2020. Concreteness fading strategy: A promising and sustainable instructional model in mathematics classrooms.Sustainability12, 6 (2020), 2211
work page 2020
-
[26]
Zeger-jan Kock, Ulises Salinas-Hernández, and Birgit Pepin. 2025. Engineering students’ initial use schemes of ChatGPT as an instrument for learning.Digital Experiences in Mathematics Education11, 1 (2025), 192–218
work page 2025
-
[27]
Christine P Lee, David Porfirio, Xinyu Jessica Wang, Kevin Chenkai Zhao, and Bilge Mutlu. 2025. Veriplan: Integrating formal verification and llms into end- user planning. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–19
work page 2025
-
[28]
Stanley Letovsky. 1987. Cognitive processes in program comprehension.Journal of Systems and software7, 4 (1987), 325–339
work page 1987
-
[29]
Yong Li, Shoaib Kamil, Alec Jacobson, and Yotam Gingold. 2022. Hé: Docu- ment Processor for Executable Linear Algebra Papers. InSIGGRAPH Asia 2022 Conference Papers. https://doi.org/10.1145/3550469.3555395
-
[30]
Raymond Lister, Elizabeth S Adams, Sue Fitzgerald, William Fone, John Hamer, Morten Lindholm, Robert McCartney, Jan Erik Moström, Kate Sanders, Otto Seppälä, et al. 2004. A multi-national study of reading and tracing skills in novice programmers.Acm sigcse bulletin36, 4 (2004), 119–150
work page 2004
- [31]
-
[32]
Patrick Massot. 2024. Teaching mathematics using lean and controlled natural language. In15th International Conference on Interactive Theorem Proving (ITP 2024). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 27–1
work page 2024
-
[33]
Juan Pablo Mejía-Ramos, Evan Fuller, Keith Weber, Kathryn Rhoads, and Aron Samkoff. 2012. An assessment model for proof comprehension in undergraduate mathematics.Educational Studies in Mathematics79, 1 (2012), 3–18
work page 2012
-
[34]
Mejía-Ramos, Erika Fuller, Keith Weber, Kari Rhoads, and Alexis Samkoff
Juan P. Mejía-Ramos, Erika Fuller, Keith Weber, Kari Rhoads, and Alexis Samkoff
-
[35]
https://doi.org/ 10.1007/s10649-011-9349-7
An Assessment Model for Proof Comprehension in Undergraduate Mathe- matics.Educational Studies in Mathematics79, 1 (2012), 3–18. https://doi.org/ 10.1007/s10649-011-9349-7
-
[36]
Hugo Mercier and Dan Sperber. 2011. Why do humans reason? Arguments for an argumentative theory.Behavioral and brain sciences34, 2 (2011), 57–74
work page 2011
-
[37]
Frédéric Tran Minh, Laure Gonnord, and Julien Narboux. 2025. Proof Assis- tants for Teaching: a Survey. InThEdu@CADE. https://api.semanticscholar.org/ CorpusID:278422822
work page 2025
-
[38]
Nagham M Mohammad, Matthew Demers, Erin McCubbin, Mitchell Jackson, and Sara M Fulmer. 2025. How college students use ChatGPT.Pedagogical Research 10, 4 (2025), em0250
work page 2025
-
[39]
Robert C Moore. 1994. Making the transition to formal proof.Educational Studies in Mathematics27, 3 (1994), 249–266
work page 1994
-
[40]
Leonardo de Moura and Sebastian Ullrich. 2021. The Lean 4 theorem prover and programming language. InAutomated Deduction–CADE 28: 28th International Conference on Automated Deduction. Springer, 625–635
work page 2021
-
[41]
Chris Olah and Shan Carter. 2017. Research Debt.Distill(2017). https://doi.org/ 10.23915/distill.00005
-
[42]
2024.lean-lsp-mcp: Lean Theorem Prover MCP
oOo0oOo. 2024.lean-lsp-mcp: Lean Theorem Prover MCP. https://github.com/ oOo0oOo/lean-lsp-mcp MCP server that allows agentic interaction with the Lean theorem prover via the Language Server Protocol. GitHub repository
work page 2024
-
[43]
Dale Parsons and Patricia Haden. 2006. Parson’s programming puzzles: a fun and effective learning tool for first programming courses. InProceedings of the 8th Australasian Conference on Computing Education-Volume 52. 157–163
work page 2006
-
[44]
Birgit Pepin, Nils Buchholtz, and Ulises Salinas-Hernández. 2025. A scoping survey of ChatGPT in mathematics education.Digital Experiences in Mathematics Education11, 1 (2025), 9–41
work page 2025
-
[45]
Clément Pit-Claudel. 2020. Untangling mechanized proofs. InProceedings of the 13th ACM SIGPLAN International Conference on Software Language Engineering. Kambhamettu et al
work page 2020
-
[46]
Nicholas Rougeux and Oliver Byrne. 2018. Byrne’s Euclid: The First Six Books of the Elements of Euclid with Coloured Diagrams and Symbols. https:// www.c82.net/euclid/ Online interactive reproduction
work page 2018
-
[47]
Alan H Schoenfeld. 1988. When good teaching leads to bad results: The disasters of’well-taught’mathematics courses.Educational psychologist23, 2 (1988), 145– 166
work page 1988
-
[48]
Annie Selden and John Selden. 2003. Validations of proofs considered as texts: Can undergraduates tell whether an argument proves a theorem?Journal for Research in Mathematics Education34, 1 (2003), 4–36
work page 2003
-
[49]
Derar Serhan and Natalie B. Welcome. 2024. Integrating ChatGPT in the Calculus Classroom: Student Perceptions.International Journal of Technology in Education and Science(2024). https://api.semanticscholar.org/CorpusId:269835173
work page 2024
-
[50]
Despina A Stylianou, Maria L Blanton, Eric J Knuth, and Despi Stylianou. 2009. Teaching and learning proof across the grades. Routledge New York
work page 2009
-
[51]
Despina A Stylianou, Maria L Blanton, and Ourania Rotou. 2015. Undergraduate students’ understanding of proof: Relationships between proof conceptions, beliefs, and classroom experiences with learning proof.International Journal of Research in Undergraduate Mathematics Education1, 1 (2015), 91–134
work page 2015
-
[52]
Nicole Sultanum and Arjun Srinivasan. 2023. Datatales: Investigating the use of large language models for authoring data-driven articles. In2023 IEEE Visualiza- tion and Visual Analytics (VIS). IEEE, 231–235
work page 2023
-
[53]
John Sweller. 1988. Cognitive load during problem solving: Effects on learning. Cognitive science12, 2 (1988), 257–285
work page 1988
-
[54]
John Sweller, Paul Ayres, and Slava Kalyuga. 2011.Cognitive load theory. Springer Science & Business Media
work page 2011
-
[55]
Jeffrey Tao, Litao Yan, Jessica Shi, Mia Ginsberg, and Andrew Head. 2025. FreeForm: Flexibly Augmenting Formulas with Synchronized Markup and Graph- ical Edits. InProceedings of the 2025 CHI Conference on Human Factors in Com- puting Systems. 1–12
work page 2025
-
[56]
Oliver Tatton-Brown. 2019. Rigour and Intuition.Erkenntnis86 (2019), 1757 –
work page 2019
-
[57]
https://api.semanticscholar.org/CorpusID:254465500
-
[58]
Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. 2023. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805(2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[59]
2024.Widgets: The user-widgets system
The Lean Developers. 2024.Widgets: The user-widgets system. https://lean- lang.org/examples/1900-1-1-widgets/ Lean 4 Documentation
work page 2024
-
[60]
Bret Victor. 2011. Explorable Explanations. https://worrydream.com/ ExplorableExplanations/. Accessed 2026-03-01
work page 2011
-
[61]
Bret Victor. 2011. Kill Math. http://worrydream.com/KillMath/. Accessed: 2024-05-24
work page 2011
-
[62]
Bret Victor. 2011. Tangle: a JavaScript Library for Reactive Documents. https:// worrydream.com/Tangle/. Accessed 2026-03-24
work page 2011
-
[63]
Tashtoush, Rommel Alali, and Adeeb M Jarrah
Yousef Wardat, Mohammad A. Tashtoush, Rommel Alali, and Adeeb M Jarrah
-
[64]
Eurasia Journal of Mathematics, Science and Technology Education(2023)
ChatGPT: A revolutionary tool for teaching and learning mathematics. Eurasia Journal of Mathematics, Science and Technology Education(2023). https:// pdfs.semanticscholar.org/3174/39900cffedb7f18e623c1b0ab3fa8e54de21.pdf
work page 2023
-
[65]
Keith Weber. 2008. How mathematicians determine if an argument is a valid proof.Journal for Research in Mathematics Education39, 4 (2008), 431–459
work page 2008
-
[66]
Keith Weber. 2015. Effective Proof Reading Strategies for Comprehending Math- ematical Proofs.International Journal of Research in Undergraduate Mathematics Education1, 3 (2015), 289–314. https://doi.org/10.1007/s40753-015-0011-0
-
[67]
Keith Weber and Juan Pablo Mejia-Ramos. 2011. Why and how mathematicians read proofs: an exploratory study.Educational Studies in Mathematics76, 3 (2011), 329–344. https://doi.org/10.1007/s10649-010-9292-z
-
[68]
Erik Weitnauer, David Landy, and Erin Ottmar. 2016. Graspable Math: Towards Dynamic Algebra Notations that Support Learners Better than Paper. In2016 Future Technologies Conference. 406–414
work page 2016
-
[69]
Jelle Wemmenhove, Dick Arends, Thijs Beurskens, Maitreyee Bhaid, Sean Mc- Carren, Jan Moraal, Diego Rivera Garrido, David Tuin, Malcolm Vassallo, Pieter Wils, et al. 2022. Waterproof: educational software for learning how to write mathematical proofs.arXiv preprint arXiv:2211.13513(2022)
-
[70]
Carl E. Wieman, Wendy K. Adams, and Katherine K. Perkins. 2008. PhET: Simulations that enhance learning.Science322, 5902 (2008), 682–683
work page 2008
-
[71]
Michelle Wu, Aadil Gopal, and Andrew Head. 2023. FFL: A Language and Live Runtime for Styling and Labeling Typeset Math Formulas. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
work page 2023
-
[72]
Hyunkyoung Yoon, Jihye Hwang, Kyungwon Lee, Kyeong Hah Roh, and Oh Nam Kwon. 2024. Students’ use of generative artificial intelligence for proving mathe- matical statements.ZDM–Mathematics Education56, 7 (2024), 1531–1551
work page 2024
-
[73]
Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. 2022. MiniF2F: a cross- system benchmark for formal Olympiad-level mathematics. InInternational Conference on Learning Representations (ICLR). A Appendix A.1 Narrative Scenario We demonstrate how the interaction patterns in explorable theo- rems can support productive inference over a longer period of u...
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.