Charton, Linear algebra with transformers, TMLR, October 2022
5 Pith papers cite this work.
citing papers explorer
- Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning
  FISolver trains a compact LLM on backward-generated (differential equation, first integral) pairs and uses guided reinforcement learning to outperform larger models and Mathematica on first-integral benchmarks at lower cost.
- Reconstructing conformal field theoretical compositions with Transformers
  Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.
- Arithmetic Pedagogy for Language Models
  A small GPT-2 model trained from scratch on GASING-derived CoT supervision for arithmetic reaches over 80% held-out accuracy, exhibits three learning phases, and develops both procedural and associative reasoning.
- Learning the symmetric group: large from small
  Transformer trained on S10 permutation prediction from transpositions generalizes to S25 with near 100% accuracy using identity augmentation and partitioned windows.
- The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior