
arxiv: 2606.07915 · v1 · pith:7WUFBICCnew · submitted 2026-06-06 · 💻 cs.AI

EditSR: Enhancing Neural Symbolic Regression via Edit-based Rectification

Pith reviewed 2026-06-27 20:13 UTC · model grok-4.3

classification 💻 cs.AI
keywords neural symbolic regression · edit-based rectification · error accumulation · symbolic regression · autoregressive decoding · state transition

The pith

A two-layer setup with a neural generator plus a pretrained edit rectifier recovers correct symbolic expressions more reliably than one-pass decoding alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural symbolic regression models generate expressions quickly because structural search is shifted to pretraining, but they accumulate errors during one-pass autoregressive decoding, especially on complex expressions. EditSR adds a second layer that treats rectification as a chain of state transitions starting from an incorrect expression. A state-transition algorithm builds supervised training data so the rectifier learns to apply syntactically valid edits, each conditioned only on the current state. This design lets later edits correct earlier mistakes without depending on the full edit history or restarting global search. Experiments show higher rates of correct structure recovery at modest added cost, with the largest gains on harder expressions.

Core claim

The paper claims that pretraining an edit-based Rectifier on supervised state-transition chains enables post-hoc correction of structurally incorrect expressions produced by a first-layer neural symbolic regression model. Each edit stays inside a syntactically valid space, and conditioning on the current state alone reduces error accumulation across the chain.

What carries the argument

The edit-based Rectifier, which learns to perform step-by-step state transitions that correct an expression using only the current state as input.
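
A minimal sketch of what state-only conditioning could look like, under several assumptions that are not from the paper: expressions are prefix-notation token lists, the `(name, position, token)` action encoding is hypothetical, and `toy_policy` is a hand-written stand-in for the learned Rectifier that cheats by knowing the target. The action names loosely follow the Keep and Replace actions named in the Figure 2 caption. The point is only that the policy receives the current state and nothing else.

```python
from typing import Callable, List, Tuple

Expr = List[str]                 # prefix-notation tokens (assumed representation)
Action = Tuple[str, int, str]    # (action name, position, new token); hypothetical encoding


def apply_action(state: Expr, action: Action) -> Expr:
    """Return the next state; the input state is left unchanged."""
    name, pos, token = action
    if name == "KEEP":           # in this sketch KEEP means no further edit is needed
        return list(state)
    if name == "REPLACE":        # swap a single token in place
        return state[:pos] + [token] + state[pos + 1:]
    raise ValueError(f"unknown action {name!r}")


def rectify(initial: Expr, policy: Callable[[Expr], Action], max_steps: int = 8) -> List[Expr]:
    """Run a rectification chain; the policy sees only the current state."""
    chain = [list(initial)]
    for _ in range(max_steps):
        action = policy(chain[-1])   # no edit history is passed in
        if action[0] == "KEEP":
            break
        chain.append(apply_action(chain[-1], action))
    return chain


def toy_policy(state: Expr) -> Action:
    """Hypothetical stand-in for the Rectifier: fix the first mismatch with a known target."""
    target = ["*", "+", "x1", "x2", "sin", "x3"]   # (x1 + x2) * sin(x3)
    for pos, (got, want) in enumerate(zip(state, target)):
        if got != want:
            return ("REPLACE", pos, want)
    return ("KEEP", -1, "")


chain = rectify(["*", "-", "x1", "x2", "cos", "x3"], toy_policy)
for step, state in enumerate(chain):
    print(step, " ".join(state))
```

Because `rectify` passes only `chain[-1]` to the policy, a poor edit at one step does not constrain the next decision. This is the property the paper credits with reducing error accumulation.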

If this is right

  • Symbolic structure recovery rates rise compared with the base neural model alone.
  • Gains are larger for complex expressions where single-pass decoding tends to fail.
  • The added computation stays limited because the rectifier is pretrained and runs without restarting global search.
  • Every intermediate expression stays syntactically valid because edits are restricted to a valid action space.
  • Conditioning each edit only on the current state allows later steps to override earlier mistakes.
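
The validity constraint can be illustrated with a small sketch, assuming prefix notation and a hypothetical arity table (`ARITY`, `is_valid_prefix`, and `valid_replacements` are illustrative names, not the paper's). It uses the example from the Figure 1 caption: decoding the unary `sin` as the binary `+` leaves the tree one operand short, whereas a same-arity replacement cannot break the tree. The paper's action space includes more than same-arity replacement; this covers only the simplest case.

```python
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "sin": 1, "cos": 1, "exp": 1}  # assumed operator set


def arity(token: str) -> int:
    """Operators have the arity listed above; anything else is treated as a leaf."""
    return ARITY.get(token, 0)


def is_valid_prefix(tokens: list) -> bool:
    """True if the tokens form exactly one complete expression tree in prefix order."""
    if not tokens:
        return False
    open_slots = 1                   # the root slot still has to be filled
    for token in tokens:
        if open_slots == 0:          # tokens remain after the tree is already complete
            return False
        open_slots += arity(token) - 1
    return open_slots == 0


def valid_replacements(tokens: list, pos: int, vocabulary: list) -> list:
    """Same-arity candidates for tokens[pos]; swapping one in keeps the tree well formed."""
    return [t for t in vocabulary if arity(t) == arity(tokens[pos]) and t != tokens[pos]]


target = ["*", "+", "x1", "x2", "sin", "x3"]   # (x1 + x2) * sin(x3)
broken = ["*", "+", "x1", "x2", "+", "x3"]     # sin decoded as +: one operand is missing
print(is_valid_prefix(target), is_valid_prefix(broken))
print(valid_replacements(target, 4, ["sin", "cos", "exp", "+", "x1"]))
```

Filtering candidate actions this way is one simple means of guaranteeing that every intermediate expression stays parseable; how the paper defines its valid action space is not detailed in this excerpt.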

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same state-transition rectification pattern could be applied to other autoregressive generation tasks that produce structured outputs such as mathematical proofs or code.
  • Separating generation from correction might let smaller base models suffice if the rectifier handles most structural fixes.
  • Extending the rectifier to handle expressions with more operators or variables would test whether the state-only conditioning continues to prevent accumulation.

Load-bearing premise

The supervised rectification chains built by the state-transition algorithm will train a rectifier that generalizes to correct the kinds of errors the base neural model actually makes on unseen data.

What would settle it

Running the trained rectifier on a held-out test set of expressions generated by the first-layer model and finding no measurable increase in the fraction of expressions that match the ground-truth symbolic structure.
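
The test described above amounts to comparing a structure-recovery rate before and after rectification on the same held-out problems. The sketch below assumes exact token-sequence match as the recovery criterion, which is stricter than symbolic equivalence and may differ from the paper's Symbolic solution rate. The expressions are hypothetical placeholders, not data from the paper.

```python
def recovery_rate(predictions, targets):
    """Fraction of predictions whose token sequence equals the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("empty evaluation set")
    hits = sum(1 for p, t in zip(predictions, targets) if list(p) == list(t))
    return hits / len(targets)


# Hypothetical held-out examples in prefix notation (not results from the paper).
targets = [["+", "x1", "x2"], ["sin", "x1"], ["*", "x1", "x1"], ["exp", "x2"]]
base_out = [["+", "x1", "x2"], ["cos", "x1"], ["*", "x1", "x2"], ["exp", "x2"]]
rectified = [["+", "x1", "x2"], ["sin", "x1"], ["*", "x1", "x2"], ["exp", "x2"]]

base_rate = recovery_rate(base_out, targets)
rect_rate = recovery_rate(rectified, targets)
gain = rect_rate - base_rate
print(f"base={base_rate:.2f} rectified={rect_rate:.2f} gain={gain:+.2f}")
```

A gain indistinguishable from zero on genuinely held-out first-layer outputs would count against the load-bearing premise. A real evaluation would also need repeated runs and a canonical form that treats `x1 + x2` and `x2 + x1` as the same structure.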

Figures

Figures reproduced from arXiv: 2606.07915 by Da Li, Jin Xu, Juan Zhang, Junping Yin, Xingyu Cui, Xinxin Li.

Figure 1. Under autoregressive decoding, two possible outcomes after generating an incorrect token. If the target expression is (x1 + x2)·sin(x3), then once the unary operator sin is incorrectly generated as the binary operator + (highlighted in red), the model either leaves the tree structurally incorrect (dashed box on the left) or is forced to close the wrong branch with an irrelevant subtree (dashed box on the…
Figure 2. Training overview of EditSR. At the top, a neural symbolic regression model in the first layer is trained to map datasets directly to expressions, using the target expression f* for supervision. At the bottom, the Rectifier learns a rectification chain from f^(0) to f*. Here, z^(t) denotes the edit action at step t, where KP, RP, and RW are abbreviations for the edit actions Keep, Replace, and Rewrite, …
Figure 3. Illustration of the four non-trivial edit actions used by the Rectifier. From left to right, the columns show examples of …
Figure 4. Reports the Accuracy solution rate and Symbolic solution rate on the standard benchmarks. EditSR is one of the most stable models across these benchmarks, as its Symbolic solution rate usually stays close to its Accuracy solution rate on the same benchmark. This phenomenon indicates that EditSR primarily improves structural recovery, as expected, since the Rectifier always edits an incorrect expression to…
Figure 5. ECDF of mean R² on standard benchmarks. The results show the mean over 10 runs under Gaussian noise levels of 0, 0.001, 0.01, and 0.1, respectively. … target expression. As the number of distractors increases, the task overall becomes harder. TPSR and uDSR achieve strong Accuracy solution rates, especially for k = 2 and k = 3, but they also rely more heavily on distractors, which is accompanied by a clearer…
Figure 6. Results on Feynman and ODE-Strogatz benchmarks. Models are ordered from top to bottom by their Accuracy solution rate and Symbolic solution rate. Each point summarizes the mean result of 10 runs, and the error bars denote 95% bootstrap confidence intervals. … rate falls more sharply than their Accuracy solution rate as noise increases, or their confidence intervals are visibly wider. By contrast, EditSR rema…
Figure 7. Mean R² and Complexity results on Black-box and Phenomenological & first-principles benchmarks. Each point corresponds to the mean over 10 runs for each problem. In each rain-cloud plot, the cloud shows the distribution density, the embedded box summarizes the median and interquartile range, and the points show the individual problem results. … high-dimensional scenarios. Each experiment is repeated 5 times…
Figure 8. Ablation results of the Rectifier on the Feynman benchmark. Results are averaged over 5 runs, with standard deviation shown when applicable. … and Symbolic solution rate. Overall, the gap between NeSymReS and EditSR becomes progressively larger as complexity increases, which suggests that the Rectifier is particularly useful for generating long expressions. This phenomenon is consistent with the intuition t…
Figure 9. Effect of the Rectifier across complexity levels. Error bars denote standard deviation over 5 runs. Problems are grouped into 5 buckets according to target expression complexity. … normalized edit distance is defined as ED_norm(f^(t), f*) = ED(f^(t), f*) / max(…
Figure 10. Distribution of effective edit steps used by the Rectifier. The results are first averaged across the beams, and then across 5 runs.
Figure 12. Action frequency and effectiveness of the Rectifier. The results treat all repeated runs as independent instances, rather than averaging them. …bations. For each setting, we report the Complexity, Symbolic solution rate and Accuracy solution rate results. As shown in …
Figure 13. Validation accuracy curves of the Rectifier during fine-tuning. The Rectifier is fine-tuned for 5 epochs and evaluated on the validation set every 0.5 epoch. … rectifications. Since each decision is conditioned on the current state rather than the whole edit history, the Rectifier is not forced to preserve a myopic locally best trajectory at every step. Instead, it can use an intermediate edit to expose a …
Figure 14. Representative successful and failed rectification cases of EditSR. Each subplot shows the normalized edit distance between f^(t) and f* across rectification steps, together with the selected edit action and the corresponding intermediate state. The first four cases show the dominant regime in which the first-layer prediction already provides a plausible structural scaffold, and the Rectifier recovers mi…
Figure 15. R² versus test time for EditSR and TPSR. The results represent the mean over 5 runs. Overall, EditSR occupies one of the strongest trade-off regions, achieving comparable or better performance on most problems while requiring substantially less time. … 8. Conclusion. In this paper, we introduce EditSR to alleviate error accumulation in neural symbolic regression models. Instead of treating an incorrect pre…
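
The Figure 9 and Figure 14 captions refer to a normalized edit distance between an intermediate expression and the target, but the denominator is cut off in the extracted text after "max". The sketch below is therefore an assumption on two counts: it uses token-level Levenshtein distance (the paper may use a tree-based distance), and it normalizes by the length of the longer sequence.

```python
def edit_distance(a, b):
    """Token-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """ASSUMPTION: divide by the longer length; the source elides the actual denominator."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else edit_distance(a, b) / longest


state = ["*", "+", "x1", "x2", "+", "x3"]      # hypothetical intermediate state
target = ["*", "+", "x1", "x2", "sin", "x3"]   # (x1 + x2) * sin(x3)
print(edit_distance(state, target), normalized_edit_distance(state, target))
```

Under this normalization the value lies in [0, 1], with 0 meaning the state already equals the target, which matches how the Figure 14 caption uses the quantity to track progress across rectification steps.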
Original abstract

Neural symbolic regression models improve inference efficiency by shifting structural search to pretraining, but their one-pass autoregressive decoding is prone to error accumulation, which may lead to generating structurally incorrect expressions, especially in complex expression generation scenarios. Existing rectification strategies can alleviate this issue, but they often depend on restarting global search, thereby weakening the efficiency advantage of neural models, and remain susceptible to error accumulation. In this paper, we propose EditSR, a two-layer framework that combines a neural symbolic regression model in the first layer with an edit-based Rectifier in the second layer to achieve efficient prediction and post-hoc rectification. Instead of restarting the global search, we maintain rectification efficiency by pretraining the Rectifier. Specifically, we formulate the rectification process as a step-by-step state-transition chain starting from an incorrect expression, and develop a state-transition algorithm to construct supervised rectification chains for training the Rectifier. To ensure syntactic validity throughout rectification, each edit action is restricted to a syntactically valid space so that every edited expression remains parseable. In addition, because each edit decision is conditioned on the current state rather than the history, the Rectifier allows errors made in earlier steps to be rectified by subsequent edits, thereby reducing the risk of error accumulation. Extensive experiments and ablation studies show that EditSR substantially improves symbolic structure recovery with limited extra cost, with more pronounced gains on complex expressions, where one-pass autoregressive decoding is more susceptible to error accumulation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes EditSR, a two-layer neural symbolic regression framework. A first-layer neural model performs one-pass autoregressive decoding to generate candidate expressions; a second-layer Rectifier, pretrained on supervised edit sequences, then performs step-by-step syntactic edits. Rectification chains are generated by a state-transition algorithm that starts from incorrect expressions and applies valid edits; each decision is conditioned only on the current syntactic state (not history) and restricted to the syntactically valid action space. The authors claim this yields substantially better symbolic structure recovery than the base neural model, at modest extra cost, with larger gains on complex expressions where error accumulation is pronounced.

Significance. If the rectifier demonstrably generalizes from the constructed training chains to the actual error distribution of the first-layer model on held-out data, the approach would provide a practical, efficiency-preserving way to mitigate autoregressive error accumulation in neural SR without reverting to global search restarts. The syntactic-validity constraint and history-free conditioning are technically attractive features that could be adopted more broadly.

major comments (2)
  1. [Methods section describing the state-transition algorithm and Rectifier training] The central claim that the pretrained Rectifier improves structure recovery on unseen expressions rests on the unverified assumption that the error distribution in the state-transition training chains matches the distribution of mistakes made by the first-layer neural model. The manuscript provides no comparison (e.g., statistics on error types, edit distances, or syntactic patterns) between the initial incorrect expressions used to build the chains and the actual outputs of the neural SR model on the test set; without such evidence the reported gains on complex expressions could be an artifact of mismatched training and test error statistics.
  2. [Abstract and Experiments section] The abstract states that 'extensive experiments and ablation studies show that EditSR substantially improves symbolic structure recovery,' yet the provided manuscript excerpt contains no quantitative results, baseline comparisons, dataset descriptions, or error metrics. This absence prevents assessment of whether the claimed improvements are load-bearing or merely incremental.
minor comments (2)
  1. [Methods] Notation for the state representation and edit actions should be formalized with explicit definitions (e.g., what constitutes the 'current syntactic state') to allow reproducibility.
  2. [Methods] The paper should clarify whether the state-transition algorithm uses random perturbations or model-generated errors when constructing the supervised chains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Methods section describing the state-transition algorithm and Rectifier training] The central claim that the pretrained Rectifier improves structure recovery on unseen expressions rests on the unverified assumption that the error distribution in the state-transition training chains matches the distribution of mistakes made by the first-layer neural model. The manuscript provides no comparison (e.g., statistics on error types, edit distances, or syntactic patterns) between the initial incorrect expressions used to build the chains and the actual outputs of the neural SR model on the test set; without such evidence the reported gains on complex expressions could be an artifact of mismatched training and test error statistics.

    Authors: We agree that a direct comparison of error distributions would strengthen the generalization argument. The state-transition algorithm is designed to produce diverse incorrect starting points and valid edit sequences that reflect common autoregressive failure modes, but the manuscript does not currently include explicit statistics matching these to the first-layer model's test outputs. In the revision we will add this analysis (error-type frequencies, edit-distance histograms, and syntactic-pattern overlap) to the Experiments section to verify alignment. revision: yes

  2. Referee: [Abstract and Experiments section] The abstract states that 'extensive experiments and ablation studies show that EditSR substantially improves symbolic structure recovery,' yet the provided manuscript excerpt contains no quantitative results, baseline comparisons, dataset descriptions, or error metrics. This absence prevents assessment of whether the claimed improvements are load-bearing or merely incremental.

    Authors: The full manuscript contains a complete Experiments section with quantitative results, baseline comparisons, dataset descriptions, and error metrics that support the abstract claims. The excerpt supplied to the referee appears to have been limited to the abstract; the full paper provides all requested details. No change to the manuscript text is required on this point. revision: no
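
The distribution comparison discussed under the first major comment could be summarized with a single distance between two empirical histograms. The sketch below uses total variation distance over edit-distance values; the choice of metric, the function names, and the sample values are all assumptions for illustration, not measurements or methods from the paper.

```python
from collections import Counter


def empirical_distribution(samples):
    """Map each observed value to its relative frequency."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between discrete distributions (0 identical, 1 disjoint)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


# Hypothetical edit distances to the target, NOT measurements from the paper.
chain_starts = [1, 1, 2, 2, 2, 3, 3, 4]   # initial states used to build training chains
model_errors = [1, 2, 2, 3, 3, 3, 4, 5]   # first-layer outputs on held-out problems

tv = total_variation(empirical_distribution(chain_starts),
                     empirical_distribution(model_errors))
print(f"total variation distance: {tv:.3f}")
```

A small distance would support the claim that the training chains resemble the first-layer model's real mistakes. The same two functions apply unchanged to error-type labels, since the values only need to be hashable.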

Circularity Check

0 steps flagged

No circularity; derivation relies on independent supervised pretraining of rectifier

Full rationale

The paper presents EditSR as a two-layer architecture with a first-layer neural SR model and a second-layer Rectifier pretrained on state-transition chains generated by an explicit algorithm. No equations, derivations, or claims reduce the reported gains in structure recovery to a fitted quantity by construction, a self-referential definition, or a load-bearing self-citation chain. The rectification process is motivated and trained separately from the base model, with syntactic validity enforced by construction in the action space. This is a standard empirical ML contribution whose central claims rest on external benchmarks rather than internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, mathematical axioms, or new postulated entities; the rectifier training relies on an unspecified state-transition algorithm whose internal assumptions are not detailed.



Appendix A. Details of Tagger and Editor

The Tagger and Editor share the same dataset encoding h, but they play different roles in the rectification loop. The Tagger is responsible for predicting the edit position and action. Given the current...