arxiv: 2605.14288 · v1 · submitted 2026-05-14 · 🧮 math.NT

Recognition: 2 theorem links

· Lean Theorem

How Twist Class Redundancy Drives the Prediction of Traces of Frobenius of Elliptic Curves

Angelica Babei , Ujjawal Shah , Malick Kebe

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:39 UTC · model grok-4.3

classification 🧮 math.NT

keywords elliptic curvesquadratic twistsFrobenius tracemachine learningnumber theorydataset redundancybenchmark datasetarithmetic invariants

0 comments

The pith

Redundancy within quadratic twist classes of elliptic curves suffices for highly accurate machine learning predictions of their Frobenius traces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that standard datasets for predicting the trace of Frobenius on elliptic curves contain many repeated examples that are quadratic twists of one another. This built-in repetition within twist classes lets models reach high accuracy by memorizing patterns across those duplicates rather than extracting independent arithmetic information. The authors therefore release a new benchmark dataset limited to a single representative from each twist class. A sympathetic reader would care because earlier strong results may rest on this dataset artifact instead of genuine mathematical discovery.

Core claim

The underlying datasets for predicting traces of Frobenius contain significant redundancy within quadratic twist classes, and this redundancy alone is sufficient to produce highly accurate predictions. The authors introduce a benchmark dataset consisting exclusively of unique twist class representatives.

What carries the argument

Redundancy inside quadratic twist classes, which supplies repeated examples that models can exploit to achieve high accuracy without learning new arithmetic relations.

If this is right

Models trained on standard datasets can reach high accuracy without capturing genuine arithmetic features of elliptic curves.
Future machine-learning work on elliptic-curve invariants must be tested on the unique-twist benchmark to confirm it learns properties beyond twist relations.
Accuracy on the new benchmark is expected to fall, directly quantifying how much prior performance relied on redundancy.
Any claimed advance in predicting arithmetic invariants must now be re-evaluated on deduplicated data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Analogous redundancy may exist in other number-theoretic datasets built from families closed under twists or isogenies.
Dataset audits for class-level repetitions should become standard before reporting predictive success in algebraic geometry or number theory.
The unique-twist benchmark could be extended to additional invariants or to curves over number fields to test whether the redundancy effect generalizes.

Load-bearing premise

The drop in model performance on the unique-twist dataset is caused by removal of redundancy rather than by shifts in data distribution or limits on model capacity.

What would settle it

Train an identical model on the unique-twist-class benchmark and check whether prediction accuracy for traces of Frobenius stays as high as on the original dataset; a large drop supports the claim while little or no drop falsifies it.

Figures

Figures reproduced from arXiv: 2605.14288 by Angelica Babei, Malick Kebe, Ujjawal Shah.

**Figure 1.** Figure 1: MCC of the sign-based twist class matching algorithm (Algorithm 1) for predicting ap(E) across primes p ∈ {2, 3, . . . , 97}, using the exact twist hash to group curves. 3.2. Approximating the twist hash partition from (|ap1 |, . . . , |apk |). We seek to closely approximate the exact twist class grouping. To achieve this, we use the absolute values of the traces of Frobenius (|ap1 (E)|, . . . , |apk (E)|… view at source ↗

**Figure 2.** Figure 2: Adjusted Rand Index (ARI) between the twist-hash partition and the partition of ECQ6 induced by tuples of absolute values (|ap1 (E)|, . . . , |apk (E)|) as a function of k. The green curves uses the k largest primes below 100 while the blue uses the k smallest. The last k partition recovers the twist-hash classes substantially better, with ARI peaking near 0.85 on k ∈ [7, 16] while the first-k partition sa… view at source ↗

**Figure 3.** Figure 3: ARI, Homegeneity, Completeness and V-measure of the partitioning using the tuples of absolute values (|ap1 (E)|, . . . , |apk (E)|). The range of k ∈ [7, 16] yields the maximum score for all these metrics. 9 [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison of proxy models to the twist-hash model. Algorithm 2 differs from Algorithm 1 only in the key used to group curves. In place of the twist hash h(E), it uses the proxy key k(E) = |ap1 (E)|, . . . , |apk (E)| formed from the absolute values of the traces at the k largest primes below 100. Steps 0, 1 and 2, as listed in Section 3.1, are otherwise unchanged, with proxy keys taking the role of twis… view at source ↗

**Figure 5.** Figure 5: Agreement and disagreement between the twist model and proxy model (k = 8) 4. Towards Generalization Beyond Twist Classes 4.1. Evaluation of trained transformer models on unseen twist classes. The previous sections demonstrate that twist classes provide a massive predictive advantage. However, this leaves open the question of whether the transformer models’ predictions were based on the presence of multipl… view at source ↗

read the original abstract

Recent interest in applying machine learning methods to predict invariants of mathematical objects has yielded models with surprisingly strong performance, including those predicting traces of Frobenius for elliptic curves. We demonstrate that the underlying datasets contain significant redundancy within quadratic twist classes, which alone is sufficient to produce highly accurate predictions. To ensure future models capture new arithmetic properties rather than potentially exploiting these dataset artifacts, we introduce a benchmark dataset consisting exclusively of unique twist class representatives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper flags quadratic twist redundancy in elliptic curve datasets as the main reason prior ML models predicted traces of Frobenius so well, and releases a unique-representative benchmark to fix it.

read the letter

The main point is that public elliptic curve datasets contain many quadratic twists of the same curve, and this repetition by itself lets models hit high accuracy on trace-of-Frobenius prediction. The authors show the effect and supply a new benchmark that keeps only one representative per twist class. That benchmark is the concrete new item; earlier ML-for-number-theory papers did not isolate this artifact or release a cleaned version for testing against it. Releasing the dataset is useful because it gives the community a practical way to check whether a model is learning arithmetic structure or just exploiting repeats. The observation itself is plausible and directly addresses a methodological hole that anyone training on these curves should care about. The soft spot is that picking one representative per twist class also changes the joint distribution of conductors, discriminants, and ranks. The performance drop could therefore come from that broader shift rather than from the loss of duplicates alone. The abstract does not describe a control that holds the distribution fixed while removing only the intra-class repeats, so the causal claim rests on the assumption that redundancy is the dominant factor. Without the full tables or splits it is hard to judge how large the drop actually is or how stable it is across architectures. This paper is for people working on machine learning applied to arithmetic geometry. It will not reshape the wider field, but it gives a concrete check that future work should use. The concern is real and the benchmark is new, so the paper deserves a serious referee even if the attribution needs a tighter control experiment to hold up.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that machine learning models for predicting traces of Frobenius on elliptic curves achieve high accuracy primarily by exploiting redundancy within quadratic twist classes in existing public datasets. The authors demonstrate this effect empirically and introduce a new benchmark dataset consisting exclusively of unique twist-class representatives to encourage models that learn genuine arithmetic properties rather than dataset artifacts.

Significance. If the central claim holds after addressing the noted concerns, the work would be significant for the intersection of machine learning and number theory. It identifies a concrete dataset artifact (twist-class repetition) that can produce misleadingly strong performance and supplies an externally defined benchmark to mitigate it. This strengthens the foundation for reproducible and falsifiable ML experiments on elliptic curve invariants, particularly by highlighting the need for distribution-controlled ablations in future studies.

major comments (2)

[§4 and §5] §4 (Benchmark Dataset Construction) and §5 (Experimental Results): The performance drop on the unique-twist-class dataset is presented as evidence that redundancy drives accuracy, but the construction necessarily alters the joint distribution of conductors, discriminants, and ranks. No control experiment is reported that holds this distribution fixed while removing only intra-class duplicates, leaving open the possibility that the gap arises from distributional shift alone.
[§5.1] §5.1 (Quantitative Comparisons): The claim that twist-class redundancy is 'sufficient to produce highly accurate predictions' requires explicit quantification of how much of the original accuracy is recovered when duplicates are reintroduced while preserving the new benchmark's distribution; the current ablation does not isolate this effect.

minor comments (2)

[Abstract] The abstract should include the specific accuracy metrics (e.g., R² or MAE values) on both the original and unique-twist datasets to make the redundancy effect immediately quantifiable.
[§3] Notation for twist-class equivalence and the selection of representatives should be defined more formally, perhaps with a short equation or pseudocode in §3.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments, which help clarify how to better isolate the role of twist-class redundancy from potential distributional effects. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [§4 and §5] §4 (Benchmark Dataset Construction) and §5 (Experimental Results): The performance drop on the unique-twist-class dataset is presented as evidence that redundancy drives accuracy, but the construction necessarily alters the joint distribution of conductors, discriminants, and ranks. No control experiment is reported that holds this distribution fixed while removing only intra-class duplicates, leaving open the possibility that the gap arises from distributional shift alone.

Authors: We acknowledge that the unique-twist-class benchmark construction necessarily changes the joint distribution of conductors, discriminants, and ranks. Our primary aim was to supply an externally defined benchmark free of intra-class repetition. To isolate redundancy from distributional shift, we will add a control experiment in the revised manuscript: subsample the original dataset to match the conductor, discriminant, and rank distribution of the unique-twist benchmark while retaining duplicates, then compare model performance against the unique-twist results. revision: partial
Referee: [§5.1] §5.1 (Quantitative Comparisons): The claim that twist-class redundancy is 'sufficient to produce highly accurate predictions' requires explicit quantification of how much of the original accuracy is recovered when duplicates are reintroduced while preserving the new benchmark's distribution; the current ablation does not isolate this effect.

Authors: We agree that a direct quantification of accuracy recovery when reintroducing duplicates under the fixed benchmark distribution would strengthen the claim. In the revised manuscript we will add this controlled experiment: start from the unique-twist benchmark, reintroduce duplicates to restore the original redundancy levels while preserving the benchmark distribution, and report the resulting performance to measure the isolated contribution of redundancy. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical dataset observation is self-contained

full rationale

The paper's core demonstration—that quadratic twist class redundancy in public elliptic curve datasets suffices for high-accuracy trace-of-Frobenius predictions—is an empirical comparison between the original dataset and a new benchmark of unique twist-class representatives. This comparison relies on the externally defined arithmetic equivalence relation of quadratic twists rather than any fitted parameter, self-referential definition, or load-bearing self-citation. No derivation step reduces by construction to its own inputs; the performance drop is presented as an observation, not a mathematical identity. The construction of the unique-representative benchmark is independent of the model's training procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical study of dataset artifacts; it introduces no free parameters, no new axioms, and no invented mathematical entities. The central claim rests on the standard definition of quadratic twist equivalence already present in the elliptic-curve literature.

pith-pipeline@v0.9.0 · 5368 in / 1111 out tokens · 24889 ms · 2026-05-15T02:39:05.373601+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Cost.FunctionalEquation washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We demonstrate that the underlying datasets contain significant redundancy within quadratic twist classes, which alone is sufficient to produce highly accurate predictions.
IndisputableMonolith.Foundation.AlexanderDuality alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we introduce a benchmark dataset consisting exclusively of unique twist class representatives

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1]

Amir, Y.-H

M. Amir, Y.-H. He, K.-H. Lee, T. Oliver, and E. Sultanow. Machine learning class numbers of real quadratic fields.https://arxiv.org/pdf/2209.09283, 2022. arXiv:math.NT:2209.09283

work page arXiv 2022
[2]

Frobenius traces for a set of (quadratic) twist classes of elliptic curves of conductor up to 10 7, May 2026

Angelica Babei. Frobenius traces for a set of (quadratic) twist classes of elliptic curves of conductor up to 10 7, May 2026

work page 2026
[3]

Banwait, AJ Fing, Xiaoyu Huang, and Deependra Singh

Angelica Babei, Barinder S. Banwait, AJ Fing, Xiaoyu Huang, and Deependra Singh. Machine learning ap- proaches to the shafarevich-tate group of elliptic curves.Advances in Theoretical and Mathematical Physics, 29(8):pp. 2353–2379, 2025

work page 2025
[4]

Learning euler factors of elliptic curves.Advances in Theoretical and Math- ematical Physics, 29(8):2327–2351, 2025

Angelica Babei, Fran¸ cois Charton, Edgar Costa, Xiaoyu Huang, Kyu-Hwan Lee, David Lowry-Duda, Ashvni Narayanan, and Alexey Pozdnyakov. Learning euler factors of elliptic curves.Advances in Theoretical and Math- ematical Physics, 29(8):2327–2351, 2025

work page 2025
[5]

Booker, Min Lee, and David Lowry-Duda

Jonathan Bober, Andrew R. Booker, Min Lee, and David Lowry-Duda. Murmurations of modular forms in the weight aspect.http://arxiv.org/abs/2310.07746v1, 2023. arXiv:math.NT:2310.07746v1

work page arXiv 2023
[6]

Booker, Min Lee, David Lowry-Duda, Andrei Seymour-Howell, and Nina Zubrilina

Andrew R. Booker, Min Lee, David Lowry-Duda, Andrei Seymour-Howell, and Nina Zubrilina. Murmurations of Maass forms.http://arxiv.org/abs/2409.00765v1, 2024. arXiv:math.NT:2409.00765v1

work page arXiv 2024
[7]

Booker, Jeroen Sijsling, Andrew V

Andrew R. Booker, Jeroen Sijsling, Andrew V. Sutherland, John Voight, and Dan Yasaki. A database of genus-2 curves over the rational numbers.LMS Journal of Computation and Mathematics, 19(A):235–254, 2016

work page 2016
[8]

The Magma algebra system

Wieb Bosma, John Cannon, and Catherine Playoust. The Magma algebra system. I. The user language.J. Symbolic Comput., 24(3-4):235–265, 1997. Computational algebra and number theory (London, 1993)

work page 1997
[9]

Murmurations of Mestre-Nagao sums.http://arxiv

Zvonimir Bujanovi´ c, Matija Kazalicki, and Lukas Novak. Murmurations of Mestre-Nagao sums.http://arxiv. org/abs/2403.17626v1, 2024. arXiv:math.NT:2403.17626v1. 12

work page arXiv 2024
[10]

Frobenious traces for a set of isogeny classes of elliptic curves of conductor up to 10 6, June 2025

Edgar Costa. Frobenious traces for a set of isogeny classes of elliptic curves of conductor up to 10 6, June 2025

work page 2025
[11]

Murmurations and explicit formulas.http://arxiv.org/abs/2306.10425v2, 2023

Alex Cowan. Murmurations and explicit formulas.http://arxiv.org/abs/2306.10425v2, 2023. arXiv:math.NT:2306.10425v2

work page arXiv 2023
[12]

Murmurations and ratios conjectures.http://arxiv.org/abs/2408.12723v1, 2024

Alex Cowan. Murmurations and ratios conjectures.http://arxiv.org/abs/2408.12723v1, 2024. arXiv:math.NT:2408.12723v1

work page arXiv 2024
[13]

Comparing two k-category assignments by a k-category correlation coefficient.Computational biology and chemistry, 28(5-6):367–374, 2004

Jan Gorodkin. Comparing two k-category assignments by a k-category correlation coefficient.Computational biology and chemistry, 28(5-6):367–374, 2004

work page 2004
[14]

Murmurations of elliptic curves.Exper- imental Mathematics, pages 1–13, 2024

Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, and Alexey Pozdnyakov. Murmurations of elliptic curves.Exper- imental Mathematics, pages 1–13, 2024

work page 2024
[15]

Comparing partitions.Journal of Classification, 2(1):193–218, 1985

Lawrence Hubert and Phipps Arabie. Comparing partitions.Journal of Classification, 2(1):193–218, 1985

work page 1985
[16]

Ranks of elliptic curves and deep neural networks.Res

Matija Kazalicki and Domagoj Vlah. Ranks of elliptic curves and deep neural networks.Res. Number Theory, 9(3):Paper No. 53, 21, 2023

work page 2023
[17]

Machines Learn Number Fields, But How? The Case of Galois Groups.arXiv preprint arXiv:2508.06670, 2025

K.-H. Lee and S. Lee. Machines learn number fields, but how? the case of galois groups.https://arxiv.org/ pdf/2508.06670, 2025. arXiv:math.NT:2508.06670

work page arXiv 2025
[18]

Murmurations of Dirichlet characters.International Mathematics Research Notices, 2025(1), 2025

Kyu-Hwan Lee, Thomas Oliver, and Alexey Pozdnyakov. Murmurations of Dirichlet characters.International Mathematics Research Notices, 2025(1), 2025

work page 2025
[19]

The L-functions and modular forms database.https://www.lmfdb.org, 2024

The LMFDB Collaboration. The L-functions and modular forms database.https://www.lmfdb.org, 2024. [On- line; accessed 29 December 2024]

work page 2024
[20]

Distribution of local signs of modular forms and murmurations of Fourier coefficients.http: //arxiv.org/abs/2409.02338v1, 2024

Kimball Martin. Distribution of local signs of modular forms and murmurations of Fourier coefficients.http: //arxiv.org/abs/2409.02338v1, 2024. arXiv:math.NT:2409.02338v1

work page arXiv 2024
[21]

Comparison of the predicted and observed secondary structure of t4 phage lysozyme

Brian W Matthews. Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure, 405(2):442–451, 1975

work page 1975
[22]

V-measure: A conditional entropy-based external cluster evaluation measure

Andrew Rosenberg and Julia Hirschberg. V-measure: A conditional entropy-based external cluster evaluation measure. InProceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 410–420, 2007

work page 2007
[23]

Sutherland

Will Sawin and Andrew V. Sutherland. Murmurations for elliptic curves ordered by height.arXiv preprint, 2025

work page 2025
[24]

Silverman.The arithmetic of elliptic curves, volume 106 ofGraduate Texts in Mathematics

Joseph H. Silverman.The arithmetic of elliptic curves, volume 106 ofGraduate Texts in Mathematics. Springer, Dordrecht, second edition, 2009

work page 2009
[25]

Sutherland

Andrew V. Sutherland. Magma repository.https://github.com/AndrewVSutherland/Magma, 2023

work page 2023
[26]

A set of isogeny classes of elliptic curves of conductor up to 10 8, September 2024

Andrew Victor Sutherland. A set of isogeny classes of elliptic curves of conductor up to 10 8, September 2024

work page 2024
[27]

Murmurations.http://arxiv.org/abs/2310.07681v1, 2023

Nina Zubrilina. Murmurations.http://arxiv.org/abs/2310.07681v1, 2023. arXiv:math.NT:2310.07681v1. 13 Appendix Table 4.2.Comparison of Twist Hash and Proxy (k= 8) Models Prime# of GoodConductors Twist Hash Model Proxy Model (k= 8)ProbabilisticallyPredicted∗ OverallCorrect∗ DeterministicMCC OverallMCCProbabilisticallyPredicted∗ OverallCorrect∗ Deterministic...

work page arXiv 2023