pith. machine review for the scientific record.

arxiv: 2604.04155 · v1 · submitted 2026-04-05 · 💻 cs.LG · cs.IT · math.IT · q-bio.QM · stat.ML

Recognition: 3 theorem links · Lean Theorem

The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models

Pith reviewed 2026-05-13 16:47 UTC · model grok-4.3

classification: 💻 cs.LG · cs.IT · math.IT · q-bio.QM · stat.ML
keywords: geometric alignment tax · discrete tokenization · continuous geometry · scientific foundation models · geometric distortion · rate-distortion theory · biological models · mutual information

The pith

Discrete tokenization in scientific foundation models imposes up to 8.5 times more geometric distortion than continuous alternatives on identical encoders.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that forcing continuous physical and biological manifolds through discrete categorical bottlenecks creates an intrinsic Geometric Alignment Tax that prevents faithful representation. Controlled ablations on synthetic dynamical systems show that swapping cross-entropy for a continuous head on the same encoder cuts geometric distortion by up to 8.5x, while learned codebooks display a non-monotonic double bind in which finer quantization improves reconstruction yet harms geometry. Evaluations of fourteen biological foundation models reveal three recurring failure regimes, and none of the fourteen simultaneously achieves low distortion, high mutual information, and global coherence. These results matter because reliable modeling of dynamical systems in biology and physics requires representations that preserve their continuous geometry.

Core claim

The root cause is the Geometric Alignment Tax, an intrinsic cost of discrete tokenization. On identical encoders, the three architectures tested differ by at most 1.3x under continuous objectives but by 3,000x under discrete tokenization. With learned codebooks, finer quantization improves reconstruction but worsens geometric fidelity. Each evaluated biological model falls into one of three regimes: Local-Global Decoupling, Representational Compression, or Geometric Vacuity. Evo 2's reverse-complement robustness reflects conserved sequence composition rather than learned symmetry.
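Why can composition alone mimic reverse-complement (RC) robustness? The k-mer profile of a sequence's reverse complement is a fixed relabelling of the original profile (each k-mer w maps to its reverse complement), so a representation driven mainly by k-mer composition can look RC-consistent without learning the A↔T / C↔G symmetry. A minimal sketch of that counting identity follows; the helper names are hypothetical, and this is not the paper's Texture Hypothesis Test.

```python
from collections import Counter

# Watson-Crick complement table (A<->T, C<->G) for uppercase DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Counts of overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rc_profile_is_relabelling(seq: str, k: int) -> bool:
    """True if counts(rc(seq))[w] == counts(seq)[rc(w)] for every k-mer w.

    Holds for any sequence: each occurrence of w in the reverse complement
    corresponds to exactly one occurrence of rc(w) in the original.
    """
    forward = kmer_counts(seq, k)
    relabelled = Counter({reverse_complement(w): c for w, c in forward.items()})
    return relabelled == kmer_counts(reverse_complement(seq), k)


if __name__ == "__main__":
    s = "ATGCGTACCGTTAGC"
    print(reverse_complement(s))
    print(all(rc_profile_is_relabelling(s, k) for k in range(1, 6)))
```

Because the identity is exact, any composition-level statistic computed on the forward strand is recoverable from the RC strand, which is consistent with the dinucleotide-shuffle control described in Figure 3.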

What carries the argument

The Geometric Alignment Tax: the measurable cost of routing continuous manifolds through discrete categorical bottlenecks, quantified by rate-distortion curves and MINE mutual-information estimates.
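MINE (Belghazi et al., reference [6]) trains a neural critic T to maximize the Donsker-Varadhan lower bound I(X;Z) ≥ E_P[T] − log E_{P_X⊗P_Z}[exp T] (reference [15]). The sketch below skips the training loop and plugs in the analytically optimal critic for a correlated standard Gaussian pair, whose true mutual information is −½ log(1 − ρ²). It only illustrates what the estimator computes; the paper's critic architecture, batch sizes, and bias correction are not given here, and the function names are hypothetical.

```python
import numpy as np


def dv_lower_bound(t_joint: np.ndarray, t_marginal: np.ndarray) -> float:
    """Donsker-Varadhan bound E_P[T] - log E_Q[exp(T)], in nats.

    t_joint: critic values on paired samples (x_i, y_i) ~ P_XY.
    t_marginal: critic values on shuffled pairs, approximately P_X x P_Y.
    """
    m = t_marginal.max()  # subtract the max for a numerically stable log-mean-exp
    log_mean_exp = m + np.log(np.mean(np.exp(t_marginal - m)))
    return float(t_joint.mean() - log_mean_exp)


def gaussian_optimal_critic(x: np.ndarray, y: np.ndarray, rho: float) -> np.ndarray:
    """log p(x, y) - log p(x) - log p(y) for a standard bivariate Gaussian."""
    v = 1.0 - rho ** 2
    return -0.5 * np.log(v) - (x ** 2 - 2 * rho * x * y + y ** 2) / (2 * v) + (x ** 2 + y ** 2) / 2


def dv_demo(rho: float = 0.8, n: int = 200_000, seed: int = 0) -> tuple:
    """Return (DV estimate, exact MI) for correlated Gaussians, both in nats."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    y_shuffled = rng.permutation(y)  # breaks the pairing to sample the product of marginals
    estimate = dv_lower_bound(gaussian_optimal_critic(x, y, rho),
                              gaussian_optimal_critic(x, y_shuffled, rho))
    exact = -0.5 * np.log(1.0 - rho ** 2)
    return estimate, float(exact)


if __name__ == "__main__":
    est, exact = dv_demo()
    print(f"DV estimate {est:.3f} nats vs exact {exact:.3f} nats")
```

In real MINE the critic is a learned network, so the bound is only as tight as the critic's capacity and optimization allow; with the optimal critic fixed, the estimate should match the exact value up to Monte Carlo error.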

If this is right

  • Continuous objectives make architecture choice nearly irrelevant (1.3x spread) while discrete objectives amplify architectural differences by three orders of magnitude.
  • Learned codebooks create a non-monotonic trade-off: finer quantization improves reconstruction but increases geometric distortion.
  • Existing biological foundation models fall into one of three regimes: Local-Global Decoupling, Representational Compression, or Geometric Vacuity.
  • No model reaches the joint optimum of low distortion, high mutual information, and global coherence.
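The first bullet's contrast can be read as a max/min ratio of geometric distortion across architectures under one training objective. That reading is an assumption (the summary does not restate the exact measure), and the numbers below are invented for illustration, chosen only so the ratios land near 1.3x and 3,000x.

```python
from typing import Mapping


def spread(distortions: Mapping[str, float]) -> float:
    """Largest / smallest distortion across architectures for one objective."""
    values = list(distortions.values())
    if not values or min(values) <= 0:
        raise ValueError("need at least one strictly positive distortion")
    return max(values) / min(values)


# Hypothetical per-architecture distortions, for illustration only.
continuous = {"arch_a": 0.010, "arch_b": 0.012, "arch_c": 0.013}
discrete = {"arch_a": 0.02, "arch_b": 0.9, "arch_c": 60.0}

print(f"continuous spread: {spread(continuous):.2f}x")
print(f"discrete spread:   {spread(discrete):.0f}x")
```

A max/min ratio is sensitive to the single best and worst architecture, so it says nothing about how the remaining architectures are distributed between them.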

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Scientific applications may require architectures that avoid discrete bottlenecks entirely rather than tuning tokenization granularity.
  • The observed divergence under discrete objectives suggests that downstream tasks relying on geometric relationships, such as molecular dynamics or trajectory prediction, will inherit systematic errors.
  • Hybrid representations that combine limited discrete tokens with continuous refinement layers could be tested to reduce the tax while retaining some discrete benefits.
  • The same rate-distortion and coherence diagnostics could be applied to foundation models in chemistry or climate science to check for analogous alignment failures.

Load-bearing premise

The synthetic dynamical systems and rate-distortion/MINE metrics used in the evaluations faithfully capture the geometric properties and alignment failures present in real biological and physical data.

What would settle it

A discrete-tokenized model that simultaneously achieves low geometric distortion, high mutual information, and global coherence on real DNA sequences or physical trajectories would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.04155 by Prashant C. Raju.

Figure 1: A. Track A vs. Track B Lipschitz profiles: smooth arcs (continuous physics) vs. divergent, multi-scale fracture (discrete biology). B. Continuous vs. discrete Procrustes D across architectures on the Lorenz dataset at 1% noise. All continuous conditions cluster near zero; discrete conditions span an order of magnitude. C. VQ double bind: reconstruction MSE (decreasing) vs. Procrustes D (non-monotone) vs. c…
Figure 2: A. ESM-2 composite stability (blue, left axis) vs. parameters, with Procrustes reduction overlaid (orange, right axis). Stability declines monotonically from 8M to 3B; the 15B “recovery” is unmasked by the simultaneous spike in Procrustes reduction, revealing global manifold drift rather than genuine geometric improvement. B. Conceptual illustration of the two failure modes. Ground Truth: the manifold is a…
Figure 3: A. Texture Hypothesis Test. RC RDM similarity across four conditions for Evo 2 (7B, 8K context, 10,000 sequences). Dinuc-shuffled real DNA (per-sequence k-mer counts preserved) recovers 97% of the real-random gap; texture-matched Markov (population-level statistics only) recovers 3%. B. The RC Dissociation explained. On synthetic DNA (left), discrete tokens destroy the A↔T / C↔G bijection entirely (RDM ∼ 0…
Figure 4: (A) Excess MI (bias-corrected) across the three failure regimes. ProtMamba falls below zero (Geometric Vacuity), ESM-1b and OpenFold show large positive values (Representational Compression), and Evo 2 is modest and positive (Local-Global Decoupling). Random baselines sit at zero by construction. (B) Regime I: Evo 2 global vs. local MI. The flat curve across 64× context expansion confirms informational sha…
Figure 5: Effect of embedding-level RCCR on DNABERT-2 (117M). (A) Training loss converges rapidly (99.4% reduction in 10 epochs). (B) Per-sequence RC cosine gap collapses from 0.041 to 0.000: perfect pointwise consistency. (C) Despite this, Procrustes disparity between forward and RC embedding matrices increases 91% (0.76 → 1.45): population-level geometric structure degrades. (D) Shesha composite stability by pertu…
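Figures 1, 2, and 5 report Procrustes disparity. A minimal NumPy sketch of the standardized variant (Schönemann, reference [38]; Dryden and Mardia, reference [16]) follows; it should agree with the disparity returned by `scipy.spatial.procrustes`. This normalization is an assumption: the standardized disparity lies in [0, 1], while Figure 5 reports 1.45, so the paper evidently normalizes differently. The function name is hypothetical.

```python
import numpy as np


def procrustes_disparity(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized Procrustes disparity between two (n_points, n_dims) matrices.

    Both configurations are centred and scaled to unit Frobenius norm; b is then
    optimally rotated (reflections allowed) and rescaled onto a. The residual sum
    of squares equals 1 - s**2, where s is the sum of singular values of a.T @ b.
    """
    if a.shape != b.shape:
        raise ValueError("configurations must have the same shape")
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    s = np.linalg.svd(a.T @ b, compute_uv=False).sum()
    return float(max(0.0, 1.0 - s ** 2))  # clamp tiny negative rounding errors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    forward = rng.standard_normal((50, 3))
    rotation, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    # A rotated, rescaled, shifted copy has disparity close to 0.0.
    print(procrustes_disparity(forward, 2.5 * forward @ rotation + 1.0))
```

Because rotation, reflection, uniform scale, and translation are all factored out, this score responds only to changes in the relative arrangement of points. That is why it can rise at the population level even when per-sequence cosine gaps vanish, as in Figure 5.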
Original abstract

Foundation models for biology and physics optimize predictive accuracy, but their internal representations systematically fail to preserve the continuous geometry of the systems they model. We identify the root cause: the Geometric Alignment Tax, an intrinsic cost of forcing continuous manifolds through discrete categorical bottlenecks. Controlled ablations on synthetic dynamical systems demonstrate that replacing cross-entropy with a continuous head on an identical encoder reduces geometric distortion by up to 8.5x, while learned codebooks exhibit a non-monotonic double bind where finer quantization worsens geometry despite improving reconstruction. Under continuous objectives, three architectures differ by 1.3x; under discrete tokenization, they diverge by 3,000x. Evaluating 14 biological foundation models with rate-distortion theory and MINE, we identify three failure regimes: Local-Global Decoupling, Representational Compression, and Geometric Vacuity. A controlled experiment confirms that Evo 2's reverse-complement robustness on real DNA reflects conserved sequence composition, not learned symmetry. No model achieves simultaneously low distortion, high mutual information, and global coherence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that foundation models for biology and physics incur a 'Geometric Alignment Tax' from forcing continuous manifolds through discrete tokenization bottlenecks. Controlled ablations on synthetic dynamical systems show that a continuous head on an identical encoder reduces geometric distortion by up to 8.5x versus cross-entropy, while learned codebooks exhibit a non-monotonic double bind (finer quantization improves reconstruction but worsens geometry). Under continuous objectives, architectures differ by only 1.3x, but under discrete tokenization they diverge by 3,000x. Rate-distortion and MINE analysis of 14 biological foundation models identifies three failure regimes (Local-Global Decoupling, Representational Compression, Geometric Vacuity); a controlled experiment shows that Evo 2's reverse-complement robustness reflects sequence composition rather than learned symmetry. No model simultaneously achieves low distortion, high mutual information, and global coherence.

Significance. If the central claims hold, the work supplies a concrete, quantitative diagnosis of why current scientific foundation models systematically distort geometry and offers a clear architectural direction (continuous heads) that measurably mitigates the problem. The controlled synthetic ablations and the taxonomy of failure regimes provide falsifiable predictions that could guide future model design for physics and biology.

major comments (2)
  1. [§5] §5 (Real-model evaluation): The attribution of the three failure regimes observed in the 14 biological foundation models to the Geometric Alignment Tax is load-bearing for the paper's central claim, yet rests on extrapolation from synthetic dynamical systems. The manuscript reports rate-distortion and MINE metrics on the real models but does not include any direct test (e.g., curvature histograms, correlation-length statistics, or topological invariants) showing that the chosen synthetic systems preserve the manifold geometry of real data such as protein backbones or genomic sequences. Without this, the causal link between tokenization and the observed regimes remains correlational rather than demonstrated.
  2. [§3.2] §3.2 (Ablation results): The headline quantitative claim of an 8.5x reduction in geometric distortion when replacing cross-entropy with a continuous head is central to the argument. The geometric-distortion metric itself (presumably derived from the rate-distortion or MINE quantities introduced later) is not given an explicit equation or pseudocode in the ablation section, making it impossible to verify that the factor is independent of post-hoc metric choices or hyper-parameter tuning.
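Major comment 1 asks for correlation-length statistics on the synthetic and real data, and the simulated rebuttal promises them without fixing a definition. One common choice, assumed here, is the first lag at which a series' normalized autocorrelation falls below 1/e. A minimal NumPy sketch with hypothetical function names follows; for multivariate trajectories (for example the three Lorenz coordinates), it would be applied per coordinate.

```python
import numpy as np


def autocorrelation(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Normalized autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])


def correlation_length(x: np.ndarray, max_lag: int = 500) -> int:
    """First lag where the autocorrelation drops below 1/e (max_lag + 1 if it never does)."""
    acf = autocorrelation(x, max_lag)
    below = np.flatnonzero(acf < np.exp(-1.0))
    return int(below[0]) if below.size else max_lag + 1


def ar1(n: int, phi: float, seed: int = 0) -> np.ndarray:
    """Hypothetical test signal: AR(1) process x_t = phi * x_{t-1} + noise."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


if __name__ == "__main__":
    # The true autocorrelation is 0.9**k, which first falls below 1/e at k = 10.
    print(correlation_length(ar1(50_000, 0.9), max_lag=100))
```

For a token sequence (DNA or protein), the same statistic would need a numeric encoding first, and that choice of encoding is itself a modeling decision the comparison would have to report.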
minor comments (2)
  1. [Abstract] Abstract and §4: The '3,000x' divergence factor under discrete tokenization is striking, but its baseline (which architecture pair, which exact distortion measure) is not restated, which reduces readability.
  2. Notation: The paper introduces 'Geometric Alignment Tax' as a named quantity but does not supply a compact mathematical expression for it; a short definition (e.g., Tax = D_geo(discrete) / D_geo(continuous)) would aid precision.
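Minor comment 2 suggests defining the tax as a ratio. One rendering of that suggestion is shown below, with a denoting the encoder architecture (and data condition) held fixed and D_geo whichever geometric-distortion measure the ablation uses. This is the referee's proposed form, not a definition taken from the paper.

```latex
\mathrm{Tax}(a) \;=\; \frac{D_{\mathrm{geo}}\bigl(a,\ \text{discrete head}\bigr)}{D_{\mathrm{geo}}\bigl(a,\ \text{continuous head}\bigr)},
\qquad \mathrm{Tax}(a) > 1 \iff \text{the discrete head distorts more on encoder } a.
```

Under this reading, the abstract's "up to 8.5x" would be the largest value of Tax(a) across the ablation conditions.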

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. The two major points raised are addressable through targeted revisions that strengthen the manuscript without altering its core claims. We respond to each below.

  1. Referee: [§5] §5 (Real-model evaluation); see major comment 1.

    Authors: We agree that a direct geometric comparison between the synthetic dynamical systems and real biological manifolds would make the causal attribution more robust. The synthetic systems (Lorenz, Rössler, and linear oscillators) were chosen because they exhibit the same continuous manifold properties—smooth trajectories, local Euclidean structure, and global coherence—that are distorted by tokenization in the real models. In the revised manuscript we will add a new subsection to §5 that computes correlation-length statistics and curvature histograms on both the synthetic trajectories and on representative subsets of the protein backbone and genomic sequence data used for the 14-model evaluation. This will quantify the degree of manifold similarity and thereby convert the current correlational evidence into a stronger, geometry-grounded link. revision: yes

  2. Referee: [§3.2] §3.2 (Ablation results); see major comment 2.

    Authors: We acknowledge the omission. The geometric-distortion metric used throughout the paper, including in the §3.2 ablations, is the normalized average pairwise distance distortion: D_geo = (1/M) Σ_{i<j} |d_X(x_i,x_j) − d_Z(z_i,z_j)| / d_X(x_i,x_j), where M = N(N−1)/2 is the number of pairs among the N evaluated points, d_X is Euclidean distance in input space, and d_Z is Euclidean distance in the continuous latent space (or in the codebook embedding for discrete cases). We will insert this explicit definition together with the corresponding pseudocode immediately before the ablation results in the revised §3.2, ensuring the 8.5× factor can be independently recomputed from the released code and data. revision: yes
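A minimal NumPy sketch of the D_geo definition in the simulated rebuttal. The function names are hypothetical, and skipping pairs with d_X = 0 (duplicate inputs, where the ratio is undefined) is an assumption the definition leaves open; M is then the number of pairs kept.

```python
import numpy as np


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distances for all pairs i < j, as a flat vector (O(n^2) memory)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)
    return dist[i, j]


def d_geo(x: np.ndarray, z: np.ndarray, eps: float = 1e-12) -> float:
    """Mean relative pairwise-distance distortion between inputs x and latents z.

    x and z must have the same number of rows (points); their dimensions may differ.
    """
    if len(x) != len(z):
        raise ValueError("x and z must contain the same number of points")
    dx = pairwise_distances(x)
    dz = pairwise_distances(z)
    keep = dx > eps  # drop duplicate inputs, where the ratio is undefined
    return float(np.mean(np.abs(dx[keep] - dz[keep]) / dx[keep]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((40, 3))
    print(d_geo(x, x), d_geo(x, 2.0 * x))
```

Unlike Procrustes disparity, this quantity is not invariant to a global rescaling: a latent space that is an exact 2x copy of the input scores 1.0. Any comparison of heads with different output scales would therefore need a normalization step before the 8.5x factor can be recomputed.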

Circularity Check

0 steps flagged

No circularity: empirical results from ablations and external evaluations

The paper presents its core findings as outcomes of controlled ablations on synthetic dynamical systems (showing up to 8.5x distortion reduction with continuous heads) and rate-distortion/MINE evaluations on 14 real biological models. These are framed as experimental demonstrations rather than mathematical derivations. No steps reduce by construction to fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations whose content is unverified outside the paper. The claims rest on observable differences across architectures and objectives, with no evidence that the reported geometric tax or failure regimes are tautological with the input metrics or synthetic data generation process.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper introduces the Geometric Alignment Tax as a new explanatory concept and relies on domain assumptions about what rate-distortion theory and MINE measure in model representations. The abstract specifies no free parameters, and the one invented entity has no independent evidence beyond the paper's own experiments.

axioms (1)
  • domain assumption: Rate-distortion theory and MINE provide valid measures of geometric distortion and mutual information in the internal representations of foundation models.
    These tools are used to evaluate the 14 biological models and identify failure regimes.
invented entities (1)
  • Geometric Alignment Tax (no independent evidence)
    purpose: To name and explain the intrinsic cost of forcing continuous manifolds through discrete tokenization bottlenecks.
    Introduced as the root cause identified through ablations; no independent evidence outside the paper's experiments is mentioned.
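The axiom above leans on rate-distortion theory (Shannon, reference [40]; Cover and Thomas, reference [11]), which gives the smallest mean-squared error any code can reach at R bits per sample; for a unit-variance Gaussian source this is D(R) = 2^(−2R). The minimal sketch below assumes a unit Gaussian source and uses a plain uniform scalar quantizer as a stand-in for a discrete bottleneck (not the paper's learned codebooks). Function names are hypothetical.

```python
import numpy as np


def gaussian_distortion_rate(rate_bits: float, variance: float = 1.0) -> float:
    """Shannon distortion-rate function D(R) = variance * 2**(-2R) (MSE), Gaussian source."""
    return variance * 2.0 ** (-2.0 * rate_bits)


def uniform_quantizer_mse(samples: np.ndarray, rate_bits: int, clip: float = 4.0) -> float:
    """MSE of a uniform scalar quantizer with 2**rate_bits cells on [-clip, clip].

    Samples outside the range fall into the outermost cells; each cell is
    reconstructed at its centre.
    """
    levels = 2 ** rate_bits
    step = 2.0 * clip / levels
    idx = np.clip(np.floor((samples + clip) / step), 0, levels - 1)
    recon = -clip + (idx + 0.5) * step
    return float(np.mean((samples - recon) ** 2))


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(200_000)
    for r in range(1, 7):
        print(r, round(uniform_quantizer_mse(x, r), 5), round(gaussian_distortion_rate(r), 5))
```

This axis measures reconstruction error only. A quantizer can move closer to the Shannon bound while saying nothing about whether pairwise geometry is preserved, which is the gap the paper's geometric metrics are meant to capture.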

pith-pipeline@v0.9.0 · 5492 in / 1533 out tokens · 52317 ms · 2026-05-13T16:47:24.366826+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 2 internal anchors

  1. [1]

    Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., Ronneberger, O., Willmore, L., Ballard, A. J., Bambrick, J., Bodenstein, S. W., Evans, D. A., Hung, C.-C., O’Neill, M., Reiman, D., Tunyasuvunakool, K., Wu, Z., Žemgulytė, A., Arvaniti, E., Beattie, C., Bertolli, O., Bridgland, A., Cherepanov, A., Congreve, M., Cowen-Rivers, A. I., Co...

  2. [2]

    Ahdritz, G., Bouatta, N., Floristean, C., Kadyan, S., Xia, Q., Gerecke, W., O'Donnell, T. J., Berenberg, D., Fisk, I., Zanichelli, N., Zhang, B., Nowaczynski, A., Wang, B., Stepniewska-Dziubinska, M. M., Zhang, S., Ojewole, A., Guney, M. E., Biderman, S., Watkins, A. M., Ra, S., Lorenzo, P. R., Nivon, L., Weitzner, B., Ban, Y.-E. A., Sorger, P. K., Mostaq...

  3. [3]

    Alain, G. and Bengio, Y. (2016). Understanding intermediate layers using linear classifier probes. arXiv

  4. [4]

    Altschul, S. F. and Erickson, B. W. (1985). Significance of nucleotide sequence alignments: a method for random sequence permutation that preserves dinucleotide and codon usage. Molecular Biology and Evolution, 2(6):526–538

  5. [5]

    Avsec, Ž., Latysheva, N., Cheng, J., Novati, G., Taylor, K. R., Ward, T., Bycroft, C., Nicolaisen, L., Arvaniti, E., Pan, J., Thomas, R., Dutordoir, V., Perino, M., De, S., Karollus, A., Gayoso, A., Sargeant, T., Mottram, A., Wong, L. H., Drotár, P., Kosiorek, A., Senior, A., Tanburn, R., Applebaum, T., Basu, S., Hassabis, D., and Kohli, P. (2026). A...

  6. [6]

    Belghazi, M. I., Baratin, A., Rajeswar, S., Ozair, S., Bengio, Y., Hjelm, R. D., and Courville, A. C. (2018). Mutual Information Neural Estimation. In International Conference on Machine Learning

  7. [7]

    Benegas, G., Batra, S. S., and Song, Y. S. (2023). DNA language models are powerful predictors of genome-wide variant effects. Proceedings of the National Academy of Sciences, 120(44):e2311219120

  8. [8]

    Brixi, G., Durrant, M. G., Ku, J., Naghipourfar, M., Poli, M., Sun, G., Brockman, G., Chang, D., Fanton, A., Gonzalez, G. A., King, S. H., Li, D. B., Merchant, A. T., Nguyen, E., Ricci-Tam, C., Romero, D. W., Schmok, J. C., Taghibakhshi, A., Vorontsov, A., Yang, B., Deng, M., Gorton, L., Nguyen, N., Wang, N. K., Pearce, M. T., Simon, E., Adams, E., Amador...

  9. [9]

    Bullock, C. (1716). The Cobler of Preston: A Farce. As it is Acted at the New Theatre in Lincolns-Inn-Fields. Printed for R. Palmer, London

  10. [10]

    Casper, J., Speir, M. L., Raney, B. J., Perez, G., Nassar, L. R., Lee, C. M., Hinrichs, A. S., Gonzalez, J. N., Fischer, C., Diekhans, M., Clawson, H., Benet-Pages, A., Barber, G. P., Vaske, C. J., van Baren, M. J., Wang, K., Rodriguez, Y. J. P., Jenkins-Kiefer, J. A., Chalamala, M., Haussler, D., Kent, W. J., and Haeussler, M. (2025). The UCSC Genome Bro...

  11. [11]

    Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. John Wiley & Sons, Nashville, TN, 2nd edition

  12. [12]

    Dalla-Torre, H., Gonzalez, L., Mendoza-Revilla, J., Lopez Carranza, N., Grzywaczewski, A. H., Oteri, F., Dallago, C., Trop, E., de Almeida, B. P., Sirelkhatim, H., Richard, G., Skwark, M., Beguir, K., Lopez, M., and Pierrot, T. (2024). Nucleotide transformer: building and evaluating robust foundation models for human genomics. Nature Methods, 22(2):287–297

  13. [13]

    Dao, T. and Gu, A. (2024). Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. In International Conference on Machine Learning

  14. [14]

    Defoe, D. (1726). The Political History of the Devil, As Well Ancient as Modern: In Two Parts. Printed for T. Warner, London

  15. [15]

    Donsker, M. D. and Varadhan, S. R. S. (1983). Asymptotic evaluation of certain Markov process expectations for large time. IV. Communications on Pure and Applied Mathematics, 36(2):183–212

  16. [16]

    Dryden, I. L. and Mardia, K. V. (1998). Statistical analysis of shape. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester, England

  17. [17]

    Franklin, B. (1789). Letter to Jean Baptiste Le Roy, November 13, 1789

  18. [18]

    Gersho, A. and Gray, R. M. (1991). Vector Quantization and Signal Compression. The Springer International Series in Engineering and Computer Science. Springer

  19. [19]

    Gray, R. (1990). Quantization noise spectra. IEEE Transactions on Information Theory, 36(6):1220–1244

  20. [20]

    Gu, A. and Dao, T. (2024). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. In First Conference on Language Modeling

  21. [21]

    Huang, T., Song, Z., Ying, R., and Jin, W. (2024). Protein-nucleic acid complex modeling with frame averaging transformer. In Advances in Neural Information Processing Systems

  22. [22]

    Ji, Y., Zhou, Z., Liu, H., and Davuluri, R. V. (2021). DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics, 37(15):2112–2120

  23. [23]

    Jukić, J. and Šnajder, J. (2024). From robustness to improved generalization and calibration in pre-trained language models. Transactions of the Association for Computational Linguistics, 13:264–280

  24. [24]

    Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berg...

  25. [25]

    Khromov, G. and Singh, S. P. (2024). Some Fundamental Aspects about Lipschitz Continuity of Neural Networks. In International Conference on Learning Representations

  26. [26]

    Kornblith, S., Norouzi, M., Lee, H., and Hinton, G. (2019). Similarity of Neural Network Representations Revisited. In International Conference on Machine Learning

  27. [27]

    Kriegeskorte, N., Mur, M., and Bandettini, P. (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience

  28. [28]

    Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., Smetanin, N., Verkuil, R., Kabeli, O., Shmueli, Y., dos Santos Costa, A., Fazel-Zarandi, M., Sercu, T., Candido, S., and Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637):1123–1130

  29. [29]

    Ma, M. (2025). Reverse-Complement Consistency for DNA Language Models. arXiv preprint arXiv:2509.18529

  30. [30]

    Masarotto, V., Panaretos, V. M., and Zemel, Y. (2018). Procrustes Metrics on Covariance Operators and Optimal Transportation of Gaussian Processes. Sankhya A, 81(1):172–213

  31. [31]

    Newhouse, L., Hess, R. P., Cesista, F., Zahorodnii, A., Bernstein, J., and Isola, P. (2025). Training transformers with enforced Lipschitz constants. arXiv preprint arXiv:2507.13338

  32. [32]

    Nguyen, E. D., Poli, M., Faizi, M., Thomas, A. W., Birch-Sykes, C., Wornow, M., Patel, A., Rabideau, C. M., Massaroli, S., Bengio, Y., Ermon, S., Baccus, S. A., and Ré, C. (2023). HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution. In Advances in Neural Information Processing Systems

  33. [33]

    Raju, P. C. (2026a). From Syntax to Semantics: Geometric Stability as the Missing Axis of Perturbation Biology . arXiv preprint arXiv:2603.00678

  34. [34]

    Raju, P. C. (2026b). Geometric Stability: The Missing Axis of Representations . arXiv preprint arXiv:2601.09173

  35. [35]

    Raju, P. C. (2026c). Shesha: Self-Consistency Metrics for Representational Stability . doi: 10.5281/zenodo.18227453

  36. [36]

    Rohlf, F. J. and Slice, D. (1990). Extensions of the Procrustes Method for the Optimal Superimposition of Landmarks. Systematic Zoology, 39(1):40

  37. [37]

    Schiff, Y., Kao, C.-H., Gokaslan, A., Dao, T., Gu, A., and Kuleshov, V. (2024). Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling. In International Conference on Machine Learning

  38. [38]

    Schönemann, P. H. (1966). A Generalized Solution of the Orthogonal Procrustes Problem. Psychometrika, 31(1):1–10

  39. [39]

    Sgarbossa, D., Malbranke, C., and Bitbol, A.-F. (2025). ProtMamba: a homology-aware but alignment-free protein state space model. Bioinformatics, 41(6)

  40. [40]

    Shannon, C. E. (1959). Coding Theorems for a Discrete Source With a Fidelity Criterion. IRE National Convention Record, 7(4):142–163

  41. [41]

    Su, J., Han, C., Zhou, Y., Shan, J., Zhou, X., and Yuan, F. (2024). SaProt: Protein Language Modeling with Structure-aware Vocabulary. In International Conference on Learning Representations

  42. [42]

    Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R., and Wu, C. H. (2007). UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics, 23(10):1282–1288

  43. [43]

    Watson, J. D. and Crick, F. H. C. (1953). Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature, 171(4356):737–738

  44. [44]

    Zhang, Y. and Gilpin, W. (2025). Zero-shot forecasting of chaotic systems. In International Conference on Learning Representations

  45. [45]

    Zhou, Z., Ji, Y., Li, W., Dutta, P., Davuluri, R. V., and Liu, H. (2024). DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genomes. In International Conference on Learning Representations