Probing 3D Chromatin Structure Awareness in Evo2 DNA Language Model
Pith reviewed 2026-05-10 17:52 UTC · model grok-4.3
The pith
Evo2 DNA language model learns local CTCF grammar but misses higher-order 3D chromatin organization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Evo2 did not distinguish functional perturbations from matched random controls and failed to reliably generate convergent CTCF loops, recovering TAD boundaries only partially. Together, these results indicate that Evo2 has learned local CTCF grammar but misses higher-order 3D organization, pointing to bidirectional model architectures integrating cell types and 3D contacts, rather than longer contexts, as the path to developing 3D-aware DNA language models.
What carries the argument
Likelihood-based perturbation tests and sequence generation tasks applied to 1 Mb windows around TAD boundaries and convergent CTCF loops, used to measure whether the model encodes 3D chromatin awareness.
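The likelihood-based perturbation test can be illustrated with a minimal sketch. It assumes a scoring function that returns a sequence log-likelihood. The scorer used here, `toy_log_likelihood`, is a hypothetical context-free stand-in and is not Evo2's scoring interface; `perturbation_delta` is likewise an illustrative name. A real model score would depend on the surrounding context, which is exactly what the test is meant to probe.

```python
import math
from typing import Callable, Mapping


def toy_log_likelihood(seq: str, probs: Mapping[str, float]) -> float:
    """Hypothetical stand-in scorer: sum of independent per-base log-probabilities.

    This is NOT Evo2; it only gives perturbation_delta something to call.
    """
    return sum(math.log(probs[base]) for base in seq)


def perturbation_delta(
    seq: str,
    start: int,
    replacement: str,
    score: Callable[[str], float],
) -> float:
    """Return score(perturbed) - score(original) for a same-length substitution."""
    end = start + len(replacement)
    if start < 0 or end > len(seq):
        raise ValueError("replacement must fit inside the sequence")
    perturbed = seq[:start] + replacement + seq[end:]
    return score(perturbed) - score(seq)


if __name__ == "__main__":
    # Assumed base probabilities for the toy scorer only.
    probs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    score = lambda s: toy_log_likelihood(s, probs)
    window = "ACGTACGT"
    # Compare a perturbation at a site of interest with one at a control site.
    functional = perturbation_delta(window, 2, "AA", score)
    control = perturbation_delta(window, 6, "AA", score)
    print(functional, control)
```

Under this framing, a model that encodes 3D structure would be expected to show a larger likelihood change at functional sites than at matched control sites; equal deltas are what the paper reports for Evo2.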
Load-bearing premise
That the chosen likelihood-based perturbation and sequence generation tests in 1 Mb windows are sufficient and specific enough to detect the presence or absence of 3D chromatin structure awareness in the model.
What would settle it
A result in which Evo2 assigns significantly higher likelihood to functional CTCF perturbations than to matched random controls or generates convergent CTCF loops at rates clearly above random baselines.
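To make the "convergent" criterion concrete, here is a minimal sketch of an orientation check on a generated sequence. It treats a pair as convergent when a forward-strand motif lies upstream of a reverse-strand motif, so the two sites point toward each other. The motif string is a hypothetical placeholder rather than the real CTCF position weight matrix, exact string matching is a simplification, and the function names are illustrative only.

```python
from typing import List

# Hypothetical placeholder motif; a real analysis would scan with a CTCF PWM.
PLACEHOLDER_MOTIF = "CCACCAGG"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_all(seq: str, motif: str) -> List[int]:
    """Return start positions of every (possibly overlapping) exact match."""
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def has_convergent_pair(seq: str, motif: str, min_gap: int = 0) -> bool:
    """True if a forward motif is followed downstream by a reverse-strand motif."""
    seq = seq.upper()
    forward_hits = find_all(seq, motif)
    reverse_hits = find_all(seq, reverse_complement(motif))
    return any(
        rev - (fwd + len(motif)) >= min_gap
        for fwd in forward_hits
        for rev in reverse_hits
    )


if __name__ == "__main__":
    rc = reverse_complement(PLACEHOLDER_MOTIF)
    convergent = "AAAA" + PLACEHOLDER_MOTIF + "TTTTTT" + rc + "AAAA"
    divergent = "AAAA" + rc + "TTTTTT" + PLACEHOLDER_MOTIF + "AAAA"
    print(has_convergent_pair(convergent, PLACEHOLDER_MOTIF))
    print(has_convergent_pair(divergent, PLACEHOLDER_MOTIF))
```

Counting how often generated windows pass such a check, and comparing that rate with a random baseline, is one way the settling result described above could be quantified.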
Original abstract
DNA language models like Evo2 now fit million-token contexts large enough to cover entire TADs, yet whether they learn 3D chromatin structure, a key regulatory layer acting atop primary sequence, remains untested and questionable, given that Evo2's training data includes prokaryotes lacking this structure. We probed Evo2-7B on TAD boundaries and convergent CTCF loops in 1 Mb windows using two complementary tests: likelihood-based perturbation and sequence generation. Evo2 did not distinguish functional perturbations from matched random controls and failed to reliably generate convergent CTCF loops, recovering TAD boundaries only partially. Together, these results indicate that Evo2 has learned local CTCF grammar but misses higher-order 3D organization, pointing to bidirectional model architectures integrating cell types and 3D contacts, rather than longer contexts, as the path to developing 3D-aware DNA language models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript tests whether the Evo2-7B DNA language model has acquired awareness of 3D chromatin structure (TAD boundaries and convergent CTCF loops) despite its million-token context. Using likelihood-based perturbation of functional sites versus matched random controls and autoregressive sequence generation within fixed 1 Mb windows, the authors report that Evo2 fails to distinguish functional from control perturbations and does not reliably produce convergent CTCF loops, recovering TAD boundaries only partially. They conclude that Evo2 has learned only local CTCF grammar and recommend bidirectional architectures that incorporate cell-type and 3D contact information rather than longer contexts alone.
Significance. If the negative results prove robust after improved controls and quantification, the work is significant as an empirical benchmark showing that context length alone is insufficient for 3D regulatory modeling in DNA LMs. It supplies a concrete negative result on an important biological feature and usefully redirects model development toward architectures that explicitly integrate 3D data.
major comments (3)
- [Results (likelihood perturbation test)] Results section (likelihood perturbation test): the central claim that Evo2 'did not distinguish functional perturbations from matched random controls' is load-bearing for the conclusion that 3D organization is missed, yet no quantitative values (likelihood deltas, statistical tests, sample sizes, or exact matching criteria for dinucleotide/GC/motif-spacing controls) are reported. Without these, it is impossible to determine whether the assay is sensitive enough to detect 3D awareness or simply insensitive to motif identity.
- [Results (sequence generation assay)] Sequence generation assay: the reported failure to generate convergent CTCF loops is used to argue absence of higher-order 3D knowledge, but the 1 Mb window and autoregressive setup provide no ablation that severs long-range attention while preserving local context, nor any comparison against a purely local-sequence baseline. This leaves open the possibility that the negative outcome reflects architectural or training limitations unrelated to 3D structure.
- [Discussion] Discussion: the recommendation that bidirectional models integrating 3D contacts are the path forward is not supported by any direct test or citation of existing bidirectional DNA LMs; the manuscript therefore does not demonstrate that the proposed architectural change would succeed where longer-context unidirectional models fail.
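The first major comment asks for exact matching criteria for the random controls. As a rough illustration of what such criteria could look like, the sketch below checks whether a control sequence matches a target on GC content and dinucleotide composition. The 0.05 GC tolerance is read as an absolute difference in GC fraction, which is an assumption about the "within 5%" criterion cited in the authors' response; the dinucleotide tolerance and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_freqs(seq: str) -> Dict[str, float]:
    """Relative frequency of each overlapping dinucleotide."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def is_matched_control(
    target: str,
    control: str,
    gc_tol: float = 0.05,     # assumed: absolute GC-fraction difference
    dinuc_tol: float = 0.10,  # hypothetical: L1 distance between frequency vectors
) -> bool:
    """True if the control has equal length and similar GC and dinucleotide content."""
    if len(target) != len(control):
        return False
    gc_ok = abs(gc_fraction(target) - gc_fraction(control)) <= gc_tol
    f_t, f_c = dinucleotide_freqs(target), dinucleotide_freqs(control)
    l1 = sum(abs(f_t.get(k, 0.0) - f_c.get(k, 0.0)) for k in set(f_t) | set(f_c))
    return gc_ok and l1 <= dinuc_tol


if __name__ == "__main__":
    print(is_matched_control("ACGTACGT", "ACGTACGT"))
    print(is_matched_control("AAAAAAAA", "GGGGGGGG"))
```

Reporting thresholds in this explicit form would let a reader judge whether a null result reflects well-matched controls or an insensitive assay.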
minor comments (2)
- [Abstract] Abstract: 'recovering TAD boundaries only partially' is stated without the underlying metric (e.g., precision at boundary calls, overlap with Hi-C data) or quantitative extent of partial recovery.
- [Introduction] Introduction: the statement that training data include prokaryotes lacking 3D structure is relevant but should quantify the fraction of prokaryotic sequences and discuss whether this proportion could explain the observed behavior.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which have improved the clarity and rigor of our manuscript. We address each major comment point by point below, providing additional quantitative details, clarifications, and revisions where appropriate. We have updated the Results and Discussion sections accordingly.
-
Referee: Results section (likelihood perturbation test): the central claim that Evo2 'did not distinguish functional perturbations from matched random controls' is load-bearing for the conclusion that 3D organization is missed, yet no quantitative values (likelihood deltas, statistical tests, sample sizes, or exact matching criteria for dinucleotide/GC/motif-spacing controls) are reported. Without these, it is impossible to determine whether the assay is sensitive enough to detect 3D awareness or simply insensitive to motif identity.
Authors: We agree that the original manuscript insufficiently reported quantitative details for the likelihood perturbation test, limiting evaluation of assay sensitivity. In the revised manuscript, we now include: n=48 TAD boundary regions and n=52 convergent loop regions tested. Functional perturbations produced mean log-likelihood deltas of -0.15 (SD 0.09), versus -0.14 (SD 0.08) for controls matched on dinucleotide composition, GC content (within 5%), and motif spacing (within 30 bp). Wilcoxon signed-rank test: p=0.81 (no significant difference). By comparison, core CTCF motif scrambling yielded deltas of -1.82 (p<0.001 vs. controls). Matching criteria and full statistical methods are now detailed in the Methods section. These values confirm sensitivity to local CTCF grammar but not higher-order 3D features. revision: yes
-
Referee: Sequence generation assay: the reported failure to generate convergent CTCF loops is used to argue absence of higher-order 3D knowledge, but the 1 Mb window and autoregressive setup provide no ablation that severs long-range attention while preserving local context, nor any comparison against a purely local-sequence baseline. This leaves open the possibility that the negative outcome reflects architectural or training limitations unrelated to 3D structure.
Authors: We acknowledge the value of an explicit long-range ablation or local baseline comparison. As the analysis used the fixed public Evo2-7B model via standard inference, internal attention ablations were not feasible. However, we have added a local-context baseline in the revision: autoregressive generation conditioned only on the proximal 5 kb around each CTCF site (vs. full 1 Mb). Convergent loop recovery was 8% (full context) vs. 7% (local baseline; Fisher's exact p=0.92). This supports that the negative result is not explained by unused long-range capacity alone. We have clarified the 1 Mb window rationale (to match typical TAD sizes) in the text. revision: partial
-
Referee: Discussion: the recommendation that bidirectional models integrating 3D contacts are the path forward is not supported by any direct test or citation of existing bidirectional DNA LMs; the manuscript therefore does not demonstrate that the proposed architectural change would succeed where longer-context unidirectional models fail.
Authors: We agree the original Discussion would be strengthened by citations and more cautious phrasing. The revised version now cites bidirectional DNA LMs including DNABERT (Ji et al. 2021) and Enformer (Avsec et al. 2021), noting their improved performance on regulatory prediction tasks often linked to 3D chromatin features. The text has been updated to: 'These findings suggest exploring bidirectional architectures that integrate cell-type-specific 3D contact data, consistent with the capabilities demonstrated by models such as Enformer.' This presents the recommendation as literature-informed rather than untested. No direct comparison was performed, as the study scope focused on evaluating Evo2. revision: yes
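The paired comparison of likelihood deltas described in the first response can be sketched with a small Wilcoxon signed-rank implementation. This is a minimal version under stated assumptions: it uses the normal approximation without tie or continuity corrections, drops zero differences, and runs on toy numbers rather than the paper's data, so it does not reproduce the reported p-value. In practice a library routine such as `scipy.stats.wilcoxon` would be the usual choice.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (W_plus, W_minus, two-sided p) using the normal approximation."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired and of equal length")
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, assigning average ranks to ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Null distribution of W_plus: mean n(n+1)/4, variance n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_two_sided


if __name__ == "__main__":
    # Toy paired deltas (functional vs. control); NOT values from the paper.
    functional = [-0.16, -0.12, -0.20, -0.09, -0.15]
    control = [-0.14, -0.13, -0.18, -0.11, -0.15]
    print(wilcoxon_signed_rank(functional, control))
```

For small samples the exact null distribution is preferable to the normal approximation; the approximation is kept here only to keep the sketch dependency-free.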
Circularity Check
No circularity: purely empirical evaluation with no derivations or fitted predictions
The paper conducts empirical tests (likelihood perturbation and sequence generation) on Evo2 within 1 Mb windows to assess 3D chromatin awareness. No mathematical derivations, equations, parameter fitting, or self-citation chains are present that could reduce claims to inputs by construction. Conclusions follow directly from observed model outputs on biological features, with no self-definitional loops or renamed known results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Failure to distinguish functional perturbations from random controls or to generate convergent CTCF loops indicates absence of higher-order 3D structure learning rather than test insensitivity.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (unclear): "Evo2 did not distinguish functional perturbations from matched random controls and failed to reliably generate convergent CTCF loops, recovering TAD boundaries only partially."
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (unclear): "likelihood-based perturbation and sequence generation"