pith. machine review for the scientific record.

arxiv: 2605.07407 · v1 · submitted 2026-05-08 · 💻 cs.LG

Recognition: no theorem link

Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:17 UTC · model grok-4.3

classification 💻 cs.LG
keywords health foundation models, symbolic representations, cross-modal alignment, wearable sensors, PPG, accelerometer, embedding interpretability, post-training alignment

The pith

Health foundation model embeddings contain an interpretable symbolic organization shared across sensor modalities that supports cross-domain transfer without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a post-training method that decomposes frozen embeddings from health foundation models into directions called symbols. These symbols associate selectively with health conditions and physiological attributes, and they partially overlap across modalities such as photoplethysmography and accelerometry even though the models were trained independently. Aligning the embedding spaces through these symbols enables knowledge transfer that retains over 95 percent of original performance while requiring only limited paired examples. This indicates that the models encode a shared low-dimensional structure useful for interpretation and reuse across domains.

Core claim

Health foundation models pretrained on large unlabeled wearable datasets produce embeddings that decompose into interpretable directions termed symbols. These symbols associate with specific health conditions and attributes. The associations are partially shared across modalities and architectures. Aligning spaces via the symbols produces cross-modal transfer that retains more than 95 percent of in-domain performance, remains nearly symmetric, and saturates with small amounts of paired data, recovering a shared physiological subspace.

What carries the argument

Post-training decomposition of frozen embeddings into interpretable directions (symbols) that are then used to align embedding spaces across modalities without any retraining.
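A minimal sketch of this decomposition step, assuming frozen embeddings arrive as a subjects-by-dimensions array. The paper evaluates ICA, PCA, and NMF as linear extractors (see Figure 1 below); ICA is shown here, and all shapes, names, and hyperparameters are illustrative rather than the authors' settings.

```python
# Sketch: extract "symbols" as ICA directions from frozen FM embeddings.
# The embeddings here are synthetic placeholders; in the paper they come
# from health foundation models pretrained on PPG/accelerometer data.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256))  # (subjects, embedding dims)

ica = FastICA(n_components=32, random_state=0, max_iter=1000)
symbol_activations = ica.fit_transform(embeddings)  # (subjects, symbols)
symbol_directions = ica.components_                 # (symbols, dims)
```

Swapping `FastICA` for `PCA` (or `NMF`, which additionally requires non-negative inputs) changes only the decomposition line; the downstream alignment and evaluation steps operate on `symbol_activations` either way.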

If this is right

  • Extracted symbols associate selectively with health conditions and physiological attributes.
  • Symbol-attribute associations are partially shared across PPG and accelerometer modalities.
  • Cross-modal transfer via symbols retains more than 95 percent of in-domain performance and is nearly symmetric.
  • The alignment process saturates with limited paired data, indicating recovery of a low-dimensional shared subspace (a sketch of the matching step follows this list).
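The bijective matching behind these claims can be made concrete with a short sketch: compute the cross-correlation matrix between the two models' symbol activations on paired subjects, then solve the assignment problem on |C| with the Hungarian algorithm, as the Figure 1 caption describes. Names and shapes are illustrative.

```python
# Sketch: bijective cross-modal symbol matching (Hungarian algorithm on |C|).
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_symbols(acts_a, acts_b):
    """Match symbols of model A to symbols of model B on paired subjects.

    acts_a, acts_b: (subjects, k) symbol activations from the two models.
    Returns matched index pairs and their |correlation| strengths.
    """
    k = acts_a.shape[1]
    # np.corrcoef stacks its inputs, so the off-diagonal block of the
    # (2k x 2k) result is the cross-correlation matrix C.
    c = np.abs(np.corrcoef(acts_a.T, acts_b.T)[:k, k:])
    # The Hungarian algorithm minimizes cost, so negate to maximize total |C|.
    rows, cols = linear_sum_assignment(-c)
    return rows, cols, c[rows, cols]
```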

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach could support modular reuse of health models across new sensor types without full retraining.
  • If symbols prove stable, they might serve as building blocks for combining multiple foundation models in clinical pipelines.
  • Testing symbol consistency on data from varied demographics would clarify whether the shared subspace generalizes beyond the study cohort.
  • The framework might reveal whether similar symbolic structures emerge in non-wearable health models such as those trained on imaging or lab data.

Load-bearing premise

The extracted symbols capture genuine physiological information rather than spurious correlations or artifacts introduced by the decomposition process.

What would settle it

If the symbols show no selective correlation with independent physiological measurements such as heart rate variability or step counts on a new held-out cohort from a different population, the claim of a shared physiological subspace would be falsified.
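One concrete form this test could take, mirroring the paper's W1 selectivity analysis: score each symbol by the Wasserstein-1 distance between its activation distributions in measurement-positive and measurement-negative subjects of the new cohort, and check whether any symbols separate the groups selectively. The binarized labels and function names below are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch: per-symbol W1 selectivity against an independent binary target.
import numpy as np
from scipy.stats import wasserstein_distance

def selectivity_w1(symbol_activations, labels):
    """W1 distance between positive and negative groups, per symbol.

    symbol_activations: (subjects, k) array; labels: (subjects,) in {0, 1}.
    """
    pos = symbol_activations[labels == 1]
    neg = symbol_activations[labels == 0]
    return np.array([
        wasserstein_distance(pos[:, j], neg[:, j])
        for j in range(symbol_activations.shape[1])
    ])
```

Uniformly flat W1 profiles across symbols on the new cohort would be the falsifying outcome; selective peaks on physiologically related targets would support the shared-subspace claim.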

Figures

Figures reproduced from arXiv: 2605.07407 by Advait Koparkar, Anshuman Mishra, Gajendra Katuwal, Salar Abbaspourazad, Sarvesh Kirthivasan.

Figure 1: Overview of the symbolic framework. Frozen embeddings are re-expressed as symbol activations via linear projections (ICA, PCA, or NMF), then aligned across models using bijective matching (Hungarian algorithm on |C|) or CCA. Aligned symbols are evaluated for selective meaning (W1 distance against 23 targets) and cross-modal transfer (classifier trained on source, tested on aligned target). All operators ar…
Figure 2: Cross-modal symbol alignment analysis [PPG EfficientNet vs Accel ViT]. (a) Bijective matching across symbol extractors: cross-modal correspondence is already present in raw embeddings (black dotted). Symbol extraction reshapes how it is distributed — ICA concentrates it into a few top-ranked symbols, while NMF spreads it more broadly across ranks. (b) Zoomed view of top-20 symbols showing ICA’s stronger p…
Figure 3: Semantic grounding and meaning selectivity (ICA, PPG ViT vs Accel ViT). (a) W1 heatmaps measuring distributional shift of each symbol between condition-positive and condition-negative subjects. Each column is a symbol (extracted direction in embedding space) and each row is a health target. Cell values are W1 distances between the symbol activation distributions of positive and negative subjects — higher v…
Figure 4: Cross-modal symbolic transfer. (a) Classifiers trained in one symbol space retain >98% of in-domain AUC after cross-modal alignment; blood type (negative control) remains near chance. Best transfer is achieved with the CCA family of alignments. (b) Transfer is nearly symmetric across domain directions (r = 0.995), suggesting an approximate isomorphism between symbol spaces. (c) Transfer saturates with limi…
Figure 5: Cross-modal meaning consistency across model pairs and symbol extraction methods. Lines show median per-symbol W1 cosine similarity; shaded bands show interquartile range (Q25–Q75). PCA maintains 0.90–0.91 across all pairs (architecture-invariant), while ICA drops from 0.91 to 0.87 when both modality and architecture differ — higher-order statistics exploited by ICA are sensitive to architectural nonlinearit…
Figure 6: Selectivity Index computed on |Cohen’s d| effect size (PPG ViT vs Accel ViT, ICA+bijective). Cohen’s d has E[d] = 0 under the null regardless of group size, eliminating the minority-group inflation floor present in W1. Age 60+ and BMI underweight remain dominant, confirming genuine physiological effect sizes.
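Figure 4's transfer protocol can be sketched end to end: align the two symbol spaces with CCA (the alignment family the caption reports works best) on a set of paired subjects, train a classifier in the source space, and compare cross-modal AUC to in-domain AUC. This is a compressed illustration, not the paper's exact splits; fitting the alignment and evaluating on the same paired set is a simplification for brevity.

```python
# Sketch: cross-modal transfer retention via CCA-aligned symbol spaces.
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def transfer_retention(src_train, y_train, src_pair, tgt_pair, y_pair,
                       n_components=16):
    """AUC retention of a source-trained classifier on aligned target symbols."""
    # Fit the CCA alignment on paired examples; the paper finds performance
    # saturates with a limited number of such pairs.
    cca = CCA(n_components=n_components, max_iter=1000)
    src_aligned, tgt_aligned = cca.fit_transform(src_pair, tgt_pair)

    # Train in the aligned source space, then score both domains.
    clf = LogisticRegression(max_iter=1000).fit(cca.transform(src_train), y_train)
    in_domain = roc_auc_score(y_pair, clf.predict_proba(src_aligned)[:, 1])
    cross_modal = roc_auc_score(y_pair, clf.predict_proba(tgt_aligned)[:, 1])
    return cross_modal / in_domain  # the paper reports retention above 0.95
```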
Original abstract

Health foundation models (FMs) learn useful representations from wearable sensors, but interpreting what they encode and transferring that knowledge across modalities after training remains difficult. We present a post-training framework that decomposes frozen embeddings into interpretable directions, referred to as symbols, and use these symbols to align the embedding spaces without retraining. We evaluate the framework on three FMs for photoplethysmography (PPG) and accelerometer data, independently pretrained on ~20M minutes of unlabeled data from ~172K participants, and analyzed on a held-out cohort of 30K subjects. We find that extracted symbols associate selectively with health conditions and physiological attributes, and these associations are partially shared across modalities and architectures. Cross-modal transfer via symbols retains more than 95% of in-domain performance, is nearly symmetric across domain directions, and saturates with limited paired data, together indicating that alignment recovers a shared low-dimensional subspace rich in physiological information. Overall, these results suggest that health FM embeddings contain an interpretable symbolic organization that is shared across modalities and supports cross-domain transfer without joint training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a post-training framework that decomposes frozen embeddings from health foundation models (pretrained on ~20M minutes of unlabeled PPG and accelerometer data from ~172K participants) into interpretable directions called symbols. These symbols are then used to align embedding spaces across modalities without retraining. On a held-out cohort of 30K subjects, the authors report that extracted symbols associate selectively with health conditions and physiological attributes, with partial sharing across modalities and architectures; cross-modal transfer via symbols retains >95% of in-domain performance, is nearly symmetric, and saturates with limited paired data, supporting the claim of an emergent shared low-dimensional symbolic organization rich in physiological information.

Significance. If the central claims hold after addressing validation concerns, the work would show that large-scale health FMs encode an interpretable, partially shared physiological subspace extractable post-hoc and alignable with minimal paired data. This could meaningfully advance interpretability and practical cross-sensor transfer in wearable health modeling without requiring joint retraining on massive datasets. The scale of pretraining and the quantitative saturation/transfer results are strengths that, if robust, would distinguish the contribution from purely post-hoc fitting exercises.

major comments (2)
  1. [Abstract / Evaluation] Abstract and evaluation protocol: All reported quantitative results—selective health-condition associations, partial cross-modal sharing, >95% transfer retention, symmetry, and saturation—are measured exclusively on the same 30K held-out cohort drawn from the original ~172K-participant pool. No external validation cohort, demographic shift, sensor variation, or distribution-shift test is described. This is load-bearing for the claim that symbols encode genuine physiological information rather than cohort-specific correlations or pretraining artifacts, as the symbol extraction, association analysis, and alignment all operate within the same data regime.
  2. [Abstract / Methods] Abstract and methods on symbol extraction: The procedure for decomposing embeddings into symbols, the statistical controls used to establish selective associations, and any error bars or significance testing are not quantified in the reported results. Without these details it is difficult to rule out that the observed associations and the recovered low-dimensional subspace are post-hoc fitting artifacts rather than robust physiological directions, directly affecting the interpretability and generalization claims.
minor comments (2)
  1. [Methods] The notation distinguishing 'symbols' from the original embedding dimensions and from the alignment transformation should be introduced with explicit equations early in the methods to improve readability.
  2. [Results] Figure captions and axis labels for the saturation and symmetry plots would benefit from explicit mention of the number of paired samples used at each point and the exact performance metric being plotted.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our evaluation protocol and methodological transparency. We address each major point below, clarifying the held-out nature of our analyses while acknowledging limitations, and commit to specific revisions that strengthen the manuscript without altering its core claims.

Point-by-point responses
  1. Referee: [Abstract / Evaluation] Abstract and evaluation protocol: All reported quantitative results—selective health-condition associations, partial cross-modal sharing, >95% transfer retention, symmetry, and saturation—are measured exclusively on the same 30K held-out cohort drawn from the original ~172K-participant pool. No external validation cohort, demographic shift, sensor variation, or distribution-shift test is described. This is load-bearing for the claim that symbols encode genuine physiological information rather than cohort-specific correlations or pretraining artifacts, as the symbol extraction, association analysis, and alignment all operate within the same data regime.

    Authors: The 30K cohort is a strictly held-out partition from the original participant pool and was never used in foundation-model pretraining, allowing symbol extraction, association testing, and cross-modal alignment to be evaluated on data unseen during model training. This design directly tests whether interpretable symbolic structure emerges in embeddings of a large independent sample drawn from the same population. We agree that fully external cohorts with demographic or sensor shifts would provide stronger evidence against cohort-specific artifacts. In the revised manuscript we will add an explicit limitations paragraph discussing this point and outlining future validation plans, while retaining the current quantitative results as evidence within the studied regime. revision: partial

  2. Referee: [Abstract / Methods] Abstract and methods on symbol extraction: The procedure for decomposing embeddings into symbols, the statistical controls used to establish selective associations, and any error bars or significance testing are not quantified in the reported results. Without these details it is difficult to rule out that the observed associations and the recovered low-dimensional subspace are post-hoc fitting artifacts rather than robust physiological directions, directly affecting the interpretability and generalization claims.

    Authors: We will expand the Methods section with a complete, self-contained description of the decomposition procedure (including the exact optimization objective and hyper-parameters), the statistical controls (permutation testing and FDR correction for selective associations), and the reporting of error bars (bootstrap or analytic) together with p-values for all key quantitative claims. Revised figures and tables will include these quantities. These additions will allow readers to directly evaluate the robustness of the extracted symbols and the low-dimensional subspace. revision: yes
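A minimal sketch of the controls this response commits to, under assumptions: a permutation null for each symbol's W1 association plus Benjamini-Hochberg FDR correction across symbols. Iteration counts and the alpha level are illustrative, not values from the paper.

```python
# Sketch: permutation p-values for W1 associations, with BH-FDR correction.
import numpy as np
from scipy.stats import wasserstein_distance

def permutation_pvalue(activations, labels, n_perm=1000, seed=0):
    """P-value for one symbol's W1 association under label permutation."""
    rng = np.random.default_rng(seed)
    observed = wasserstein_distance(activations[labels == 1],
                                    activations[labels == 0])
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(labels)
        null[i] = wasserstein_distance(activations[perm == 1],
                                       activations[perm == 0])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

def benjamini_hochberg(pvals, alpha=0.05):
    """Indices of symbols surviving BH false-discovery-rate control."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, len(p) + 1) / len(p)
    passed = p[order] <= thresholds
    if not passed.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(passed)[0]) + 1  # BH step-up: largest passing rank
    return order[:k]
```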

Circularity Check

0 steps flagged

No significant circularity in empirical framework

Full rationale

The paper presents a post-training empirical procedure applied to frozen pretrained embeddings: symbol decomposition, limited-pair alignment, and evaluation of selective associations plus cross-modal transfer performance on held-out subjects. No derivation chain, first-principles result, or prediction is claimed that reduces by construction to fitted inputs, self-citations, or renamed quantities. All quantitative results (associations, saturation, symmetry, >95% retention) are measured on independent held-out data, providing external checks relative to the pretrained models. The framework therefore remains self-contained without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The approach appears to rest on standard embedding geometry and post-hoc linear decomposition without additional postulated entities.

pith-pipeline@v0.9.0 · 5514 in / 1155 out tokens · 39900 ms · 2026-05-11T02:17:15.833341+00:00 · methodology

