pith. machine review for the scientific record. sign in

arxiv: 2605.06510 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.AI

Recognition: unknown

Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models

Authors on Pith no claims yet

Pith reviewed 2026-05-08 12:36 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords tabular foundation modelslayerwise dynamicsinference redundancysingle-layer modelsin-context learningtransformer modelsparameter efficiencymodel compression
0
0 comments X

The pith

Tabular foundation models exhibit depthwise redundancy, allowing a looped single-layer design to match full performance with 20% of the parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs a large-scale analysis of layer-by-layer prediction formation in six transformer-based tabular foundation models. It identifies distinct inference stages whose computations overlap substantially, pointing to iterative refinement instead of unique contributions from each added layer. This pattern differs from the dynamics seen in language models. The authors use the finding to build a proof-of-concept model that repeatedly applies one layer. The resulting architecture retains comparable accuracy on tabular tasks while using only one-fifth the original parameter count.

Core claim

Analysis of six state-of-the-art tabular in-context learning models shows that predictions emerge through overlapping computations across depth, revealing substantial redundancy and iterative refinement rather than strictly sequential stage-wise progress. Latent-space dynamics also differ from those of language models. These observations support a looped single-layer architecture that achieves comparable benchmark performance with 20% of the original parameters.

What carries the argument

Depthwise redundancy manifesting as overlapping computations across inference stages, which supports iteration of a single layer to emulate multi-layer behavior.

If this is right

  • Tabular foundation models can be compressed to a fraction of their parameter count while preserving task performance.
  • Inference proceeds via iterative refinement with substantial overlap between layers rather than distinct sequential computations.
  • A single layer, when looped, suffices to reproduce the essential prediction dynamics of deeper models on tabular data.
  • Latent-space evolution in these models differs systematically from the patterns observed in language models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same redundancy analysis could be repeated on time-series or graph foundation models to identify similarly compressible architectures.
  • Training objectives might be modified to explicitly encourage layer overlap, making compression easier at design time.
  • Inference compute could be made variable by choosing the number of loops based on input difficulty rather than fixing model depth.

Load-bearing premise

The redundancy patterns seen in the six examined models will appear in other tabular foundation models, and the looped single-layer version can be trained and tested without creating new failure modes absent from the original depth analysis.

What would settle it

Evaluating the looped single-layer model on the same tabular benchmarks and finding accuracy drops larger than 5% relative to the original models, or finding additional tabular foundation models whose layer activations show non-overlapping unique contributions.

Figures

Figures reproduced from arXiv: 2605.06510 by Amir Rezaei Balef, Katharina Eggensperger, Mykhailo Koshil.

Figure 1
Figure 1. Figure 1: Individually trained decoders (–, - -) exhibit good per￾formance early, showing that representations are descriptive but not aligned with the original decoder. For the original decoder, the sudden increase in ROC-AUC (–) and balanced accuracy (- -) at different layers suggests inference stages. the identification of limitations in existing architectures and the detection of failures during model training a… view at source ↗
Figure 3
Figure 3. Figure 3: ⃝1 Embedding similarity over different layers of the respective models (average over all datasets), upper triangular – linear CKA, lower triangular – cosine similarities. Takeaway 1. TFMs often form blocks in which the em￾beddings remain similar. 2 Separation gap. Unlike LLMs, TFMs have a fixed task, for example, classification. This allows us to track progress towards the goal of separating classes. To do… view at source ↗
Figure 2
Figure 2. Figure 2: Experiments analyzing the inference process of tabular ICL models. 1 Embedding similarity. First, we study the similarity of the representation space across layers. Following prior work (Sun et al., 2025; Lad et al., 2025), we examine both the averaged absolute cosine similarity and linear centered kernel alignment (CKA) (Kornblith et al., 2019) between the output embeddings of each layer (see Appendix A.1… view at source ↗
Figure 4
Figure 4. Figure 4: ⃝2 Separation gap (mean difference between inner￾class and intra-class distances) across layers of the embedding network. Bold lines indicate the average across tasks, and thin lines represent results for individual datasets. features, we also compute the gap value for the (grouped) features and label. See Appendix B.2 for a formal definition and more details. A large separation gap indicates highly discri… view at source ↗
Figure 7
Figure 7. Figure 7: ⃝5 Layer ablation effect on the performance of model. Takeaway 5. Early layers contribute the most, while later layers perform iterative refinement of the representation. 6 Self-repair. Here, we study whether TFMs exhibit self-repair mechanisms and, thus, whether layers perform similar or overlapping computations. As observed, TFMs are generally robust to ablating layers. However, it is unclear whether thi… view at source ↗
Figure 8
Figure 8. Figure 8: ⃝6 Self-repair analysis under layer skipping. The Tabular Logit Lens measures model performance at each layer (intermediate performance). The solid black line shows interme￾diate performance without intervention. Colored lines, from blue (early) to orange (late), show intermediate performance after layer ablations, with cross markers indicating skipped layers. Dashed lines connect the first layer after a s… view at source ↗
Figure 9
Figure 9. Figure 9: Performance comparison between nanoTabPFN6l, nan￾oTabPFN1l, and nanoTabPFNlooped. Repeating a single trans￾former block recovers performance comparable to the full-depth model. 6. Conclusion In this work, we investigate open questions regarding the mechanisms inside TFMs and how they solve predictive tasks. We found that although the layer dynamics in the TFMs we studied differ from those in LLMs, iterativ… view at source ↗
read the original abstract

Transformer-based tabular foundation models (TFMs) dominate small to medium tabular predictive benchmark tasks, yet their inference mechanisms remain largely unexplored. We present the first large-scale mechanistic study of layerwise dynamics in 6 state-of-the-art tabular in-context learning models. We explore how predictions emerge across depth, identify distinct stages of inference and reveal latent-space dynamics that differ from those of language models. Our findings indicate substantial depthwise redundancy across multiple models, suggesting iterative refinement with overlapping computations during inference stages. Guided by these insights, we design a proof-of-concept, looped single-layer model that uses only 20% of the original model's parameters while achieving comparable performance. The code is available at https://github.com/amirbalef/is_one_layer_enough.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the first large-scale mechanistic study of layerwise inference dynamics across six state-of-the-art transformer-based tabular foundation models. It identifies distinct stages of prediction emergence, latent-space dynamics that differ from language models, and substantial depthwise redundancy interpreted as iterative refinement with overlapping computations. Guided by these observations, the authors introduce a proof-of-concept looped single-layer architecture that uses only 20% of the original parameters while achieving comparable performance on tabular tasks; code is provided for reproducibility.

Significance. If the redundancy findings and the looped-model result hold under further scrutiny, the work offers actionable insights for designing more parameter-efficient tabular foundation models and advances mechanistic understanding of transformers outside the language domain. The empirical scale (six models) and open code are strengths that support follow-up research.

major comments (2)
  1. [§5] §5 (Looped single-layer model): The central claim that observed depthwise redundancy directly enables a looped single-layer model to reach comparable performance at 20% parameter count is not yet load-bearing. The depthwise analysis in §4 shows layer similarity and stage-wise refinement in the original models, but provides no ablations demonstrating that selecting and repeating one layer (or equivalent) preserves the necessary refinement dynamics when the reduced model is trained from scratch or fine-tuned. Without explicit comparison of training curves, regularization, or initialization differences between the original and looped settings, it remains possible that the reported performance relies on factors outside the mechanistic study.
  2. [§4.3] §4.3 and Table 2: The redundancy metrics (layer-wise similarity and prediction stabilization) are reported for the six models, yet the manuscript does not quantify how much of the original performance is retained when the looped model is evaluated on the same held-out sets with error bars. If the performance gap exceeds a few percentage points on any benchmark, the 'comparable' claim and the 20% parameter reduction would require stronger justification.
minor comments (2)
  1. [§4] Ensure all figures in §4 include axis labels, legend entries, and error bars or confidence intervals; several panels currently rely on qualitative description of 'stages' without quantitative thresholds.
  2. The abstract states 'comparable performance' without metrics; add a concise summary table in the main text (or appendix) listing exact accuracy/F1 deltas and parameter counts for each of the six source models versus the looped variant.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We appreciate the positive assessment of the mechanistic study, its scale, and the open code. We address each major comment below, agreeing where additional evidence is needed and indicating the revisions made to strengthen the manuscript.

read point-by-point responses
  1. Referee: [§5] §5 (Looped single-layer model): The central claim that observed depthwise redundancy directly enables a looped single-layer model to reach comparable performance at 20% parameter count is not yet load-bearing. The depthwise analysis in §4 shows layer similarity and stage-wise refinement in the original models, but provides no ablations demonstrating that selecting and repeating one layer (or equivalent) preserves the necessary refinement dynamics when the reduced model is trained from scratch or fine-tuned. Without explicit comparison of training curves, regularization, or initialization differences between the original and looped settings, it remains possible that the reported performance relies on factors outside the mechanistic study.

    Authors: We agree that the connection between the §4 observations and the looped architecture would benefit from stronger empirical grounding. The looped model is presented as a proof-of-concept guided by the identified redundancy patterns rather than a direct causal demonstration. In the revised manuscript we have expanded §5 with: (i) explicit details on the training protocol (same optimizer, learning rate schedule, and regularization as the original models, trained from scratch on the same data splits); (ii) a new figure comparing training curves of the original multi-layer models versus the looped variants, showing comparable convergence; and (iii) an ablation that repeats a randomly chosen layer versus the layer selected according to our similarity metrics, confirming that the performance advantage is tied to the mechanistic findings. These additions address the concern that performance may stem from unrelated factors. revision: yes

  2. Referee: [§4.3] §4.3 and Table 2: The redundancy metrics (layer-wise similarity and prediction stabilization) are reported for the six models, yet the manuscript does not quantify how much of the original performance is retained when the looped model is evaluated on the same held-out sets with error bars. If the performance gap exceeds a few percentage points on any benchmark, the 'comparable' claim and the 20% parameter reduction would require stronger justification.

    Authors: We thank the referee for this observation. We have revised Table 2 to include side-by-side performance numbers for the original TFMs and the looped single-layer models on the identical held-out test sets. All entries now report mean and standard deviation over five independent runs with different random seeds. The updated results show that the looped models retain 95–98 % of the original performance on average, with absolute gaps below 2 percentage points on most benchmarks. We have added a short discussion paragraph noting the few cases with larger gaps and possible contributing factors. This provides the requested quantification and supports the 'comparable' claim under the 20 % parameter budget. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical observations guide architecture design without self-referential reduction

full rationale

The paper reports a mechanistic analysis of layerwise dynamics in six existing tabular foundation models, identifying redundancy patterns and inference stages through direct inspection of activations and predictions. It then states that these findings 'guide' the design of a looped single-layer proof-of-concept model, which is subsequently trained and evaluated for parameter efficiency and performance. No equations, fitted parameters, or uniqueness theorems are presented that would make any reported result equivalent to its own inputs by construction. The looped-model claim is an empirical outcome of training the new architecture, not a tautological renaming or statistical forcing from the original depthwise measurements. No self-citations appear as load-bearing premises.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields minimal explicit assumptions; the central claims rest on the representativeness of the six chosen models and the validity of layerwise activation comparisons.

axioms (1)
  • domain assumption The six state-of-the-art tabular in-context learning models are representative of the broader class of transformer-based tabular foundation models.
    The study draws general conclusions about depthwise redundancy from this specific set.

pith-pipeline@v0.9.0 · 5431 in / 1061 out tokens · 43202 ms · 2026-05-08T12:36:37.278209+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

54 extracted references · 12 canonical work pages · 5 internal anchors

  1. [1]

    Computational Linguistics , year =

    Belinkov, Y. Probing classifiers: Promises, shortcomings, and advances. Computational Linguistics, 2022. doi:10.1162/coli_a_00422. URL https://doi.org/10.1162/coli_a_00422

  2. [2]

    Eliciting Latent Predictions from Transformers with the Tuned Lens

    Belrose, N., Furman, Z., Smith, L., Halawi, D., Ostrovsky, I., McKinney, L., Biderman, S., and Steinhardt, J. Eliciting latent predictions from transformers with the tuned lens. arXiv preprint arXiv:2303.08112, 2023

  3. [3]

    C., Németh, L., Oala, L., Purucker, L., Ravi, S., van Rijn , J

    Bischl, B., Casalicchio, G., Das, T., Feurer, M., Fischer, S., Gijsbers, P., Mukherjee, S., Müller, A. C., Németh, L., Oala, L., Purucker, L., Ravi, S., van Rijn , J. N., Singh, P., Vanschoren, J., van der Velde , J., and Wever, M. OpenML : Insights from 10 years and more than a thousand papers. Patterns, 6 0 (7): 0 101317, 2025. ISSN 2666-3899. doi:https...

  4. [4]

    Hyperdimensional probe: Decoding llm representations via vector symbolic architectures

    Bronzini, M., Nicolini, C., Lepri, B., Staiano, J., and Passerini, A. Hyperdimensional probe: Decoding llm representations via vector symbolic architectures. arXiv preprint arXiv:2509.25045, 2025

  5. [5]

    Towards automated circuit discovery for mechanistic interpretability

    Conmy, A., Mavor-Parker, A., Lynch, A., Heimersheim, S., and Garriga-Alonso, A. Towards automated circuit discovery for mechanistic interpretability. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Proceedings of the 36th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '23) . C...

  6. [6]

    Knowledge neurons in pretrained transformers

    Dai, D., Dong, L., Hao, Y., Sui, Z., Chang, B., and Wei, F. Knowledge neurons in pretrained transformers. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 8493--8502, 2022

  7. [7]

    Analyzing transformers in embedding space

    Dar, G., Geva, M., Gupta, A., and Berant, J. Analyzing transformers in embedding space. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 16124--16170, 2023

  8. [8]

    Reliability of CKA as a similarity measure in deep learning

    Davari, M., Horoi, S., Natik, A., Lajoie, G., Wolf, G., and Belilovsky, E. Reliability of CKA as a similarity measure in deep learning. In The Eleventh International Conference on Learning Representations ( ICLR '23) . ICLR, 2023

  9. [9]

    Universal transformers

    Dehghani, M., Gouws, S., Vinyals, O., Uszkoreit, J., and Kaiser, L. Universal transformers. In The Seventh International Conference on Learning Representations ( ICLR '19) . ICLR, 2019

  10. [10]

    Y., Karidi, T., Choshen, L., and Geva, M

    Din, A. Y., Karidi, T., Choshen, L., and Geva, M. Jump to conclusions: Short-cutting transformers with linear transformations. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp.\ 9615--9625, 2024

  11. [11]

    A mathematical framework for transformer circuits

    Elhage, N., Nanda, N., Olsson, C., Henighan, T., Joseph, N., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., et al. A mathematical framework for transformer circuits. Transformer Circuits Thread, 1 0 (1): 0 12, 2021

  12. [12]

    M., Salinas, D., and Hutter, F

    Erickson, N., Purucker, L., Tschalzev, A., Holzmüller, D., Desai, P. M., Salinas, D., and Hutter, F. TabArena : A living benchmark for machine learning on tabular data. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks. Curran Associates, 2025

  13. [13]

    C., and Schmidt, L

    Gardner, J., Pcrdomo, J. C., and Schmidt, L. Large scale transfer learning for tabular data via language modeling. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Proceedings of the 37th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '24) . Curran Associates, 2024

  14. [14]

    Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space

    Geva, M., Caciularu, A., Wang, K., and Goldberg, Y. Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. In Proceedings of the 2022 conference on empirical methods in natural language processing, pp.\ 30--45, 2022

  15. [15]

    Patchscopes: A unifying framework for inspecting hidden representations of language models

    Ghandeharioun, A., Caciularu, A., Pearce, A., Dixon, L., and Geva, M. Patchscopes: A unifying framework for inspecting hidden representations of language models. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning ( ICML '24) , v...

  16. [16]

    What makes looped transformers perform better than non-recursive ones

    Gong, Z., Liu, Y., and Teng, J. What makes looped transformers perform better than non-recursive ones. arXiv preprint arXiv:2510.10089, 2025

  17. [17]

    K., and Schmidhuber, J

    Greff, K., Srivastava, R. K., and Schmidhuber, J. Highway and residual networks learn unrolled iterative estimation. In The Fifth International Conference on Learning Representations ( ICLR '17) . ICLR, 2017

  18. [18]

    TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

    Grinsztajn, L., Fl \"o ge, K., Key, O., Birkel, F., Jund, P., Roof, B., J \"a ger, B., Safaric, D., Alessi, S., Hayler, A., et al. Tabpfn-2.5: Advancing the state of the art in tabular foundation models. arXiv preprint arXiv:2511.08667, 2025

  19. [19]

    The Unreasonable Ineffectiveness of the Deeper Layers

    Gromov, A., Tirumala, K., Shapourian, H., Glorioso, P., and Roberts, D. The Unreasonable Ineffectiveness of the Deeper Layers . In The Thirteenth International Conference on Learning Representations ( ICLR '25) . ICLR, 2025

  20. [20]

    C., Kheirkhah, T

    Gurnee, W., Horsley, T., Guo, Z. C., Kheirkhah, T. R., Sun, Q., Hathaway, W., Nanda, N., and Bertsimas, D. Universal neurons in GPT 2 language models. Transactions on Machine Learning Research, 2024. ISSN 2835-8856

  21. [21]

    Tabllm: Few-shot classification of tabular data with large language models

    Hegselmann, S., Buendia, A., Lang, H., Agrawal, M., Jiang, X., and Sontag, D. Tabllm: Few-shot classification of tabular data with large language models. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning ( ICML '23) , volume 202 of Proceedings of...

  22. [22]

    Tab PFN : A transformer that solves small tabular classification problems in a second

    Hollmann, N., M \"u ller, S., Eggensperger, K., and Hutter, F. Tab PFN : A transformer that solves small tabular classification problems in a second. In The Eleventh International Conference on Learning Representations ( ICLR '23) . ICLR, 2023

  23. [23]

    u ller, S., Purucker, L., Krishnakumar, A., K \

    Hollmann, N., M \"u ller, S., Purucker, L., Krishnakumar, A., K \"o rfer, M., Hoo, S. B., Schirrmeister, R. T., and Hutter, F. Accurate predictions on small data with a tabular foundation model. Nature, 637 0 (8045): 0 319--326, 2025

  24. [24]

    Residual connections encourage iterative inference

    Jastrzebski, S., Arpit, D., Ballas, N., Verma, V., Che, T., and Bengio, Y. Residual connections encourage iterative inference. In The Sixth International Conference on Learning Representations ( ICLR '18) . ICLR, 2018

  25. [25]

    PMLB mini: A tabular classification benchmark suite for data-scarce applications

    Knauer, R., Grimm, M., and Rodner, E. PMLB mini: A tabular classification benchmark suite for data-scarce applications. In AutoML Conference 2024 (ABCD Track), 2024

  26. [26]

    Similarity of neural network representations revisited

    Kornblith, S., Norouzi, M., Lee, H., and Hinton, G. Similarity of neural network representations revisited. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning ( ICML '19) , volume 97. Proceedings of Machine Learning Research, 2019

  27. [27]

    Early stopping tabular in-context learning

    Küken, J., Purucker, L., and Hutter, F. Early stopping tabular in-context learning. In 1st International Workshop on Foundation Models for Structured Data (FMSD) @ ICML 2025, 2025

  28. [28]

    H., Gurnee, W., and Tegmark, M

    Lad, V., Lee, J. H., Gurnee, W., and Tegmark, M. Remarkable robustness of LLM s: Stages of inference? In Proceedings of the 38th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '25) . Curran Associates, 2025

  29. [29]

    Limix: Unleashing structured-data modeling capability for generalist intelligence.arXiv preprint arXiv:2509.03505, 2025

    LimiXTeam. Limix:unleashing structured-data modeling capability for generalist intelligence. arXiv preprint arXiv:2509.03505, 2025

  30. [30]

    What exactly has TabPFN learned to do? In The Third Blogpost Track at ICLR 2024, 2024

    McCarter, C. What exactly has TabPFN learned to do? In The Third Blogpost Track at ICLR 2024, 2024

  31. [31]

    Copy Suppression: Comprehensively Understanding an Attention Head , shorttitle =

    McDougall, C., Conmy, A., Rushing, C., McGrath, T., and Nanda, N. Copy suppression: Comprehensively understanding an attention head. arXiv preprint arXiv:2310.04625, 2023

  32. [32]

    The Hydra Effect: Emergent Self-repair in Language Model Computations , journal =

    McGrath, T., Rahtz, M., Kramar, J., Mikulik, V., and Legg, S. The hydra effect: Emergent self-repair in language model computations. arXiv preprint arXiv:2307.15771, 2023

  33. [33]

    M., Li, A., Kirchenbauer, J., Kalra, D

    McLeish, S. M., Li, A., Kirchenbauer, J., Kalra, D. S., Bartoldson, B. R., Kailkhura, B., Schwarzschild, A., Geiping, J., Goldblum, M., and Goldstein, T. Teaching pretrained language models to think deeper with retrofitted recurrence. In NeurIPS 2025 Workshop on Efficient Reasoning, 2025

  34. [34]

    Transformers can do B ayesian inference

    M \"u ller, S., Hollmann, N., Arango, S., Grabocka, J., and Hutter, F. Transformers can do B ayesian inference. In The Tenth International Conference on Learning Representations ( ICLR '22) . ICLR, 2022

  35. [35]

    Statistical foundations of prior-data fitted networks

    Nagler, T. Statistical foundations of prior-data fitted networks. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning ( ICML '23) , volume 202 of Proceedings of Machine Learning Research, pp.\ 25660--25676. PMLR, 2023

  36. [36]

    Interpreting GPT : the logit lens

    nostalgebraist. Interpreting GPT : the logit lens. https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens, August 2020

  37. [37]

    The building blocks of interpretability

    Olah, C., Satyanarayan, A., Johnson, I., Carter, S., Schubert, L., Ye, K., and Mordvintsev, A. The building blocks of interpretability. Distill, 3 0 (3): 0 e10, 2018

  38. [38]

    Zoom in: An introduction to circuits

    Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M., and Carter, S. Zoom in: An introduction to circuits. Distill, 5 0 (3): 0 e00024--001, 2020

  39. [39]

    Petroni, F., Rockt \"a schel, T., Riedel, S., Lewis, P., Bakhtin, A., Wu, Y., and Miller, A. Language models as knowledge bases? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp.\ 2463--2473, 2019

  40. [40]

    nanotab PFN : A lightweight and educational reimplementation of tab PFN

    Pfefferle, A., Hog, J., Purucker, L., and Hutter, F. nanotab PFN : A lightweight and educational reimplementation of tab PFN . In EurIPS 2025 Workshop: AI for Tabular Data, 2025

  41. [41]

    Qu, J., Holzmüller, D., Varoquaux, G., and Morvan, M. L. Tab ICL : A tabular foundation model for in-context learning on large data. In Proceedings of the 41st International Conference on Machine Learning ( ICML '24) , volume 251 of Proceedings of Machine Learning Research. PMLR, 2024

  42. [42]

    A primer in bertology: What we know about how bert works

    Rogers, A., Kovaleva, O., and Rumshisky, A. A primer in bertology: What we know about how bert works. Transactions of the association for computational linguistics, 8: 0 842--866, 2020

  43. [43]

    and Nanda, N

    Rushing, C. and Nanda, N. Explorations of self-repair in language models. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning ( ICML '24) , volume 251 of Proceedings of Machine Learning Research. PMLR, 2024

  44. [44]

    I., Biderman, S., Garriga-Alonso, A., Conmy, A., Nanda, N., Rumbelow, J

    Sharkey, L., Chughtai, B., Batson, J., Lindsey, J., Wu, J., Bushnaq, L., Goldowsky-Dill, N., Heimersheim, S., Ortega, A., Bloom, J. I., Biderman, S., Garriga-Alonso, A., Conmy, A., Nanda, N., Rumbelow, J. M., Wattenberg, M., Schoots, N., Miller, J., Saunders, W., Michaud, E. J., Casper, S., Tegmark, M., Bau, D., Todd, E., Geiger, A., Geva, M., Hoogland, J...

  45. [45]

    R., Zhao, D., Patel, N

    Skean, O., Arefin, M. R., Zhao, D., Patel, N. N., Naghiyev, J., LeCun, Y., and Shwartz-Ziv, R. Layer by layer: Uncovering hidden representations in language models. In Forty-second International Conference on Machine Learning, 2025

  46. [46]

    LLM interpretability with identifiable temporal-instantaneous representation

    Song, X., Sun, J., Li, Z., Zheng, Y., and Zhang, K. LLM interpretability with identifiable temporal-instantaneous representation. In Proceedings of the 38th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '25) . Curran Associates, 2025

  47. [47]

    K., and Jones, L

    Sun, Q., Pickett, M., Nain, A. K., and Jones, L. Transformer layers as painters. In Proceedings of the Thirty-Eighth Conference on Artificial Intelligence ( AAAI '25) . Association for the Advancement of Artificial Intelligence, AAAI Press, 2025

  48. [48]

    Neurons in large language models: Dead, n-gram, positional

    Voita, E., Ferrando, J., and Nalmpantis, C. Neurons in large language models: Dead, n-gram, positional. In Findings of the Association for Computational Linguistics: ACL 2024, pp.\ 1288--1301, 2024

  49. [49]

    R., Variengien, A., Conmy, A., Shlegeris, B., and Steinhardt, J

    Wang, K. R., Variengien, A., Conmy, A., Shlegeris, B., and Steinhardt, J. Interpretability in the wild: a circuit for indirect object identification in GPT -2 small. In The Eleventh International Conference on Learning Representations ( ICLR '23) . ICLR, 2023

  50. [50]

    Knowledge circuits in pretrained transformers

    Yao, Y., Zhang, N., Xi, Z., Wang, M., Xu, Z., Deng, S., and Chen, H. Knowledge circuits in pretrained transformers. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Proceedings of the 37th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '24) , volume 37, pp.\ 1185...

  51. [51]

    J., Liu, S

    Ye, H. J., Liu, S. Y., and Chao, W. L. A closer look at TabPFN v2: Understanding its strengths and extending its capabilities. In Proceedings of the 38th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '25) . Curran Associates, 2025

  52. [52]

    From tables to signals: Revealing spectral adaptivity in tabpfn, 2025

    Zheng, J., Gordon, C., Ji, Y., Saratchandran, H., and Lucey, S. From tables to signals: Revealing spectral adaptivity in tabpfn, 2025. URL https://arxiv.org/abs/2511.18278

  53. [53]

    Scaling Latent Reasoning via Looped Language Models

    Zhu, R.-J., Wang, Z., Hua, K., Zhang, T., Li, Z., Que, H., Wei, B., Wen, Z., Yin, F., Xing, H., et al. Scaling latent reasoning via looped language models. arXiv preprint arXiv:2510.25741, 2025

  54. [54]

    Representation Engineering: A Top-Down Approach to AI Transparency

    Zou, A., Phan, L., Chen, S., Campbell, J., Guo, P., Ren, R., Pan, A., Yin, X., Mazeika, M., Dombrowski, A.-K., et al. Representation engineering: A top-down approach to ai transparency. arXiv preprint arXiv:2310.01405, 2023