arxiv: 2605.06510 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.AI

Recognition: unknown

Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models

Amir Rezaei Balef , Mykhailo Koshil , Katharina Eggensperger

Authors on Pith no claims yet

Pith reviewed 2026-05-08 12:36 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords tabular foundation modelslayerwise dynamicsinference redundancysingle-layer modelsin-context learningtransformer modelsparameter efficiencymodel compression

0 comments

The pith

Tabular foundation models exhibit depthwise redundancy, allowing a looped single-layer design to match full performance with 20% of the parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs a large-scale analysis of layer-by-layer prediction formation in six transformer-based tabular foundation models. It identifies distinct inference stages whose computations overlap substantially, pointing to iterative refinement instead of unique contributions from each added layer. This pattern differs from the dynamics seen in language models. The authors use the finding to build a proof-of-concept model that repeatedly applies one layer. The resulting architecture retains comparable accuracy on tabular tasks while using only one-fifth the original parameter count.

Core claim

Analysis of six state-of-the-art tabular in-context learning models shows that predictions emerge through overlapping computations across depth, revealing substantial redundancy and iterative refinement rather than strictly sequential stage-wise progress. Latent-space dynamics also differ from those of language models. These observations support a looped single-layer architecture that achieves comparable benchmark performance with 20% of the original parameters.

What carries the argument

Depthwise redundancy manifesting as overlapping computations across inference stages, which supports iteration of a single layer to emulate multi-layer behavior.

If this is right

Tabular foundation models can be compressed to a fraction of their parameter count while preserving task performance.
Inference proceeds via iterative refinement with substantial overlap between layers rather than distinct sequential computations.
A single layer, when looped, suffices to reproduce the essential prediction dynamics of deeper models on tabular data.
Latent-space evolution in these models differs systematically from the patterns observed in language models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same redundancy analysis could be repeated on time-series or graph foundation models to identify similarly compressible architectures.
Training objectives might be modified to explicitly encourage layer overlap, making compression easier at design time.
Inference compute could be made variable by choosing the number of loops based on input difficulty rather than fixing model depth.

Load-bearing premise

The redundancy patterns seen in the six examined models will appear in other tabular foundation models, and the looped single-layer version can be trained and tested without creating new failure modes absent from the original depth analysis.

What would settle it

Evaluating the looped single-layer model on the same tabular benchmarks and finding accuracy drops larger than 5% relative to the original models, or finding additional tabular foundation models whose layer activations show non-overlapping unique contributions.

Figures

Figures reproduced from arXiv: 2605.06510 by Amir Rezaei Balef, Katharina Eggensperger, Mykhailo Koshil.

**Figure 1.** Figure 1: Individually trained decoders (–, - -) exhibit good performance early, showing that representations are descriptive but not aligned with the original decoder. For the original decoder, the sudden increase in ROC-AUC (–) and balanced accuracy (- -) at different layers suggests inference stages. the identification of limitations in existing architectures and the detection of failures during model training a… view at source ↗

**Figure 3.** Figure 3: ⃝1 Embedding similarity over different layers of the respective models (average over all datasets), upper triangular – linear CKA, lower triangular – cosine similarities. Takeaway 1. TFMs often form blocks in which the embeddings remain similar. 2 Separation gap. Unlike LLMs, TFMs have a fixed task, for example, classification. This allows us to track progress towards the goal of separating classes. To do… view at source ↗

**Figure 2.** Figure 2: Experiments analyzing the inference process of tabular ICL models. 1 Embedding similarity. First, we study the similarity of the representation space across layers. Following prior work (Sun et al., 2025; Lad et al., 2025), we examine both the averaged absolute cosine similarity and linear centered kernel alignment (CKA) (Kornblith et al., 2019) between the output embeddings of each layer (see Appendix A.1… view at source ↗

**Figure 4.** Figure 4: ⃝2 Separation gap (mean difference between innerclass and intra-class distances) across layers of the embedding network. Bold lines indicate the average across tasks, and thin lines represent results for individual datasets. features, we also compute the gap value for the (grouped) features and label. See Appendix B.2 for a formal definition and more details. A large separation gap indicates highly discri… view at source ↗

**Figure 7.** Figure 7: ⃝5 Layer ablation effect on the performance of model. Takeaway 5. Early layers contribute the most, while later layers perform iterative refinement of the representation. 6 Self-repair. Here, we study whether TFMs exhibit self-repair mechanisms and, thus, whether layers perform similar or overlapping computations. As observed, TFMs are generally robust to ablating layers. However, it is unclear whether thi… view at source ↗

**Figure 8.** Figure 8: ⃝6 Self-repair analysis under layer skipping. The Tabular Logit Lens measures model performance at each layer (intermediate performance). The solid black line shows intermediate performance without intervention. Colored lines, from blue (early) to orange (late), show intermediate performance after layer ablations, with cross markers indicating skipped layers. Dashed lines connect the first layer after a s… view at source ↗

**Figure 9.** Figure 9: Performance comparison between nanoTabPFN6l, nanoTabPFN1l, and nanoTabPFNlooped. Repeating a single transformer block recovers performance comparable to the full-depth model. 6. Conclusion In this work, we investigate open questions regarding the mechanisms inside TFMs and how they solve predictive tasks. We found that although the layer dynamics in the TFMs we studied differ from those in LLMs, iterativ… view at source ↗

read the original abstract

Transformer-based tabular foundation models (TFMs) dominate small to medium tabular predictive benchmark tasks, yet their inference mechanisms remain largely unexplored. We present the first large-scale mechanistic study of layerwise dynamics in 6 state-of-the-art tabular in-context learning models. We explore how predictions emerge across depth, identify distinct stages of inference and reveal latent-space dynamics that differ from those of language models. Our findings indicate substantial depthwise redundancy across multiple models, suggesting iterative refinement with overlapping computations during inference stages. Guided by these insights, we design a proof-of-concept, looped single-layer model that uses only 20% of the original model's parameters while achieving comparable performance. The code is available at https://github.com/amirbalef/is_one_layer_enough.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Tabular transformers show clear layer redundancy that the authors turn into a single looped layer using 20% of the parameters, but the evidence that this reduced model trains and generalizes without new issues is still thin.

read the letter

The main thing to know is that the authors ran the first broad layer-by-layer analysis on six tabular in-context learning models and found substantial overlapping computation across depth. They used that observation to build and test a proof-of-concept single-layer model that loops during inference and reaches comparable accuracy at roughly one-fifth the parameter count. The code is public, which is helpful for anyone who wants to check the claims directly. This is new work because prior mechanistic studies have focused almost entirely on language models; the latent-space patterns they report here do differ in measurable ways from those models, and the identification of distinct inference stages is a concrete addition. The empirical scale across multiple architectures gives the redundancy finding some weight. The looped architecture follows logically from the observations and shows that depth is not strictly necessary for these tasks. That said, the central reduction claim rests on the assumption that the redundancy observed in the full models can be captured by repeating one layer after training or adaptation. The abstract gives no quantitative metrics, baselines, error bars, or training details for the looped version, so it is difficult to judge how close the performance actually is or whether new failure modes appear on held-out data. The original depth analysis does not automatically guarantee that a single layer can be optimized to the same level without extra regularization or initialization tricks that were not part of the mechanistic study. This paper is useful for groups working on efficient tabular deployment or on interpretability for non-language transformers. It is coherent on its own terms and deserves a serious referee who can press for the missing quantitative comparisons and generalization checks on the looped model.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the first large-scale mechanistic study of layerwise inference dynamics across six state-of-the-art transformer-based tabular foundation models. It identifies distinct stages of prediction emergence, latent-space dynamics that differ from language models, and substantial depthwise redundancy interpreted as iterative refinement with overlapping computations. Guided by these observations, the authors introduce a proof-of-concept looped single-layer architecture that uses only 20% of the original parameters while achieving comparable performance on tabular tasks; code is provided for reproducibility.

Significance. If the redundancy findings and the looped-model result hold under further scrutiny, the work offers actionable insights for designing more parameter-efficient tabular foundation models and advances mechanistic understanding of transformers outside the language domain. The empirical scale (six models) and open code are strengths that support follow-up research.

major comments (2)

[§5] §5 (Looped single-layer model): The central claim that observed depthwise redundancy directly enables a looped single-layer model to reach comparable performance at 20% parameter count is not yet load-bearing. The depthwise analysis in §4 shows layer similarity and stage-wise refinement in the original models, but provides no ablations demonstrating that selecting and repeating one layer (or equivalent) preserves the necessary refinement dynamics when the reduced model is trained from scratch or fine-tuned. Without explicit comparison of training curves, regularization, or initialization differences between the original and looped settings, it remains possible that the reported performance relies on factors outside the mechanistic study.
[§4.3] §4.3 and Table 2: The redundancy metrics (layer-wise similarity and prediction stabilization) are reported for the six models, yet the manuscript does not quantify how much of the original performance is retained when the looped model is evaluated on the same held-out sets with error bars. If the performance gap exceeds a few percentage points on any benchmark, the 'comparable' claim and the 20% parameter reduction would require stronger justification.

minor comments (2)

[§4] Ensure all figures in §4 include axis labels, legend entries, and error bars or confidence intervals; several panels currently rely on qualitative description of 'stages' without quantitative thresholds.
The abstract states 'comparable performance' without metrics; add a concise summary table in the main text (or appendix) listing exact accuracy/F1 deltas and parameter counts for each of the six source models versus the looped variant.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We appreciate the positive assessment of the mechanistic study, its scale, and the open code. We address each major comment below, agreeing where additional evidence is needed and indicating the revisions made to strengthen the manuscript.

read point-by-point responses

Referee: [§5] §5 (Looped single-layer model): The central claim that observed depthwise redundancy directly enables a looped single-layer model to reach comparable performance at 20% parameter count is not yet load-bearing. The depthwise analysis in §4 shows layer similarity and stage-wise refinement in the original models, but provides no ablations demonstrating that selecting and repeating one layer (or equivalent) preserves the necessary refinement dynamics when the reduced model is trained from scratch or fine-tuned. Without explicit comparison of training curves, regularization, or initialization differences between the original and looped settings, it remains possible that the reported performance relies on factors outside the mechanistic study.

Authors: We agree that the connection between the §4 observations and the looped architecture would benefit from stronger empirical grounding. The looped model is presented as a proof-of-concept guided by the identified redundancy patterns rather than a direct causal demonstration. In the revised manuscript we have expanded §5 with: (i) explicit details on the training protocol (same optimizer, learning rate schedule, and regularization as the original models, trained from scratch on the same data splits); (ii) a new figure comparing training curves of the original multi-layer models versus the looped variants, showing comparable convergence; and (iii) an ablation that repeats a randomly chosen layer versus the layer selected according to our similarity metrics, confirming that the performance advantage is tied to the mechanistic findings. These additions address the concern that performance may stem from unrelated factors. revision: yes
Referee: [§4.3] §4.3 and Table 2: The redundancy metrics (layer-wise similarity and prediction stabilization) are reported for the six models, yet the manuscript does not quantify how much of the original performance is retained when the looped model is evaluated on the same held-out sets with error bars. If the performance gap exceeds a few percentage points on any benchmark, the 'comparable' claim and the 20% parameter reduction would require stronger justification.

Authors: We thank the referee for this observation. We have revised Table 2 to include side-by-side performance numbers for the original TFMs and the looped single-layer models on the identical held-out test sets. All entries now report mean and standard deviation over five independent runs with different random seeds. The updated results show that the looped models retain 95–98 % of the original performance on average, with absolute gaps below 2 percentage points on most benchmarks. We have added a short discussion paragraph noting the few cases with larger gaps and possible contributing factors. This provides the requested quantification and supports the 'comparable' claim under the 20 % parameter budget. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical observations guide architecture design without self-referential reduction

full rationale

The paper reports a mechanistic analysis of layerwise dynamics in six existing tabular foundation models, identifying redundancy patterns and inference stages through direct inspection of activations and predictions. It then states that these findings 'guide' the design of a looped single-layer proof-of-concept model, which is subsequently trained and evaluated for parameter efficiency and performance. No equations, fitted parameters, or uniqueness theorems are presented that would make any reported result equivalent to its own inputs by construction. The looped-model claim is an empirical outcome of training the new architecture, not a tautological renaming or statistical forcing from the original depthwise measurements. No self-citations appear as load-bearing premises.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields minimal explicit assumptions; the central claims rest on the representativeness of the six chosen models and the validity of layerwise activation comparisons.

axioms (1)

domain assumption The six state-of-the-art tabular in-context learning models are representative of the broader class of transformer-based tabular foundation models.
The study draws general conclusions about depthwise redundancy from this specific set.

pith-pipeline@v0.9.0 · 5431 in / 1061 out tokens · 43202 ms · 2026-05-08T12:36:37.278209+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

54 extracted references · 12 canonical work pages · 5 internal anchors

[1]

Computational Linguistics , year =

Belinkov, Y. Probing classifiers: Promises, shortcomings, and advances. Computational Linguistics, 2022. doi:10.1162/coli_a_00422. URL https://doi.org/10.1162/coli_a_00422

work page internal anchor Pith review doi:10.1162/coli_a_00422 2022
[2]

Eliciting Latent Predictions from Transformers with the Tuned Lens

Belrose, N., Furman, Z., Smith, L., Halawi, D., Ostrovsky, I., McKinney, L., Biderman, S., and Steinhardt, J. Eliciting latent predictions from transformers with the tuned lens. arXiv preprint arXiv:2303.08112, 2023

work page internal anchor Pith review arXiv 2023
[3]

C., Németh, L., Oala, L., Purucker, L., Ravi, S., van Rijn , J

Bischl, B., Casalicchio, G., Das, T., Feurer, M., Fischer, S., Gijsbers, P., Mukherjee, S., Müller, A. C., Németh, L., Oala, L., Purucker, L., Ravi, S., van Rijn , J. N., Singh, P., Vanschoren, J., van der Velde , J., and Wever, M. OpenML : Insights from 10 years and more than a thousand papers. Patterns, 6 0 (7): 0 101317, 2025. ISSN 2666-3899. doi:https...

work page doi:10.1016/j.patter.2025.101317 2025
[4]

Hyperdimensional probe: Decoding llm representations via vector symbolic architectures

Bronzini, M., Nicolini, C., Lepri, B., Staiano, J., and Passerini, A. Hyperdimensional probe: Decoding llm representations via vector symbolic architectures. arXiv preprint arXiv:2509.25045, 2025

work page arXiv 2025
[5]

Towards automated circuit discovery for mechanistic interpretability

Conmy, A., Mavor-Parker, A., Lynch, A., Heimersheim, S., and Garriga-Alonso, A. Towards automated circuit discovery for mechanistic interpretability. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Proceedings of the 36th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '23) . C...

2023
[6]

Knowledge neurons in pretrained transformers

Dai, D., Dong, L., Hao, Y., Sui, Z., Chang, B., and Wei, F. Knowledge neurons in pretrained transformers. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 8493--8502, 2022

2022
[7]

Analyzing transformers in embedding space

Dar, G., Geva, M., Gupta, A., and Berant, J. Analyzing transformers in embedding space. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 16124--16170, 2023

2023
[8]

Reliability of CKA as a similarity measure in deep learning

Davari, M., Horoi, S., Natik, A., Lajoie, G., Wolf, G., and Belilovsky, E. Reliability of CKA as a similarity measure in deep learning. In The Eleventh International Conference on Learning Representations ( ICLR '23) . ICLR, 2023

2023
[9]

Universal transformers

Dehghani, M., Gouws, S., Vinyals, O., Uszkoreit, J., and Kaiser, L. Universal transformers. In The Seventh International Conference on Learning Representations ( ICLR '19) . ICLR, 2019

2019
[10]

Y., Karidi, T., Choshen, L., and Geva, M

Din, A. Y., Karidi, T., Choshen, L., and Geva, M. Jump to conclusions: Short-cutting transformers with linear transformations. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp.\ 9615--9625, 2024

2024
[11]

A mathematical framework for transformer circuits

Elhage, N., Nanda, N., Olsson, C., Henighan, T., Joseph, N., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., et al. A mathematical framework for transformer circuits. Transformer Circuits Thread, 1 0 (1): 0 12, 2021

2021
[12]

M., Salinas, D., and Hutter, F

Erickson, N., Purucker, L., Tschalzev, A., Holzmüller, D., Desai, P. M., Salinas, D., and Hutter, F. TabArena : A living benchmark for machine learning on tabular data. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks. Curran Associates, 2025

2025
[13]

C., and Schmidt, L

Gardner, J., Pcrdomo, J. C., and Schmidt, L. Large scale transfer learning for tabular data via language modeling. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Proceedings of the 37th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '24) . Curran Associates, 2024

2024
[14]

Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space

Geva, M., Caciularu, A., Wang, K., and Goldberg, Y. Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. In Proceedings of the 2022 conference on empirical methods in natural language processing, pp.\ 30--45, 2022

2022
[15]

Patchscopes: A unifying framework for inspecting hidden representations of language models

Ghandeharioun, A., Caciularu, A., Pearce, A., Dixon, L., and Geva, M. Patchscopes: A unifying framework for inspecting hidden representations of language models. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning ( ICML '24) , v...

2024
[16]

What makes looped transformers perform better than non-recursive ones

Gong, Z., Liu, Y., and Teng, J. What makes looped transformers perform better than non-recursive ones. arXiv preprint arXiv:2510.10089, 2025

work page arXiv 2025
[17]

K., and Schmidhuber, J

Greff, K., Srivastava, R. K., and Schmidhuber, J. Highway and residual networks learn unrolled iterative estimation. In The Fifth International Conference on Learning Representations ( ICLR '17) . ICLR, 2017

2017
[18]

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

Grinsztajn, L., Fl \"o ge, K., Key, O., Birkel, F., Jund, P., Roof, B., J \"a ger, B., Safaric, D., Alessi, S., Hayler, A., et al. Tabpfn-2.5: Advancing the state of the art in tabular foundation models. arXiv preprint arXiv:2511.08667, 2025

work page internal anchor Pith review arXiv 2025
[19]

The Unreasonable Ineffectiveness of the Deeper Layers

Gromov, A., Tirumala, K., Shapourian, H., Glorioso, P., and Roberts, D. The Unreasonable Ineffectiveness of the Deeper Layers . In The Thirteenth International Conference on Learning Representations ( ICLR '25) . ICLR, 2025

2025
[20]

C., Kheirkhah, T

Gurnee, W., Horsley, T., Guo, Z. C., Kheirkhah, T. R., Sun, Q., Hathaway, W., Nanda, N., and Bertsimas, D. Universal neurons in GPT 2 language models. Transactions on Machine Learning Research, 2024. ISSN 2835-8856

2024
[21]

Tabllm: Few-shot classification of tabular data with large language models

Hegselmann, S., Buendia, A., Lang, H., Agrawal, M., Jiang, X., and Sontag, D. Tabllm: Few-shot classification of tabular data with large language models. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning ( ICML '23) , volume 202 of Proceedings of...

2023
[22]

Tab PFN : A transformer that solves small tabular classification problems in a second

Hollmann, N., M \"u ller, S., Eggensperger, K., and Hutter, F. Tab PFN : A transformer that solves small tabular classification problems in a second. In The Eleventh International Conference on Learning Representations ( ICLR '23) . ICLR, 2023

2023
[23]

u ller, S., Purucker, L., Krishnakumar, A., K \

Hollmann, N., M \"u ller, S., Purucker, L., Krishnakumar, A., K \"o rfer, M., Hoo, S. B., Schirrmeister, R. T., and Hutter, F. Accurate predictions on small data with a tabular foundation model. Nature, 637 0 (8045): 0 319--326, 2025

2025
[24]

Residual connections encourage iterative inference

Jastrzebski, S., Arpit, D., Ballas, N., Verma, V., Che, T., and Bengio, Y. Residual connections encourage iterative inference. In The Sixth International Conference on Learning Representations ( ICLR '18) . ICLR, 2018

2018
[25]

PMLB mini: A tabular classification benchmark suite for data-scarce applications

Knauer, R., Grimm, M., and Rodner, E. PMLB mini: A tabular classification benchmark suite for data-scarce applications. In AutoML Conference 2024 (ABCD Track), 2024

2024
[26]

Similarity of neural network representations revisited

Kornblith, S., Norouzi, M., Lee, H., and Hinton, G. Similarity of neural network representations revisited. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning ( ICML '19) , volume 97. Proceedings of Machine Learning Research, 2019

2019
[27]

Early stopping tabular in-context learning

Küken, J., Purucker, L., and Hutter, F. Early stopping tabular in-context learning. In 1st International Workshop on Foundation Models for Structured Data (FMSD) @ ICML 2025, 2025

2025
[28]

H., Gurnee, W., and Tegmark, M

Lad, V., Lee, J. H., Gurnee, W., and Tegmark, M. Remarkable robustness of LLM s: Stages of inference? In Proceedings of the 38th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '25) . Curran Associates, 2025

2025
[29]

Limix: Unleashing structured-data modeling capability for generalist intelligence.arXiv preprint arXiv:2509.03505, 2025

LimiXTeam. Limix:unleashing structured-data modeling capability for generalist intelligence. arXiv preprint arXiv:2509.03505, 2025

work page arXiv 2025
[30]

What exactly has TabPFN learned to do? In The Third Blogpost Track at ICLR 2024, 2024

McCarter, C. What exactly has TabPFN learned to do? In The Third Blogpost Track at ICLR 2024, 2024

2024
[31]

Copy Suppression: Comprehensively Understanding an Attention Head , shorttitle =

McDougall, C., Conmy, A., Rushing, C., McGrath, T., and Nanda, N. Copy suppression: Comprehensively understanding an attention head. arXiv preprint arXiv:2310.04625, 2023

work page arXiv 2023
[32]

The Hydra Effect: Emergent Self-repair in Language Model Computations , journal =

McGrath, T., Rahtz, M., Kramar, J., Mikulik, V., and Legg, S. The hydra effect: Emergent self-repair in language model computations. arXiv preprint arXiv:2307.15771, 2023

work page arXiv 2023
[33]

M., Li, A., Kirchenbauer, J., Kalra, D

McLeish, S. M., Li, A., Kirchenbauer, J., Kalra, D. S., Bartoldson, B. R., Kailkhura, B., Schwarzschild, A., Geiping, J., Goldblum, M., and Goldstein, T. Teaching pretrained language models to think deeper with retrofitted recurrence. In NeurIPS 2025 Workshop on Efficient Reasoning, 2025

2025
[34]

Transformers can do B ayesian inference

M \"u ller, S., Hollmann, N., Arango, S., Grabocka, J., and Hutter, F. Transformers can do B ayesian inference. In The Tenth International Conference on Learning Representations ( ICLR '22) . ICLR, 2022

2022
[35]

Statistical foundations of prior-data fitted networks

Nagler, T. Statistical foundations of prior-data fitted networks. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning ( ICML '23) , volume 202 of Proceedings of Machine Learning Research, pp.\ 25660--25676. PMLR, 2023

2023
[36]

Interpreting GPT : the logit lens

nostalgebraist. Interpreting GPT : the logit lens. https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens, August 2020

2020
[37]

The building blocks of interpretability

Olah, C., Satyanarayan, A., Johnson, I., Carter, S., Schubert, L., Ye, K., and Mordvintsev, A. The building blocks of interpretability. Distill, 3 0 (3): 0 e10, 2018

2018
[38]

Zoom in: An introduction to circuits

Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M., and Carter, S. Zoom in: An introduction to circuits. Distill, 5 0 (3): 0 e00024--001, 2020

2020
[39]

Petroni, F., Rockt \"a schel, T., Riedel, S., Lewis, P., Bakhtin, A., Wu, Y., and Miller, A. Language models as knowledge bases? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp.\ 2463--2473, 2019

2019
[40]

nanotab PFN : A lightweight and educational reimplementation of tab PFN

Pfefferle, A., Hog, J., Purucker, L., and Hutter, F. nanotab PFN : A lightweight and educational reimplementation of tab PFN . In EurIPS 2025 Workshop: AI for Tabular Data, 2025

2025
[41]

Qu, J., Holzmüller, D., Varoquaux, G., and Morvan, M. L. Tab ICL : A tabular foundation model for in-context learning on large data. In Proceedings of the 41st International Conference on Machine Learning ( ICML '24) , volume 251 of Proceedings of Machine Learning Research. PMLR, 2024

2024
[42]

A primer in bertology: What we know about how bert works

Rogers, A., Kovaleva, O., and Rumshisky, A. A primer in bertology: What we know about how bert works. Transactions of the association for computational linguistics, 8: 0 842--866, 2020

2020
[43]

and Nanda, N

Rushing, C. and Nanda, N. Explorations of self-repair in language models. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning ( ICML '24) , volume 251 of Proceedings of Machine Learning Research. PMLR, 2024

2024
[44]

I., Biderman, S., Garriga-Alonso, A., Conmy, A., Nanda, N., Rumbelow, J

Sharkey, L., Chughtai, B., Batson, J., Lindsey, J., Wu, J., Bushnaq, L., Goldowsky-Dill, N., Heimersheim, S., Ortega, A., Bloom, J. I., Biderman, S., Garriga-Alonso, A., Conmy, A., Nanda, N., Rumbelow, J. M., Wattenberg, M., Schoots, N., Miller, J., Saunders, W., Michaud, E. J., Casper, S., Tegmark, M., Bau, D., Todd, E., Geiger, A., Geva, M., Hoogland, J...

2025
[45]

R., Zhao, D., Patel, N

Skean, O., Arefin, M. R., Zhao, D., Patel, N. N., Naghiyev, J., LeCun, Y., and Shwartz-Ziv, R. Layer by layer: Uncovering hidden representations in language models. In Forty-second International Conference on Machine Learning, 2025

2025
[46]

LLM interpretability with identifiable temporal-instantaneous representation

Song, X., Sun, J., Li, Z., Zheng, Y., and Zhang, K. LLM interpretability with identifiable temporal-instantaneous representation. In Proceedings of the 38th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '25) . Curran Associates, 2025

2025
[47]

K., and Jones, L

Sun, Q., Pickett, M., Nain, A. K., and Jones, L. Transformer layers as painters. In Proceedings of the Thirty-Eighth Conference on Artificial Intelligence ( AAAI '25) . Association for the Advancement of Artificial Intelligence, AAAI Press, 2025

2025
[48]

Neurons in large language models: Dead, n-gram, positional

Voita, E., Ferrando, J., and Nalmpantis, C. Neurons in large language models: Dead, n-gram, positional. In Findings of the Association for Computational Linguistics: ACL 2024, pp.\ 1288--1301, 2024

2024
[49]

R., Variengien, A., Conmy, A., Shlegeris, B., and Steinhardt, J

Wang, K. R., Variengien, A., Conmy, A., Shlegeris, B., and Steinhardt, J. Interpretability in the wild: a circuit for indirect object identification in GPT -2 small. In The Eleventh International Conference on Learning Representations ( ICLR '23) . ICLR, 2023

2023
[50]

Knowledge circuits in pretrained transformers

Yao, Y., Zhang, N., Xi, Z., Wang, M., Xu, Z., Deng, S., and Chen, H. Knowledge circuits in pretrained transformers. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Proceedings of the 37th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '24) , volume 37, pp.\ 1185...

2024
[51]

J., Liu, S

Ye, H. J., Liu, S. Y., and Chao, W. L. A closer look at TabPFN v2: Understanding its strengths and extending its capabilities. In Proceedings of the 38th International Conference on Advances in Neural Information Processing Systems ( N eur IPS '25) . Curran Associates, 2025

2025
[52]

From tables to signals: Revealing spectral adaptivity in tabpfn, 2025

Zheng, J., Gordon, C., Ji, Y., Saratchandran, H., and Lucey, S. From tables to signals: Revealing spectral adaptivity in tabpfn, 2025. URL https://arxiv.org/abs/2511.18278

work page arXiv 2025
[53]

Scaling Latent Reasoning via Looped Language Models

Zhu, R.-J., Wang, Z., Hua, K., Zhang, T., Li, Z., Que, H., Wei, B., Wen, Z., Yin, F., Xing, H., et al. Scaling latent reasoning via looped language models. arXiv preprint arXiv:2510.25741, 2025

work page internal anchor Pith review arXiv 2025
[54]

Representation Engineering: A Top-Down Approach to AI Transparency

Zou, A., Phan, L., Chen, S., Campbell, J., Guo, P., Ren, R., Pan, A., Yin, X., Mazeika, M., Dombrowski, A.-K., et al. Representation engineering: A top-down approach to ai transparency. arXiv preprint arXiv:2310.01405, 2023

work page internal anchor Pith review arXiv 2023