Where Computation Lives Inside TabPFN: Causal Localisation of Attention Head Function

Atharva Gupta; Dhruv Kumar; Murari Mandal; Saurabh Deshpande

arxiv: 2606.12917 · v1 · pith:QRRJDH57new · submitted 2026-06-11 · 💻 cs.LG

Pith reviewed 2026-06-27 07:17 UTC · model grok-4.3

classification 💻 cs.LG

keywords: TabPFN, attention heads, activation patching, causal localization, tabular foundation models, in-context learning, mechanistic interpretability

The pith

TabPFN computation concentrates in one attention head whose peak layer shifts with task complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs a causal analysis of attention heads inside the tabular foundation model TabPFN using activation patching on two synthetic regression datasets. It establishes that a single head produces the large majority of the model's causal effect on outputs, exceeding the other heads by a factor of two to five at the layer where its contribution peaks. That peak layer moves depending on how complex the regression task is, while the remaining heads follow matching patterns in later layers. The work also shows that contrastive activation steering does not carry over from one sample to another because the model's in-context learning encodes task information through context-specific attention rather than fixed directions.

Core claim

One feature-wise attention head in TabPFN 2.5 exhibits causal necessity that is two to five times greater than that of the other heads at its peak layer; the layer at which this dominance occurs shifts across regression tasks of different complexity, while the remaining heads display symmetric profiles concentrated in later layers. Convergent measurements from activation patching, ablation, and attention entropy locate the computationally active layers of the dominant head, and contrastive activation steering fails to generalize because task structure is carried by context-dependent attention.
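
The attention-entropy measurement named in the core claim can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name, the array layout `(samples, heads, queries, keys)`, and the normalisation by `log(n_keys)` are assumptions. The only detail taken from the source is that entropy is computed per sample and then averaged.

```python
import numpy as np

def normalised_attention_entropy(attn, eps=1e-12):
    """Mean normalised entropy per head.

    attn: array of shape (samples, heads, queries, keys) whose last axis
    holds attention weights summing to 1 (assumed layout, n_keys > 1).
    Returns an array of shape (heads,): values near 0 mean all weight on
    one key, values near 1 mean uniform attention.
    """
    attn = np.asarray(attn, dtype=float)
    n_keys = attn.shape[-1]
    # Shannon entropy of each attention row, scaled by its maximum log(n_keys).
    row_entropy = -(attn * np.log(attn + eps)).sum(axis=-1) / np.log(n_keys)
    per_sample = row_entropy.mean(axis=-1)   # average over queries -> (samples, heads)
    return per_sample.mean(axis=0)           # average over samples -> (heads,)

# Toy input (hypothetical): head 0 attends uniformly, head 1 to a single key.
uniform = np.full((2, 1, 3, 4), 0.25)
peaked = np.zeros((2, 1, 3, 4))
peaked[..., 0] = 1.0
entropy = normalised_attention_entropy(np.concatenate([uniform, peaked], axis=1))
print(entropy)
```

Under this definition a low value for one head at one layer means that head concentrates its attention on few positions, which is how the source reads the Head 2 figures.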

What carries the argument

Activation patching: the output of a single attention head is replaced with its value from a different forward pass, and the resulting change in the model's prediction is measured.
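
To make the procedure concrete, here is a small sketch of head-level activation patching on a toy single-layer attention block. Everything in it is hypothetical: the weights are random, the sizes are arbitrary, and `head_outputs` and `predict` are illustrative names rather than TabPFN functions. It also assumes the patch direction is clean-into-corrupted, which the summary above does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
N_HEADS, D_MODEL, D_HEAD = 3, 6, 2

# Random toy weights (hypothetical; not TabPFN parameters).
W_q = rng.normal(size=(N_HEADS, D_MODEL, D_HEAD))
W_k = rng.normal(size=(N_HEADS, D_MODEL, D_HEAD))
W_v = rng.normal(size=(N_HEADS, D_MODEL, D_HEAD))
W_o = rng.normal(size=(N_HEADS * D_HEAD, D_MODEL))
w_out = rng.normal(size=D_MODEL)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def head_outputs(x):
    """Per-head outputs before W_o: (tokens, D_MODEL) -> (N_HEADS, tokens, D_HEAD)."""
    q = np.einsum("td,hde->hte", x, W_q)
    k = np.einsum("td,hde->hte", x, W_k)
    v = np.einsum("td,hde->hte", x, W_v)
    scores = np.einsum("hte,hse->hts", q, k) / np.sqrt(D_HEAD)
    return np.einsum("hts,hse->hte", softmax(scores), v)

def predict(x, patch=None):
    """Scalar prediction; patch=(head_index, replacement) overrides one head's output."""
    heads = head_outputs(x)
    if patch is not None:
        h, replacement = patch
        heads = heads.copy()
        heads[h] = replacement
    concat = np.concatenate(list(heads), axis=-1)   # (tokens, N_HEADS * D_HEAD)
    resid = x + concat @ W_o                        # residual connection
    return float(resid.mean(axis=0) @ w_out)

x_clean = rng.normal(size=(4, D_MODEL))
x_corrupt = rng.normal(size=(4, D_MODEL))
clean_heads = head_outputs(x_clean)

# Effect of head h = change in prediction when only that head is patched.
base = predict(x_corrupt)
effects = [predict(x_corrupt, patch=(h, clean_heads[h])) - base
           for h in range(N_HEADS)]
print(effects)
```

In a real model the same idea is usually implemented with forward hooks on the attention module, and the effect is averaged over many samples rather than read from one pair of inputs.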

If this is right

A single head accounts for the bulk of causal effect at its peak layer across the tested tasks.
The layer carrying this dominant effect changes when the regression problem varies in complexity.
The other heads show matching behavior concentrated in later layers.
Contrastive activation steering does not transfer across samples because attention patterns encode task information in a context-dependent way.
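
The steering test in the last point can be sketched roughly as below. This is an assumption-laden toy, not the paper's protocol: the linear read-out, the random activations, and the choice of contrast groups (train samples above versus below the median target) are all hypothetical. It only shows the shape of the computation, a direction fitted on a train split and scored by MSE change on a test split, and makes no attempt to reproduce the reported result.

```python
import numpy as np

def steering_direction(acts_pos, acts_neg):
    """Difference of mean activations between two contrast groups."""
    return acts_pos.mean(axis=0) - acts_neg.mean(axis=0)

def mse_improvement(acts, targets, readout, direction, alpha=1.0):
    """MSE before steering minus MSE after adding alpha * direction."""
    base = np.mean((acts @ readout - targets) ** 2)
    steered = np.mean(((acts + alpha * direction) @ readout - targets) ** 2)
    return float(base - steered)

rng = np.random.default_rng(0)
d = 8
readout = rng.normal(size=d)                  # hypothetical linear read-out
train_acts = rng.normal(size=(64, d))         # activations at one hook site
test_acts = rng.normal(size=(64, d))
train_y = train_acts @ readout + rng.normal(scale=0.1, size=64)
test_y = test_acts @ readout + rng.normal(scale=0.1, size=64)

# Contrast groups (assumption): train samples above vs. below the median target.
high = train_y > np.median(train_y)
direction = steering_direction(train_acts[high], train_acts[~high])

for alpha in (0.0, 0.5, 1.0):
    gain = mse_improvement(test_acts, test_y, readout, direction, alpha)
    print(f"alpha={alpha:.1f}  test MSE improvement={gain:+.4f}")
```

A fixed direction of this kind adds the same offset to every sample, which is why it cannot help if the useful signal depends on each sample's context.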

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the same head-dominance pattern appears on real-world tabular data, then model editing or pruning efforts could focus compute on the dominant head without retraining the full network.
The observed failure of steering to generalize points to a broader difference in how in-context learning works in tabular transformers compared with language models.
Applying the same patching protocol to classification tasks would test whether the single-head dominance and layer-shift pattern is specific to regression.

Load-bearing premise

The two synthetic regression datasets together with the activation patching procedure accurately measure the true causal contributions of individual heads without artifacts introduced by the patching method or the choice of data.

What would settle it

Running the same activation patching protocol on a collection of real tabular regression datasets and finding that no head reaches two-to-five times the causal effect of the others at any layer would falsify the dominance claim.
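
One way to operationalise that check is sketched below. The definition used here, the candidate head's effect at its own peak layer divided by the largest effect among the other heads at that layer, is an assumption; the paper's exact metric is not given in this summary. The numbers in the toy array are made up for illustration and are not results from the paper.

```python
import numpy as np

def dominance_ratio(effects, head):
    """Return (peak_layer, ratio) for one head.

    effects: array of shape (heads, layers) holding a non-negative causal
    effect per head and layer (for example, mean patching effect).
    ratio compares the head at its peak layer with the strongest other head
    at that same layer (assumed definition).
    """
    effects = np.asarray(effects, dtype=float)
    peak_layer = int(np.argmax(effects[head]))
    others = np.delete(effects[:, peak_layer], head)
    return peak_layer, float(effects[head, peak_layer] / others.max())

# Hypothetical effects for 3 heads over 3 layers (illustrative only).
toy_effects = np.array([
    [0.01, 0.02, 0.03],
    [0.01, 0.02, 0.03],
    [0.02, 0.09, 0.03],
])
peak, ratio = dominance_ratio(toy_effects, head=2)
print(peak, ratio)
```

On real datasets the claim would fail if this ratio stayed below two for every head at every layer.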

Figures

Figures reproduced from arXiv: 2606.12917 by Atharva Gupta, Dhruv Kumar, Murari Mandal, Saurabh Deshpande.

**Figure 1.** Activation patching hierarchy. Component patching targets `self_attn_between_features` at two granularities: feature-block level (post-projection output) and attention head level (per-head outputs before $W_O$; see Appendix C). Token-level patching results are in Appendix F.

… focus on `self_attn_between_features`: it is the only module that operates across feature representations, making it the natural locus for cro…

**Figure 2.** MHA attention head ablation, Multiplication Dataset (n = 512). Head 2 ablation is largest at layer 0; Heads 0 and 1 peak at layers 12–13.

Head 2’s distinctive ablation profile. Head 2’s ablation peaks at L0 while its patching peaks at L6: the layer of …

**Figure 3.** Normalised attention entropy per head at five key layers, computed per sample then averaged (see Appendix E). Lower entropy indicates more concentrated attention. Head 2 has the lowest entropy at layer 6 in both datasets (0.21 on both) and additionally at layer 13 on Pairwise-50 (0.31). Heads 0 and 1 maintain higher entropy (>0.6) at most layers. Head 0 at layer 0 is co-selective with Head 2 (entropy 0.22…

**Figure 5.** MHA head ablation effect (σ), Pairwise-50 (n = 512). Head 2’s peak is at layer 16 (0.074σ), with a secondary peak at layer 0 (0.066σ). Heads 0 and 1 show moderate late-layer effects.

… across samples: a direction computed on a held-out train split produces near-zero MSE improvement on a test split across all hook sites tested. We attribute this to a structural property of pure ICL architectures: unlike LLMs,…

**Figure 6.** Full-layer patching recovery ratio, Multiplication Dataset. ≈100% recovery at every layer.

B.3. Feature-Block Patching. Feature-block patching replaces one feature-block position $b^*$ in the post-$W_O$ output tensor $A_\ell \in \mathbb{R}^{B \times N \times F \times k}$ (Equation (1)):

$$\tilde{A}_\ell[:, :, b^*, :] = A^{\text{clean}}_\ell[:, :, b^*, :]. \tag{1}$$

This operates on the post-$W_O$ output of `self_attn_between_features` and is coarser than head-level patching: it capt…

**Figure 7.** Feature-block patching on the Multiplication Dataset (n = 512). Left: absolute restoration per block. Right: recovery ratio (%). Block 1 (green) dominates early layers before handing off to Block 3 (purple) by layer 13. Pairwise-50 Dataset (n = 512). All feature blocks produce near-zero recovery across all 18 layers, consistent with …
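
The feature-block replacement in Equation (1) and the recovery ratio plotted in Figures 6 and 7 can be sketched as follows. The slice assignment mirrors Equation (1) directly. The `recovery_ratio` formula is an assumption, since the paper's definition is not included in this excerpt: it uses the common convention of the share of the clean-versus-corrupted gap that the patch closes. Array sizes are toy values, not TabPFN's.

```python
import numpy as np

def patch_feature_block(a_run, a_clean, block):
    """Copy one feature-block slice from the clean run into a copy of a_run.

    Both arrays have shape (B, N, F, k); `block` indexes the F axis,
    matching the b* position in Equation (1).
    """
    patched = a_run.copy()
    patched[:, :, block, :] = a_clean[:, :, block, :]
    return patched

def recovery_ratio(metric_clean, metric_corrupt, metric_patched):
    """Percentage of the clean-vs-corrupted gap closed by the patch (assumed definition)."""
    return 100.0 * (metric_corrupt - metric_patched) / (metric_corrupt - metric_clean)

rng = np.random.default_rng(0)
B, N, F, k = 1, 5, 4, 3                      # toy sizes (hypothetical)
a_clean = rng.normal(size=(B, N, F, k))
a_corrupt = rng.normal(size=(B, N, F, k))
a_patched = patch_feature_block(a_corrupt, a_clean, block=1)

# Example with made-up loss values: the patch closes half of the gap.
print(recovery_ratio(metric_clean=1.0, metric_corrupt=3.0, metric_patched=2.0))
```

Under this convention, a value near 100% at every layer (as in the Figure 6 caption) means patching the full layer restores the clean behaviour, while near-zero values (as reported for Pairwise-50) mean a single block carries little of the effect.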

Original abstract

We present the first causal mechanistic analysis of a tabular foundation model, investigating how TabPFN 2.5's feature-wise attention heads distribute computation across layers. Using activation patching, ablation, and attention entropy across two synthetic regression datasets, we find clear temporal specialisation: one head's causal necessity dominates that of the others by 2 to 5 times at peak layer, with its dominant layer shifting across tasks of different complexity, while the remaining heads exhibit symmetric late-layer profiles. Attention entropy and patching provide convergent evidence for the computationally active layers of the dominant head. We additionally investigate inference-time steerability via contrastive activation steering, which fails to transfer across samples. We attribute this result to TabPFN's in-context learning mechanism, which encodes task structure through context-dependent attention rather than the stable parametric directions that make steering tractable in language models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

First causal probe of TabPFN heads on synthetic regressions, but patching artifacts remain a live concern and evidence is still thin.

The main takeaway is that this is an early attempt to localize computation inside TabPFN using activation patching, and it reports one head showing 2-5x higher causal necessity than the others at its peak layer, with the peak shifting by task complexity. The rest of the heads look more symmetric in late layers. They also note that contrastive steering does not transfer across samples and tie that to context-dependent attention.

What the work actually does is apply standard mechanistic tools (patching, ablation, and entropy) to a tabular foundation model on two synthetic regression tasks. The convergence between patching and entropy on the active layers is the cleanest part. The quantitative dominance claim is stated directly, which is better than vague localization.

The soft spots are the ones the stress-test note flags. Activation patching on small synthetic regressions can shift downstream statistics in ways unrelated to the original head's function, especially when no random-ablation baselines or same-task controls are reported. The paper does not appear to include those checks, so the reported asymmetry could partly reflect the intervention rather than intrinsic specialization. The steering failure is attributed to in-context learning, but that remains an interpretation rather than a derivation from the equations. No comparisons to prior transformer interpretability results are visible in the provided text, which makes the “first causal analysis” claim hard to evaluate.

This is for readers already working on tabular foundation models or mechanistic interpretability who want an initial data point on TabPFN. It is not yet solid enough for broad claims, but the direction is reasonable. A serious editor should send it to review so the patching controls and dataset choices can be stress-tested properly.

Referee Report

2 major / 1 minor

Summary. The paper presents the first causal mechanistic analysis of TabPFN 2.5's feature-wise attention heads, using activation patching, ablation, and attention entropy on two synthetic regression datasets. It reports that one head's causal necessity dominates the others by 2-5 times at its peak layer (with the peak shifting by task complexity), while remaining heads show symmetric late-layer profiles; convergent evidence from entropy and patching is cited for the dominant head's active layers. Contrastive activation steering is shown to fail to transfer across samples, which the authors attribute to TabPFN's context-dependent attention in in-context learning rather than stable parametric directions.

Significance. If the patching results hold after controls, this would be a valuable first mechanistic study of a tabular foundation model, extending interpretability methods beyond language models and highlighting how in-context learning organizes computation differently. The convergent evidence from multiple techniques and the falsifiable claim about steering failure are strengths that could guide future work on tabular model internals.

major comments (2)

[Methods (activation patching experiments)] Activation patching subsection: no controls are described for patching-induced artifacts (e.g., random ablation baselines, same-task vs. cross-task patching, or entropy-matched controls). This is load-bearing for the central claim of 2-5x dominance and layer shifts, as the skeptic correctly notes that patching on small synthetic regressions can alter downstream patterns via distribution shift unrelated to the original heads.
[Results (causal necessity and layer profiles)] Results on head dominance and layer profiles: the 2-5x causal necessity ratio and symmetric late-layer profiles are stated without the underlying metric definition, error bars, dataset statistics, or robustness checks. This prevents evaluation of whether the reported asymmetry reflects genuine specialization.
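
The error bars requested in the second comment could be obtained with a percentile bootstrap over samples, sketched below. This is one possible approach rather than anything the paper describes: the function name, the ratio definition (the head's mean effect over the strongest other head's mean effect), and the synthetic per-sample effects are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_ratio_ci(per_sample_effects, head, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a head's dominance ratio.

    per_sample_effects: array (samples, heads) of causal effects measured
    at one layer. Samples are resampled with replacement.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(per_sample_effects, dtype=float)
    n = x.shape[0]
    ratios = np.empty(n_boot)
    for i in range(n_boot):
        means = x[rng.integers(0, n, size=n)].mean(axis=0)
        ratios[i] = means[head] / np.delete(means, head).max()
    low, high = np.quantile(ratios, [(1 - level) / 2, (1 + level) / 2])
    return float(low), float(high)

# Synthetic effects for 512 samples and 3 heads (made-up means and noise).
rng = np.random.default_rng(1)
per_sample = rng.normal(loc=[0.02, 0.02, 0.08], scale=0.01, size=(512, 3))
low, high = bootstrap_ratio_ci(per_sample, head=2)
print(f"95% interval for the dominance ratio: [{low:.2f}, {high:.2f}]")
```

An interval that excludes 1 would support the asymmetry at that layer; it would not by itself rule out the patching artifacts raised in the first comment.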

minor comments (1)

[Abstract] Abstract lacks any quantitative details, error bars, or dataset descriptions, which hinders immediate assessment even though the full text is referenced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the experimental controls and reporting of results.

Referee: [Methods (activation patching experiments)] Activation patching subsection: no controls are described for patching-induced artifacts (e.g., random ablation baselines, same-task vs. cross-task patching, or entropy-matched controls). This is load-bearing for the central claim of 2-5x dominance and layer shifts, as the skeptic correctly notes that patching on small synthetic regressions can alter downstream patterns via distribution shift unrelated to the original heads.

Authors: We agree that the absence of explicit controls for patching artifacts is a limitation. In the revised manuscript we will add random ablation baselines, same-task versus cross-task patching comparisons, and entropy-matched controls. These will be reported alongside the original results to demonstrate that the 2-5x dominance and layer shifts are not artifacts of distribution shift. revision: yes

Referee: [Results (causal necessity and layer profiles)] Results on head dominance and layer profiles: the 2-5x causal necessity ratio and symmetric late-layer profiles are stated without the underlying metric definition, error bars, dataset statistics, or robustness checks. This prevents evaluation of whether the reported asymmetry reflects genuine specialization.

Authors: We will expand the results section to define the causal necessity metric explicitly, report error bars (across seeds and datasets), include dataset statistics, and add robustness checks such as alternative patching strengths and cross-validation of the dominance ratio. These changes will allow direct evaluation of the reported asymmetry. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical analysis without derivations or self-referential reductions

The paper conducts an empirical mechanistic interpretability study via activation patching, ablation, and entropy measurements on two synthetic regression datasets. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. Claims rest on experimental observations of head dominance and steering failure rather than any chain that reduces by construction to its own inputs. The analysis is self-contained against external benchmarks of model behavior.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, parameters, or background assumptions to populate the ledger.

pith-pipeline@v0.9.1-grok · 5684 in / 1094 out tokens · 23838 ms · 2026-06-27T07:17:56.429147+00:00 · methodology

Reference graph

Works this paper leans on

48 extracted references · 10 canonical work pages · 1 internal anchor

