Recognition: 2 Lean theorem links
fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery
Pith reviewed 2026-05-12 04:49 UTC · model grok-4.3
The pith
Factorized masked crosscoders recover 3-13× more semantically coherent cross-layer features than standard crosscoders in pretrained transformers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Standard crosscoders fail to recover cross-layer features because their per-layer weights are unconstrained and cross-layer dependence is unregularized, so most latents collapse to surface patterns in a single layer. fmxcoders address both problems by replacing the encoder and decoder with low-rank tensor factorizations whose per-layer slices come from a shared basis, and by training with stochastic layer masking that penalizes any latent whose activation collapses when one layer is masked. The resulting shared latents act as better concept detectors and yield large gains in probing accuracy, reconstruction quality, and semantic coherence across four different base models.
What carries the argument
Low-rank tensor factorization of the encoder and decoder weights that forces every latent to draw its per-layer contributions from one shared cross-layer basis, combined with stochastic layer masking that penalizes latents for depending on only one or two layers.
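The two mechanisms above can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes: the names `basis` and `components`, the rank `r`, the ReLU nonlinearity, and the way layers are summed are all hypothetical stand-ins, not the paper's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

L, d, m, r = 4, 16, 64, 8  # layers, model dim, dictionary size, factorization rank

# Shared cross-layer basis: every latent's per-layer encoder slice is a mix
# of the same r shared components, so layers cannot be parameterized freely.
basis = rng.normal(size=(L, r))          # per-layer mixing weights
components = rng.normal(size=(r, d, m))  # shared rank-r components
W_enc = np.einsum('lr,rdm->ldm', basis, components)  # per-layer encoder weights

def encode(x, mask=None):
    """x: (L, d) per-layer residual activations; mask: (L,) keep/drop flags."""
    if mask is None:
        mask = np.ones(L)
    # Sum masked per-layer contributions, then a sparsifying nonlinearity.
    pre = np.einsum('l,ld,ldm->m', mask, x, W_enc)
    return np.maximum(pre, 0.0)

x = rng.normal(size=(L, d))
z_full = encode(x)
# Stochastic layer masking: drop one layer at training time; a latent whose
# activation collapses under any single-layer mask gets penalized.
z_masked = encode(x, mask=np.array([0.0, 1.0, 1.0, 1.0]))
```

The key design choice is that `W_enc` is never a free (L, d, m) tensor: all per-layer slices are spanned by one shared basis, which is what ties latents together across layers.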
Load-bearing premise
The functional coherence metric and the LLM-as-a-judge scores truly reflect genuine cross-layer semantic meaning rather than simply rewarding the factorized architecture.
What would settle it
An ablation that removes either the tensor factorization or the layer masking and finds that functional coherence and probing F1 gains disappear, or a human annotation study in which the new latents do not receive higher semantic coherence ratings than those from standard crosscoders.
Original abstract
Many features in pretrained Transformers span multiple layers: they emerge through stages of inference, persist in the residual stream, or are built jointly by parallel MLPs. Crosscoders (namely, sparse dictionaries trained jointly across layers) aim to recover these cross-layer features in a single shared latent space. We show that standard crosscoders largely fail at this purpose. Although their decoder weight norms spread evenly across layers, a functional coherence metric we introduce reveals that each latent's activation is effectively driven by only one or two layers on average. While functionally coherent latents act as human-interpretable concept detectors (e.g., US states and cities), the layer-localized latents that crosscoders predominantly learn collapse onto surface-level patterns such as digit detectors. We trace this failure to two structural limitations: unconstrained cross-layer parameterization and unregularized cross-layer dependence. We address both by introducing fmxcoders, which (i) replace the encoder and decoder with low-rank tensor factorizations that draw every latent's per-layer weights from a shared cross-layer basis, and (ii) apply stochastic layer masking, a denoising regularizer along the layer axis that penalizes latents whose contribution collapses when a single layer is masked. Across GPT2-Small, Pythia-410M, Pythia-1.4B, and Gemma2-2B, fmxcoders lift mean probing F1 by 10-30 points, surpassing per-layer SAE baselines that standard crosscoders fail to reach, reduce reconstruction MSE by 25-50%, and roughly double mean functional coherence. An LLM-as-a-judge evaluation further shows that fmxcoders recover 3-13$\times$ more semantically coherent latents than standard crosscoders across all four base LLMs.
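The abstract's functional coherence metric can be read as an effective layer count per latent. A small sketch, assuming `S` holds nonnegative per-layer contribution scores for each latent (how the paper computes those scores is not reproduced here):

```python
import numpy as np

def functional_coherence(S):
    """S: (n_latents, L) nonnegative per-layer contribution scores.

    cf_i = sum_l S[i, l] / max_l S[i, l]. Ranges from 1 (a single layer
    drives the latent) up to L (all layers contribute equally).
    """
    return S.sum(axis=1) / S.max(axis=1)

# A latent driven by one layer vs. one spread evenly over four layers.
S = np.array([[5.0, 0.0, 0.0, 0.0],
              [2.0, 2.0, 2.0, 2.0]])
cf = functional_coherence(S)  # → [1.0, 4.0]
```

This is why evenly spread decoder weight norms can coexist with low coherence: the metric looks at what actually drives activations, not at weight magnitudes.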
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that standard crosscoders, sparse dictionaries trained jointly across transformer layers, largely fail to recover truly cross-layer features: their latents are predominantly driven by only one or two layers despite evenly spread decoder weight norms. It introduces fmxcoders, which draw encoder and decoder weights from shared cross-layer bases via low-rank tensor factorizations and add stochastic layer masking as a regularizer, and reports 10-30 point gains in mean probing F1 (surpassing per-layer SAE baselines), 25-50% lower reconstruction MSE, roughly doubled mean functional coherence, and 3-13× more semantically coherent latents (via LLM-as-a-judge) across GPT2-Small, Pythia-410M, Pythia-1.4B, and Gemma2-2B.
Significance. If the newly introduced functional coherence metric and LLM judge evaluations prove to be architecture-agnostic measures of genuine cross-layer semantics, the work would meaningfully advance mechanistic interpretability by providing a practical method to extract features that emerge or persist across layers rather than collapsing to surface patterns. The multi-model scale of the evaluation and direct comparison against both standard crosscoders and per-layer SAEs are clear strengths; the introduction of a denoising regularizer along the layer dimension is a technically clean idea.
major comments (3)
- [Abstract and §3] Abstract and §3 (functional coherence metric definition): The metric is defined to penalize latents whose activations are driven by few layers on average. Because fmxcoders explicitly optimize against single-layer collapse via stochastic masking, reported doublings of mean functional coherence risk being partly tautological rather than evidence of superior feature discovery; an ablation isolating the masking term from the factorization (or an external validation set of known cross-layer concepts) is needed to establish that the metric adds independent information.
- [Abstract and §4] Abstract and §4 (LLM-as-a-judge evaluation): The claim of recovering 3-13× more semantically coherent latents rests on an LLM judge whose prompts and scoring rubric are not shown to be invariant to the structural differences (low-rank shared bases) introduced by fmxcoders. If the judge implicitly rewards more factorized or less surface-level activations, the multiplier cannot be attributed cleanly to better cross-layer discovery; a blinded human evaluation on a held-out subset or comparison against an architecture-agnostic coherence probe would strengthen the result.
- [§5] §5 (experimental results): The reported probing F1 and MSE gains are presented without error bars, standard deviations across random seeds, or statistical significance tests. Given that the central empirical claim is consistent improvement across four models and multiple metrics, the absence of variability estimates makes it difficult to assess whether the 10-30 point F1 lift and 25-50% MSE reduction are robust or sensitive to hyperparameter choices such as factorization rank.
minor comments (3)
- [Methods] The paper should report the chosen factorization rank for each model and any sensitivity analysis, as this is listed among the free parameters and directly affects the low-rank assumption.
- [Figures] Figure captions and axis labels in the results section would benefit from explicit mention of the number of latents and the exact masking probability used, to aid reproducibility.
- [§5] A brief discussion of how the per-layer SAE baselines were trained (same sparsity, same dictionary size) would clarify that the comparison is fair.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive feedback on our manuscript. We have carefully reviewed each major comment and provide detailed point-by-point responses below. We outline the specific revisions we will make to address the concerns raised.
Point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (functional coherence metric definition): The metric is defined to penalize latents whose activations are driven by few layers on average. Because fmxcoders explicitly optimize against single-layer collapse via stochastic masking, reported doublings of mean functional coherence risk being partly tautological rather than evidence of superior feature discovery; an ablation isolating the masking term from the factorization (or an external validation set of known cross-layer concepts) is needed to establish that the metric adds independent information.
Authors: We acknowledge the close relationship between the stochastic layer masking regularizer and the functional coherence metric, as the former is explicitly designed to discourage single-layer collapse. However, the metric itself is an independent, post-hoc measure of the average number of layers that meaningfully contribute to each latent's activations, and it is not part of the training loss. To demonstrate that the reported gains reflect genuine improvements in cross-layer feature discovery rather than a tautology, we will add an ablation study in the revised manuscript. This ablation will train factorized models both with and without the masking term and report the resulting differences in functional coherence, probing F1, MSE, and the number of coherent latents. We will also include concrete examples of known cross-layer concepts (e.g., entity tracking and syntactic features) recovered by fmxcoders to provide external validation of the metric. revision: yes
-
Referee: [Abstract and §4] Abstract and §4 (LLM-as-a-judge evaluation): The claim of recovering 3-13× more semantically coherent latents rests on an LLM judge whose prompts and scoring rubric are not shown to be invariant to the structural differences (low-rank shared bases) introduced by fmxcoders. If the judge implicitly rewards more factorized or less surface-level activations, the multiplier cannot be attributed cleanly to better cross-layer discovery; a blinded human evaluation on a held-out subset or comparison against an architecture-agnostic coherence probe would strengthen the result.
Authors: We agree that transparency and validation of the LLM-as-a-judge protocol are essential to rule out bias from the low-rank factorization. In the revised manuscript, we will include the complete prompts and scoring rubric in the appendix. To further strengthen the claim, we will add a blinded human evaluation on a held-out subset of 100 latents per model, where human evaluators (unaware of the source method) rate semantic coherence based on top activating examples and tokens. We will report inter-rater agreement and correlation with the LLM scores. While a full-scale human study across all models and latents is not feasible, this targeted evaluation will provide independent corroboration that the 3-13× increase corresponds to improved cross-layer semantic coherence. revision: partial
-
Referee: [§5] §5 (experimental results): The reported probing F1 and MSE gains are presented without error bars, standard deviations across random seeds, or statistical significance tests. Given that the central empirical claim is consistent improvement across four models and multiple metrics, the absence of variability estimates makes it difficult to assess whether the 10-30 point F1 lift and 25-50% MSE reduction are robust or sensitive to hyperparameter choices such as factorization rank.
Authors: We thank the referee for highlighting this important presentation issue. In the revised §5, we will report all key metrics (probing F1, MSE, functional coherence) with error bars indicating standard deviation across at least three independent random seeds per configuration. We will also include statistical significance tests (paired t-tests with p-values) for the primary comparisons against standard crosscoders and per-layer SAEs. Additionally, we will add a sensitivity analysis varying the factorization rank and reporting its effect on performance to address robustness to hyperparameter choices. revision: yes
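The seed-paired comparison the authors promise can be sketched directly. The per-seed F1 values below are illustrative placeholders, not the paper's numbers; in practice `scipy.stats.ttest_rel` would also supply the p-value:

```python
import math
import numpy as np

def paired_t(a, b):
    """Paired t statistic for per-seed metric pairs (a_i, b_i)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    # t = mean difference over its standard error (ddof=1 sample std).
    t = d.mean() / (d.std(ddof=1) / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom

# Hypothetical probing-F1 scores over three seeds for one model.
fmx = [0.71, 0.69, 0.73]
base = [0.52, 0.50, 0.55]
t, dof = paired_t(fmx, base)
```

Pairing by seed matters here: it removes seed-to-seed variance shared by both methods, which is exactly the variability the referee asks to see quantified.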
Circularity Check
No significant circularity in fmxcoders' empirical claims
Full rationale
The paper introduces a new architecture (low-rank factorized encoder/decoder plus stochastic layer masking) and a new functional coherence metric to diagnose standard crosscoders. Reported gains in probing F1 (10-30 points) and reconstruction MSE (25-50%) are measured on independent tasks across multiple models and are not reducible to quantities defined inside the training objective or metric by construction. The coherence improvement is expected from the explicit regularizer but does not render the other metrics tautological. No self-citations, uniqueness theorems, ansatzes smuggled via prior work, or renamings of known results appear as load-bearing steps. This is the normal non-circular outcome for an empirical architecture paper.
Axiom & Free-Parameter Ledger
free parameters (2)
- factorization rank
- masking probability
axioms (1)
- Domain assumption: features in pretrained transformers can be usefully represented as sparse activations of dictionary elements that are consistent across layers.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
fmxcoders replace the encoder and decoder with low-rank tensor factorizations... and apply stochastic layer masking
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
functional coherence: cf_i = (Σ_ℓ S_i^ℓ) / (max_ℓ S_i^ℓ)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Toy models of superposition,
N. Elhage, T. Hume, C. Olsson, N. Schiefer, T. Henighan, S. Kravec, Z. Hatfield-Dodds, R. Lasenby, D. Drain, C. Chen, R. Grosse, S. McCandlish, J. Kaplan, D. Amodei, M. Wattenberg, and C. Olah, “Toy models of superposition,” Transformer Circuits Thread, 2022. [Online]. Available: https://transformer-circuits.pub/2022/toy_model/index.html
work page 2022
-
[3]
Open Problems in Mechanistic Interpretability
L. Sharkey, B. Chughtai, J. Batson, J. Lindsey, J. Wu, L. Bushnaq, N. Goldowsky-Dill, S. Heimersheim, A. Ortega, J. Bloom, S. Biderman, A. Garriga-Alonso, A. Conmy, N. Nanda, J. Rumbelow, M. Wattenberg, N. Schoots, J. Miller, E. J. Michaud, S. Casper, M. Tegmark, W. Saunders, D. Bau, E. Todd, A. Geiger, M. Geva, J. Hoogland, D. Murfet, and T. McGrath,...
work page internal anchor Pith review arXiv 2025
-
[4]
Towards monosemanticity: Decomposing language models with dictionary learning,
T. Bricken, A. Templeton, J. Batson, B. Chen, A. Jermyn, T. Conerly, N. Turner, C. Anil, C. Denison, A. Askell, R. Lasenby, Y. Wu, S. Kravec, N. Schiefer, T. Maxwell, N. Joseph, Z. Hatfield-Dodds, A. Tamkin, K. Nguyen, B. McLean, J. E. Burke, T. Hume, S. Carter, T. Henighan, and C. Olah, “Towards monosemanticity: Decomposing language models with dictiona...
work page 2023
-
[5]
Sparse autoencoders find highly interpretable features in language models,
H. Cunningham, A. Ewart, L. Riggs, R. Huben, and L. Sharkey, “Sparse autoencoders find highly interpretable features in language models,” inInternational Conference on Learning Representations (ICLR), 2024
work page 2024
-
[6]
Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet,
A. Templeton, T. Conerly, J. Marcus, J. Lindsey, T. Bricken, B. Chen, A. Pearce, C. Citro, E. Ameisen, A. Jones, H. Cunningham, N. L. Turner, C. McDougall, M. MacDiarmid, C. D. Freeman, T. R. Sumers, E. Rees, J. Batson, A. Jermyn, S. Carter, C. Olah, and T. Henighan, “Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet,”Transfo...
work page 2024
-
[7]
Scaling and evaluating sparse autoencoders,
L. Gao, T. Dupré la Tour, H. Tillman, G. Goh, R. Troll, A. Radford, I. Sutskever, J. Leike, and J. Wu, “Scaling and evaluating sparse autoencoders,” inInternational Conference on Learning Representations (ICLR), 2025, oral presentation
work page 2025
-
[8]
Jumping ahead: Improving reconstruction fidelity with JumpReLU sparse autoencoders,
S. Rajamanoharan, T. Lieberum, N. Sonnerat, A. Conmy, V. Varma, J. Kramár, and N. Nanda, “Jumping ahead: Improving reconstruction fidelity with JumpReLU sparse autoencoders,” arXiv preprint arXiv:2407.14435, 2024
-
[9]
BatchTopK sparse autoencoders,
B. Bussmann, P. Leask, and N. Nanda, “BatchTopK sparse autoencoders,” inNeurIPS 2024 Workshop on Scientific Methods for Understanding Neural Networks, 2024
work page 2024
-
[10]
PolySAE: Modeling feature interactions in sparse autoencoders via polynomial decoding,
P. Koromilas, A. D. Demou, J. Oldfield, Y. Panagakis, and M. Nicolaou, “PolySAE: Modeling feature interactions in sparse autoencoders via polynomial decoding,” arXiv preprint arXiv:2602.01322, 2026
-
[11]
Knowledge neurons in pretrained transformers,
D. Dai, L. Dong, Y. Hao, Z. Sui, B. Chang, and F. Wei, “Knowledge neurons in pretrained transformers,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 8493–8502
work page 2022
-
[12]
Remarkable robustness of LLMs: Stages of inference?
V. Lad, J. H. Lee, W. Gurnee, and M. Tegmark, “Remarkable robustness of LLMs: Stages of inference?” in The Thirty-ninth Annual Conference on Neural Information Processing Systems. [Online]. Available: https://openreview.net/forum?id=Wxh5Xz7NpJ
-
[14]
Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space,
M. Geva, A. Caciularu, K. Wang, and Y. Goldberg, “Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022, pp. 30–45
work page 2022
-
[15]
Universal neurons in GPT2 language models,
W. Gurnee, T. Horsley, Z. C. Guo, T. R. Kheirkhah, Q. Sun, W. Hathaway, N. Nanda, and D. Bertsimas, “Universal neurons in GPT2 language models,” arXiv preprint arXiv:2401.12181, 2024
-
[16]
Residual stream analysis with multi-layer SAEs,
T. Lawson, L. Farnik, C. Houghton, and L. Aitchison, “Residual stream analysis with multi-layer SAEs,” in International Conference on Learning Representations (ICLR), 2025
work page 2025
-
[17]
Mechanistic permutability: Match features across layers,
N. Balagansky, I. Maksimov, and D. Gavrilov, “Mechanistic permutability: Match features across layers,” inInternational Conference on Learning Representations (ICLR), 2025
work page 2025
-
[18]
Sparse crosscoders for cross-layer features and model diffing,
J. Lindsey, A. Templeton, J. Marcus, T. Conerly, J. Batson, and C. Olah, “Sparse crosscoders for cross-layer features and model diffing,”Transformer Circuits Thread, 2024. [Online]. Available: https://transformer-circuits.pub/2024/crosscoders/index.html
work page 2024
-
[19]
Circuit tracing: Revealing computational graphs in language models,
E. Ameisen, J. Lindsey, A. Pearce, W. Gurnee, N. L. Turner, B. Chen, C. Citro, D. Abrahams, S. Carter, B. Hosmer, J. Marcus, M. Sklar, A. Templeton, T. Bricken, C. McDougall, H. Cunningham, T. Henighan, A. Jermyn, A. Jones, A. Persic, Z. Qi, T. B. Thompson, S. Zimmerman, K. Rivoire, T. Conerly, C. Olah, and J. Batson, “Circuit tracing: Revealing computati...
work page 2025
-
[20]
Overcoming sparsity artifacts in crosscoders to interpret chat-tuning,
C. Dumas, J. Minder, C. Juang, B. Chughtai, and N. Nanda, “Overcoming sparsity artifacts in crosscoders to interpret chat-tuning,” inMechanistic Interpretability Workshop at NeurIPS 2025, 2025
work page 2025
-
[21]
Tensor methods in computer vision and deep learning,
Y. Panagakis, J. Kossaifi, G. G. Chrysos, J. Oldfield, M. A. Nicolaou, A. Anandkumar, and S. Zafeiriou, “Tensor methods in computer vision and deep learning,” Proceedings of the IEEE, vol. 109, no. 5, pp. 863–890, 2021
work page 2021
-
[22]
Understanding polysemanticity in neural networks through coding theory,
S. C. Marshall and J. H. Kirchner, “Understanding polysemanticity in neural networks through coding theory,”arXiv preprint arXiv:2401.17975, 2024
-
[23]
Dropout: A simple way to prevent neural networks from overfitting,
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,”Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014
work page 2014
-
[24]
Towards automated circuit discovery for mechanistic interpretability,
A. Conmy, A. Mavor-Parker, A. Lynch, S. Heimersheim, and A. Garriga-Alonso, “Towards automated circuit discovery for mechanistic interpretability,” Advances in Neural Information Processing Systems, vol. 36, pp. 16318–16352, 2023
work page 2023
-
[25]
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model,
M. Hanna, O. Liu, and A. Variengien, “How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model,” Advances in Neural Information Processing Systems, vol. 36, pp. 76033–76060, 2023
work page 2023
-
[26]
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
S. Marks, C. Rager, E. J. Michaud, Y. Belinkov, D. Bau, and A. Mueller, “Sparse feature circuits: Discovering and editing interpretable causal graphs in language models,” arXiv preprint arXiv:2403.19647, 2024
work page internal anchor Pith review arXiv 2024
-
[27]
Internal states before wait modulate reasoning patterns,
D. Troitskii, K. Pal, C. Wendler, C. S. McDougall, and N. Nanda, “Internal states before wait modulate reasoning patterns,”Proceedings of the Findings of the Association for Computational Linguistics: EMNLP, 2025
work page 2025
-
[28]
Foundation Models for Discovery and Exploration in Chemical Space
A. Wadell, A. Bhutani, V. Azumah, A. R. Ellis-Mohr, C. Kelly, H. Zhao, A. K. Nayak, K. Hegazy, A. Brace, and H. Lin, “Foundation models for discovery and exploration in chemical space,” arXiv preprint arXiv:2510.18900, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[29]
Towards understanding distilled reasoning models: A representational approach,
D. D. Baek and M. Tegmark, “Towards understanding distilled reasoning models: A representational approach,” arXiv preprint arXiv:2503.03730, 2025
-
[30]
Evolution of concepts in language model pre-training,
X. Ge, W. Shu, J. Wu, Y. Zhou, Z. He, and X. Qiu, “Evolution of concepts in language model pre-training,” in The Fourteenth International Conference on Learning Representations, 2026
work page 2026
-
[31]
D. Bayazit, A. Mueller, and A. Bosselut, “Crosscoding through time: Tracking emergence & consolidation of linguistic representations throughout LLM pretraining,”arXiv preprint arXiv:2509.05291, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[32]
The expression of a tensor or a polyadic as a sum of products,
F. L. Hitchcock, “The expression of a tensor or a polyadic as a sum of products,”Journal of Mathematics and Physics, vol. 6, no. 1-4, pp. 164–189, 1927
work page 1927
-
[33]
Q. Zhao, G. Zhou, S. Xie, L. Zhang, and A. Cichocki, “Tensor Ring Decomposition,” arXiv preprint arXiv:1606.05535, 2016
work page Pith review arXiv 2016
-
[34]
Multilinear mixture of experts: Scalable expert specialization through factorization,
J. Oldfield, M. Georgopoulos, G. G. Chrysos, C. Tzelepis, Y. Panagakis, M. A. Nicolaou, J. Deng, and I. Patras, “Multilinear mixture of experts: Scalable expert specialization through factorization,” Advances in Neural Information Processing Systems, vol. 37, pp. 53022–53063, 2024
work page 2024
-
[35]
J. Oldfield, S. Im, S. Li, M. A. Nicolaou, I. Patras, and G. G. Chrysos, “Towards interpretability without sacrifice: Faithful dense layer decomposition with mixture of decoders,”arXiv preprint arXiv:2505.21364, 2025
-
[36]
Extracting and composing robust features with denoising autoencoders,
P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning (ICML). ACM, 2008, pp. 1096–1103
work page 2008
-
[37]
P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010
work page 2010
-
[38]
BERT: Pre-training of deep bidirectional transformers for language understanding,
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019, pp. 4171–4186
work page 2019
-
[39]
Masked autoencoders are scalable vision learners,
K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16000–16009
work page 2022
-
[40]
Matching pursuits with time-frequency dictionaries,
S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993
work page 1993
-
[41]
Multilinear multitask learning,
B. Romera-Paredes, H. Aung, N. Bianchi-Berthouze, and M. Pontil, “Multilinear multitask learning,” inInternational Conference on Machine Learning. PMLR, 2013, pp. 1444–1452
work page 2013
-
[42]
Deep multi-task representation learning: A tensor factorisation approach,
Y. Yang and T. Hospedales, “Deep multi-task representation learning: A tensor factorisation approach,” arXiv preprint arXiv:1605.06391, 2016
-
[43]
Training with noise is equivalent to Tikhonov regularization,
C. M. Bishop, “Training with noise is equivalent to Tikhonov regularization,” Neural Computation, vol. 7, no. 1, pp. 108–116, 1995
work page 1995
- [44]
-
[45]
Language models are unsupervised multitask learners,
A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” OpenAI, Tech. Rep., 2019. [Online]. Available: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
work page 2019
-
[46]
Pythia: A suite for analyzing large language models across training and scaling,
S. Biderman, H. Schoelkopf, Q. G. Anthony, H. Bradley, K. O’Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff, A. Skowron, L. Sutawika, and O. Van Der Wal, “Pythia: A suite for analyzing large language models across training and scaling,” in Proceedings of the 40th International Conference on Machine Learning, ser. Proceedings of Machin...
work page 2023
-
[47]
Gemma 2: Improving open language models at a practical size,
Gemma Team, “Gemma 2: Improving open language models at a practical size,” Google DeepMind, Tech. Rep., 2024. [Online]. Available: https://storage.googleapis.com/ deepmind-media/gemma/gemma-2-report.pdf
work page 2024
-
[48]
A. Gokaslan, V. Cohen, E. Pavlick, and S. Tellex, “OpenWebText corpus,” Zenodo, 2019. [Online]. Available: https://zenodo.org/records/3834942
-
[49]
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite, N. Nabeshima, S. Presser, and C. Leahy, “The Pile: An 800GB dataset of diverse text for language modeling,” arXiv preprint arXiv:2101.00027, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[50]
SAEBench: A comprehensive benchmark for sparse autoencoders in language model interpretability,
A. Karvonen, C. Rager, J. Lin, C. Tigges, J. I. Bloom, D. Chanin, Y.-T. Lau, E. Farrell, C. S. McDougall, K. Ayonrinde, D. Till, M. Wearden, A. Conmy, S. Marks, and N. Nanda, “SAEBench: A comprehensive benchmark for sparse autoencoders in language model interpretability,” in Proceedings of the 42nd International Conference on Machine Learning, ser. Procee...
work page 2025
-
[51]
Bias in bios: A case study of semantic representation bias in a high-stakes setting,
M. De-Arteaga, A. Romanov, H. Wallach, J. Chayes, C. Borgs, A. Chouldechova, S. C. Geyik, K. Kenthapadi, and A. T. Kalai, “Bias in bios: A case study of semantic representation bias in a high-stakes setting,” inProceedings of the Conference on Fairness, Accountability, and Transparency (FAT* ’19). ACM, 2019, pp. 120–128
work page 2019
-
[52]
Character-level convolutional networks for text classification,
X. Zhang, J. Zhao, and Y. LeCun, “Character-level convolutional networks for text classification,” in Advances in Neural Information Processing Systems, vol. 28, 2015. [Online]. Available: https://proceedings.neurips.cc/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf
work page 2015
-
[53]
Europarl: A parallel corpus for statistical machine translation,
P. Koehn, “Europarl: A parallel corpus for statistical machine translation,” inProceedings of Machine Translation Summit X: Papers, Phuket, Thailand, Sep. 13-15 2005, pp. 79–86
work page 2005
-
[54]
CodeParrot, “Github code dataset,” https://huggingface.co/datasets/codeparrot/github-code, 2022
work page 2022
-
[55]
Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders
Y. Hou, J. Li, Z. He, A. Yan, X. Chen, and J. McAuley, “Bridging language and items for retrieval and recommendation,” arXiv preprint arXiv:2403.03952, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[56]
Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis,
R. A. Harshman, “Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis,” UCLA Working Papers in Phonetics, vol. 16, no. 1, p. 84, 1970