Recognition: 2 theorem links
Probing Classifiers: Promises, Shortcomings, and Advances
Pith reviewed 2026-05-14 21:04 UTC · model grok-4.3
The pith
Probing classifiers can detect linguistic properties in neural model representations but often fail to isolate what the models themselves have learned.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Probing classifiers have emerged as one of the prominent methodologies for interpreting and analyzing deep neural network models of natural language processing. The basic idea is simple -- a classifier is trained to predict some linguistic property from a model's representations -- and has been used to examine a wide variety of models and properties. However, recent studies have demonstrated various methodological limitations of this approach. This article critically reviews the probing classifiers framework, highlighting their promises, shortcomings, and advances.
What carries the argument
A probing classifier: a separate supervised model trained to predict a linguistic property from the frozen representations produced by a neural network.
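As a concrete illustration (not taken from the paper), the pipeline can be sketched in a few lines of numpy. The "frozen model" here is just a fixed random projection standing in for a real network layer, and the property z is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained network layer: a fixed random projection.
# In a real probing study this would be, e.g., one layer of BERT.
D_IN, D_REP = 8, 16
W_frozen = 0.3 * rng.normal(size=(D_IN, D_REP))

def frozen_representations(x):
    """The model's representations f(x); never updated while probing."""
    return np.tanh(x @ W_frozen)

# Synthetic corpus: the property z depends only on the first input dim.
n = 2000
X = rng.normal(size=(n, D_IN))
z = (X[:, 0] > 0).astype(float)
H = frozen_representations(X)

# The probe g: logistic regression trained on H alone, by gradient descent.
w, b = np.zeros(D_REP), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
    w -= 0.5 * H.T @ (p - z) / n
    b -= 0.5 * float(np.mean(p - z))

acc = float(np.mean(((H @ w + b) > 0) == z))
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy shows only that the property is decodable from the frozen representations, which is a weaker claim than the model itself using that property.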
If this is right
- High accuracy from a probing classifier does not necessarily mean the original model has encoded the linguistic property in its representations.
- Control tasks and baselines that measure what a probe can achieve independently of the model's learned encoding, for example by randomizing labels or swapping in random representations, are needed to validate probing results.
- Advances such as capacity-limited probes and adversarial training can reduce the gap between probe performance and model knowledge.
- Caution is needed when drawing conclusions about model understanding from probing studies alone.
- The framework's reliability improves when combined with methods that constrain what the probe can discover on its own.
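To make the control-task idea concrete, here is a minimal numpy sketch in the spirit of Hewitt and Liang's (2019) selectivity measure. The representations, labels, and probe trainer are toy stand-ins, and per-example label permutation simplifies their word-type control tasks:

```python
import numpy as np

rng = np.random.default_rng(1)

def probe_accuracy(H, y, steps=400, lr=0.5):
    """Train a logistic-regression probe on representations H; return accuracy."""
    n, d = H.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        w -= lr * H.T @ (p - y) / n
        b -= lr * float(np.mean(p - y))
    return float(np.mean(((H @ w + b) > 0) == y))

# Toy representations that partly encode a binary property y.
n, d = 1000, 12
H = rng.normal(size=(n, d))
y = (H[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)

# Control task: same probe, same representations, randomly reassigned
# labels. Accuracy above chance here reflects probe capacity
# (memorization), not information encoded in the representations.
y_control = rng.permutation(y)

acc_real = probe_accuracy(H, y)
acc_control = probe_accuracy(H, y_control)
selectivity = acc_real - acc_control
print(f"real {acc_real:.2f}  control {acc_control:.2f}  selectivity {selectivity:.2f}")
```

A high-selectivity probe (large gap between real and control accuracy) is better evidence that the signal lives in the representations rather than in the probe.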
Where Pith is reading between the lines
- If the identified shortcomings are addressed through better controls, probing could become a routine part of model development and debugging pipelines.
- The same probing logic might be adapted to non-language domains such as vision or audio to check for analogous interpretability issues.
- A natural extension would be to compare probing results before and after fine-tuning to measure how much linguistic structure is preserved or altered.
- Researchers could test whether certain architectures produce representations that are inherently easier or harder for probes to read out accurately.
Load-bearing premise
The review assumes that the cited studies on probing limitations and advances are representative of the broader literature and that a synthesis without new experiments can accurately capture the framework's overall status.
What would settle it
A new large-scale experiment that applies probing classifiers to many different models and properties while controlling for probe capacity and finds that accuracy reliably tracks the model's internal knowledge rather than probe artifacts would challenge the paper's emphasis on methodological shortcomings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a critical survey of probing classifiers as a methodology for interpreting and analyzing deep neural network representations in natural language processing. It describes the core approach of training a classifier to predict a linguistic property from a model's hidden representations, reviews its widespread use across models and properties, synthesizes recent demonstrations of methodological limitations (such as issues with control tasks, probe complexity, and causal claims), and discusses proposed advances to mitigate these shortcomings.
Significance. If the synthesis accurately represents the cited literature, the paper is significant as a consolidation of a now-standard interpretability technique in NLP. It explicitly credits foundational probing work and the studies that identified limitations, while outlining constructive advances; this can help the community refine practices without requiring new experiments, which is appropriate for a review format.
major comments (2)
- [§4] §4 (Shortcomings): The discussion of control tasks as a diagnostic for what probing actually measures is central to the paper's critique, yet it does not fully address how the choice of control task interacts with the linguistic property under study; this weakens the claim that controls fully isolate the probed information.
- [§5] §5 (Advances): The review of alternative methods (e.g., causal interventions) is presented as progress, but lacks a direct comparison table or quantitative summary of how much these methods reduce the identified shortcomings relative to standard probing; this is load-bearing for the 'advances' section of the central claim.
minor comments (3)
- [Abstract/Introduction] The abstract and introduction use 'recent studies' without naming the key papers in the first paragraph; adding 1-2 citations here would improve immediate clarity.
- [§2] Notation for probe models (e.g., linear vs. MLP) is introduced inconsistently across sections; a short notation table in §2 would help.
- [References] A few citations appear to be from preprints; confirming journal versions or DOIs where available would strengthen the reference list.
Simulated Author's Rebuttal
We thank the referee for their positive assessment and constructive feedback on our survey of probing classifiers. We address each major comment below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [§4] §4 (Shortcomings): The discussion of control tasks as a diagnostic for what probing actually measures is central to the paper's critique, yet it does not fully address how the choice of control task interacts with the linguistic property under study; this weakens the claim that controls fully isolate the probed information.
Authors: We agree that the interaction between control task design and the specific linguistic property deserves more explicit treatment. In the revised version we will expand the relevant subsection of §4 with additional discussion and citations illustrating cases where control tasks may not fully decouple the probed signal (e.g., syntactic vs. semantic properties), thereby clarifying rather than overstating the diagnostic value of controls. revision: yes
-
Referee: [§5] §5 (Advances): The review of alternative methods (e.g., causal interventions) is presented as progress, but lacks a direct comparison table or quantitative summary of how much these methods reduce the identified shortcomings relative to standard probing; this is load-bearing for the 'advances' section of the central claim.
Authors: A quantitative meta-analysis is outside the scope of a survey that synthesizes existing literature. However, we will add a qualitative comparison table to §5 that systematically maps each advance to the shortcomings enumerated in §4, drawing directly on the claims and evidence reported in the cited works. This will make the progress more transparent while remaining faithful to the review format. revision: partial
Circularity Check
No significant circularity in this literature review
full rationale
This paper is a critical survey synthesizing existing literature on probing classifiers rather than advancing new empirical claims, derivations, or quantitative predictions. No mathematical equations, fitted parameters, self-definitional constructs, or load-bearing self-citations that reduce the central thesis to its own inputs appear in the abstract or described content. The review format relies on external cited studies for its synthesis of promises and shortcomings, with no internal reduction of results to fitted inputs or author-specific uniqueness theorems. The central claim rests on representation of prior work, which is independent of the present paper's structure.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear. Matched passage:
The basic idea is simple — a classifier is trained to predict some linguistic property from a model's representations
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear. Matched passage:
training the probing classifier g can be seen as estimating the mutual information between the intermediate representations fl(x) and the property z
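The passage above frames probing information-theoretically. A standard way to make that precise, following the information-theoretic probing literature (Pimentel et al. 2020; Voita and Titov 2020), is the variational bound I(f(x); z) >= H(z) - CE, where CE is the trained probe's cross-entropy in bits. A toy numpy sketch with synthetic representations rather than real model states:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setting: fair binary property z, representations H carrying a
# noisy signal about it in one direction.
n, d = 4000, 10
z = rng.integers(0, 2, size=n).astype(float)
H = rng.normal(size=(n, d))
H[:, 0] += 1.5 * (2 * z - 1)

# Train a logistic probe g(h) approximating p(z | h).
w, b = np.zeros(d), 0.0
for _ in range(400):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
    w -= 0.5 * H.T @ (p - z) / n
    b -= 0.5 * float(np.mean(p - z))

p = np.clip(1.0 / (1.0 + np.exp(-(H @ w + b))), 1e-9, 1 - 1e-9)
ce = float(-np.mean(z * np.log2(p) + (1 - z) * np.log2(1 - p)))  # bits

# Variational lower bound: I(f(x); z) >= H(z) - CE(probe).
h_z = 1.0  # entropy of a fair binary z, in bits
mi_lower_bound = h_z - ce
print(f"cross-entropy: {ce:.2f} bits, MI lower bound: {mi_lower_bound:.2f} bits")
```

One caveat this sketch glosses over: evaluating CE on the training set slightly inflates the bound; minimum-description-length probing (Voita and Titov) addresses this with online coding.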
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 25 Pith papers
-
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% str...
-
Architecture Determines Observability of Transformers
Certain transformer architectures lose internal linear signals for decision quality during training, making observability an architecture-dependent property rather than a universal one.
-
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
QAOD projects away question-aligned directions from answer representations to isolate domain-agnostic factuality signals, enabling efficient hallucination detection with top in-domain AUROC and up to 21% better OOD transfer.
-
KamonBench: A Grammar-Based Dataset for Evaluating Compositional Factor Recovery in Vision-Language Models
KamonBench is a grammar-generated synthetic dataset of compositional kamon crests with explicit factor annotations to evaluate factor recovery in vision-language models.
-
Deep Minds and Shallow Probes
Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.
-
What Do EEG Foundation Models Capture from Human Brain Signals?
EEG foundation models encode 68.6% of a 63-feature clinical lexicon in a representation-causal way, with frequency-domain features dominant; these recover 79.3% of the models' advantage over random baselines on average.
-
What Do EEG Foundation Models Capture from Human Brain Signals?
EEG foundation models encode many traditional hand-crafted features like frequency power, recovering on average 79% of their advantage over random baselines on clinical tasks while leaving residuals on harder ones.
-
Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations
Behavioral directions from one LLM family transfer to others via projection into a shared anchor coordinate space, yielding 0.83 ten-way detection accuracy and steering effects up to 0.46% on held-out models.
-
Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models
Tabular foundation models show substantial depthwise redundancy, so a looped single-layer version achieves comparable results with 20% of the original parameters.
-
When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment
Finite-answer projections of continuation probabilities stabilize before the answer is parseable, showing 17-31 token mean lead in delayed-verdict tasks with Qwen3-4B-Instruct.
-
Latent Space Probing for Adult Content Detection in Video Generative Models
Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
-
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
Sparse autoencoders on EEG transformers identify three regimes of clinical concept encoding and reveal entanglements such as age-pathology confounding via a new steering selectivity metric.
-
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
-
Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling
A tabular foundation model with LLM-as-Observer features predicts AI agent decisions in controlled games, outperforming baselines by 4 AUC points and 14% lower error at K=16 interactions.
-
Instructions Shape Production of Language, not Processing
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
-
Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces
Chemically meaningful steering for properties like cLogP and TPSA emerges in entangled Transformer-VAE latent spaces only after controlling for SELFIES representation confounds through residualization and decoded traversals.
-
Conceptors for Semantic Steering
Conceptors as soft projection matrices from bipolar activations offer a multidimensional, compositional, and geometrically principled method for semantic steering in LLMs that outperforms single-vector baselines in mu...
-
Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe
An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.
-
Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions
LLMs encode accurate but brittle internal beliefs about latent game states and convert them poorly into actions, creating systematic gaps that explain strategic failures.
-
Architecture Determines Observability of Transformers
Architecture and training determine whether transformers retain a readable internal signal that lets activation monitors catch errors missed by output confidence.
-
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
At sufficient scale, LLMs linearly represent the truth value of factual statements, as shown by visualizations, cross-dataset generalization, and causal interventions that flip truth judgments.
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
-
Instructions Shape Production of Language, not Processing
Instructions primarily shape the production stage of language models rather than the processing stage, with task-specific information and causal effects stronger in output tokens than input tokens.
-
Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
Overthinking in medical QA is linearly decodable at 71.6% accuracy yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.
-
Functional Emotions or Situational Contexts? A Discriminating Test from the Mythos Preview System Card
The note proposes applying emotion probes to SAE-analyzed strategic concealment episodes to test if emotion vectors capture causal emotions or situational projections in AI models.
Reference graph
Works this paper leans on
-
[1]
Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?
Ravichander, Abhilasha and Belinkov, Yonatan and Hovy, Eduard. Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021
work page 2021
-
[2]
Designing and Interpreting Probes with Control Tasks
Hewitt, John and Liang, Percy. Designing and Interpreting Probes with Control Tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1275
-
[3]
Zhang, Kelly and Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5448
-
[4]
Analyzing analytical methods: The case of phonology in neural models of spoken language
Chrupała, Grzegorz and Higy, Bertrand and Alishahi, Afra. Analyzing analytical methods: The case of phonology in neural models of spoken language. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.381
-
[5]
Information-Theoretic Probing for Linguistic Structure
Pimentel, Tiago and Valvoda, Josef and Hall Maudslay, Rowan and Zmigrod, Ran and Williams, Adina and Cotterell, Ryan. Information-Theoretic Probing for Linguistic Structure. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.420
-
[6]
Information-Theoretic Probing with Minimum Description Length
Voita, Elena and Titov, Ivan. Information-Theoretic Probing with Minimum Description Length. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.14
-
[7]
Pareto Probing: Trading Off Accuracy for Complexity
Pimentel, Tiago and Saphra, Naomi and Williams, Adina and Cotterell, Ryan. Pareto Probing: Trading Off Accuracy for Complexity. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.254
-
[8]
Giulianelli, Mario and Harding, Jack and Mohnert, Florian and Hupkes, Dieuwke and Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5426
-
[9]
Transactions of the Association for Computational Linguistics
Yanai Elazar and Shauli Ravfogel and Alon Jacovi and Yoav Goldberg. Transactions of the Association for Computational Linguistics. 2021.
work page 2021
-
[10]
Identifying and Controlling Important Neurons in Neural Machine Translation
Identifying and Controlling Important Neurons in Neural Machine Translation. International Conference on Learning Representations.
-
[11]
Investigating Gender Bias in Language Models Using Causal Mediation Analysis
Vig, Jesse and Gehrmann, Sebastian and Belinkov, Yonatan and Qian, Sharon and Nevo, Daniel and Singer, Yaron and Shieber, Stuart. Investigating Gender Bias in Language Models Using Causal Mediation Analysis.
-
[12]
Computational Linguistics
Feder, Amir and Oved, Nadav and Shalit, Uri and Reichart, Roi. Computational Linguistics. 2021. doi:10.1162/coli_a_00404
-
[13]
Linguistic Knowledge and Transferability of Contextual Representations
Liu, Nelson F. and Gardner, Matt and Belinkov, Yonatan and Peters, Matthew E. and Smith, Noah A. Linguistic Knowledge and Transferability of Contextual Representations. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi...
-
[14]
What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation
Köhn, Arne. What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015. doi:10.18653/v1/D15-1246
-
[15]
Distributional vectors encode referential attributes
Gupta, Abhijeet and Boleda, Gemma and Baroni, Marco and Padó, Sebastian. Distributional vectors encode referential attributes. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015. doi:10.18653/v1/D15-1002
-
[17]
Does String-Based Neural MT Learn Source Syntax?
Shi, Xing and Padhi, Inkit and Knight, Kevin. Does String-Based Neural MT Learn Source Syntax?. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016. doi:10.18653/v1/D16-1159
-
[18]
Probing for semantic evidence of composition by means of simple classification tasks
Ettinger, Allyson and Elgohary, Ahmed and Resnik, Philip. Probing for semantic evidence of composition by means of simple classification tasks. Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP. 2016. doi:10.18653/v1/W16-2524
-
[19]
What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties
Conneau, Alexis and Kruszewski, German and Lample, Guillaume and Barrault, Loïc. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. doi:10.18653/v1/P18-1198
-
[20]
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks. International Conference on Learning Representations (ICLR).
-
[21]
Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure
Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research.
-
[22]
What do Neural Machine Translation Models Learn about Morphology?
Belinkov, Yonatan and Durrani, Nadir and Dalvi, Fahim and Sajjad, Hassan and Glass, James. What do Neural Machine Translation Models Learn about Morphology?. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1080
-
[23]
Analysis Methods in Neural Language Processing: A Survey
Belinkov, Yonatan and Glass, James. Analysis Methods in Neural Language Processing: A Survey. Transactions of the Association for Computational Linguistics. 2019. doi:10.1162/tacl_a_00254
-
[24]
An information theoretic view on selecting linguistic probes
Zhu, Zining and Rudzicz, Frank. An information theoretic view on selecting linguistic probes. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.744
-
[25]
Interpretability and Analysis in Neural NLP
Belinkov, Yonatan and Gehrmann, Sebastian and Pavlick, Ellie. Interpretability and Analysis in Neural NLP. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts. 2020. doi:10.18653/v1/2020.acl-tutorials.1
-
[26]
Interpreting Predictions of NLP Models
Wallace, Eric and Gardner, Matt and Singh, Sameer. Interpreting Predictions of NLP Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts. 2020. doi:10.18653/v1/2020.emnlp-tutorials.3
-
[27]
A Survey of the State of Explainable AI for Natural Language Processing
Danilevsky, Marina and Qian, Kun and Aharonov, Ranit and Katsis, Yannis and Kawas, Ban and Sen, Prithviraj. A Survey of the State of Explainable AI for Natural Language Processing. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing.
work page 2020
-
[28]
On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition. 2018.
work page 2018
-
[29]
Belinkov, Yonatan and Màrquez, Lluís and Sajjad, Hassan and Durrani, Nadir and Dalvi, Fahim and Glass, James. Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2017
work page 2017
-
[30]
Computational Linguistics in the Netherlands Journal
Investigating `Aspect' in. Computational Linguistics in the Netherlands Journal. 2017.
work page 2017
-
[31]
The emergence of number and syntax units in LSTM language models
Lakretz, Yair and Kruszewski, German and Desbordes, Theo and Hupkes, Dieuwke and Dehaene, Stanislas and Baroni, Marco. The emergence of number and syntax units in LSTM language models. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).
-
[32]
Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence
Pearl, Judea. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. 2001.
work page 2001
-
[33]
A Structural Probe for Finding Syntax in Word Representations
Hewitt, John and Manning, Christopher D. A Structural Probe for Finding Syntax in Word Representations. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v1/N19-1419
-
[34]
A Tale of a Probe and a Parser
Hall Maudslay, Rowan and Valvoda, Josef and Pimentel, Tiago and Williams, Adina and Cotterell, Ryan. A Tale of a Probe and a Parser. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.659
-
[35]
Michael, Julian and Botha, Jan A. and Tenney, Ian. Asking without Telling: Exploring Latent Ontologies in Contextual Representations. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.552
-
[36]
On the Linguistic Representational Power of Neural Machine Translation Models
Belinkov, Yonatan and Durrani, Nadir and Dalvi, Fahim and Sajjad, Hassan and Glass, James. On the Linguistic Representational Power of Neural Machine Translation Models. Computational Linguistics. 2020. doi:10.1162/coli_a_00367
-
[37]
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Cao, Qingqing and Trivedi, Harsh and Balasubramanian, Aruna and Balasubramanian, Niranjan. DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.411
-
[38]
Analyzing Redundancy in Pretrained Transformer Models
Dalvi, Fahim and Sajjad, Hassan and Durrani, Nadir and Belinkov, Yonatan. Analyzing Redundancy in Pretrained Transformer Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.398
-
[39]
Hierarchical Multitask Learning for
Krishna, Kalpesh and Toshniwal, Shubham and Livescu, Karen. Hierarchical Multitask Learning for
-
[40]
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Belinkov, Yonatan and Glass, James. Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems.
-
[41]
What Does BERT Look at? An Analysis of BERT's Attention
Clark, Kevin and Khandelwal, Urvashi and Levy, Omer and Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2019. doi:10.18653/v1/W19-4828
-
[42]
Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT
Wu, Zhiyong and Chen, Yun and Kao, Ben and Liu, Qun. Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.383
-
[43]
On the shortest arborescence of a directed graph
Chu, Y. On the shortest arborescence of a directed graph. Scientia Sinica. 1965
work page 1965
-
[44]
Optimum branchings
Optimum branchings. Journal of Research of the National Bureau of Standards B.
-
[45]
Lepori, Michael and McCoy, R. Thomas. Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis. Proceedings of the 28th International Conference on Computational Linguistics. 2020. doi:10.18653/v1/2020.coling-main.325
-
[46]
Correlating Neural and Symbolic Representations of Language
Chrupała, Grzegorz and Alishahi, Afra. Correlating Neural and Symbolic Representations of Language. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1283
-
[47]
Frontiers in Systems Neuroscience
Kriegeskorte, Nikolaus and Mur, Marieke and Bandettini, Peter. Frontiers in Systems Neuroscience. 2008. doi:10.3389/neuro.06.004.2008
-
[48]
An Analysis of Encoder Representations in Transformer-Based Machine Translation
Raganato, Alessandro and Tiedemann, J. An Analysis of Encoder Representations in Transformer-Based Machine Translation. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5431
-
[49]
From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions
Mareček, David and Rosa, Rudolf. From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2019. doi:10.18653/v1/W19-4827
-
[51]
Investigating Transferability in Pretrained Language Models
Tamkin, Alex and Singh, Trisha and Giovanardi, Davide and Goodman, Noah. Investigating Transferability in Pretrained Language Models. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.125
-
[52]
Exploring Semantic Properties of Sentence Embeddings
Zhu, Xunjie and Li, Tingfeng and de Melo, Gerard. Exploring Semantic Properties of Sentence Embeddings. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018. doi:10.18653/v1/P18-2100
-
[53]
Low-Complexity Probing via Finding Subnetworks
Cao, Steven and Sanh, Victor and Rush, Alexander. Low-Complexity Probing via Finding Subnetworks. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.74
-
[54]
DirectProbe: Studying Representations without Classifiers
Zhou, Yichu and Srikumar, Vivek. DirectProbe: Studying Representations without Classifiers. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.401
-
[55]
What if This Modified That? Syntactic Interventions with Counterfactual Embeddings
Tucker, Mycal and Qian, Peng and Levy, Roger. What if This Modified That? Syntactic Interventions with Counterfactual Embeddings. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.76
-
[56]
Diagnostic Classifiers: Revealing How Neural Networks Process Hierarchical Structure
Veldhoen, Sara and Hupkes, Dieuwke and Zuidema, Willem. Diagnostic classifiers: Revealing how neural networks process hierarchical structure. CoCo@NIPS. 2016.
-
[57]
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
Adi, Yossi and Kermany, Einat and Belinkov, Yonatan and Lavi, Ofer and Goldberg, Yoav. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. CoRR, abs/1608.04207. 2016.
-
[58]
A Primer in BERTology: What We Know About How BERT Works
Rogers, Anna and Kovaleva, Olga and Rumshisky, Anna. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics. 2020. doi:10.1162/tacl_a_00349
-
[59]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v...
-
[60]
What do you learn from context? Probing for sentence structure in contextualized word representations
Tenney, Ian and Xia, Patrick and Chen, Berlin and Wang, Alex and Poliak, Adam and McCoy, R. Thomas and Kim, Najoung and Van Durme, Benjamin and Bowman, Samuel R. and Das, Dipanjan and Pavlick, Ellie. What do you learn from context? Probing for sentence structure in contextualized word representations. International Conference on Learning Representations. 2019.
-
[61]
Predicting Inductive Biases of Pre-Trained Models
Lovering, Charles and Jha, Rohan and Linzen, Tal and Pavlick, Ellie. Predicting Inductive Biases of Pre-Trained Models. International Conference on Learning Representations. 2021.
-
[62]
When Do You Need Billions of Words of Pretraining Data?
Zhang, Yian and Warstadt, Alex and Li, Xiaocheng and Bowman, Samuel R. When Do You Need Billions of Words of Pretraining Data?. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021. doi:10.18653/v1/2021.acl-long.90
-
[63]
Adi, Yossi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2016. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. CoRR, abs/1608.04207
-
[64]
Adi, Yossi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In International Conference on Learning Representations (ICLR)
-
[65]
Alain, Guillaume and Yoshua Bengio. 2016. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644v3
-
[66]
Bau, Anthony, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2019. Identifying and controlling important neurons in neural machine translation. In International Conference on Learning Representations
-
[67]
Belinkov, Yonatan. 2018. On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology
-
[68]
Belinkov, Yonatan, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. 2017a. What do neural machine translation models learn about morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 861--872, Association for Computational Linguistics, Vancouver, Canada
-
[69]
Belinkov, Yonatan, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. 2020. On the linguistic representational power of neural machine translation models. Computational Linguistics, 46(1):1--52
-
[70]
Belinkov, Yonatan, Sebastian Gehrmann, and Ellie Pavlick. 2020. Interpretability and analysis in neural NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 1--5, Association for Computational Linguistics, Online
-
[71]
Belinkov, Yonatan and James Glass. 2017. Analyzing hidden representations in end-to-end automatic speech recognition systems. In Advances in Neural Information Processing Systems, volume 30, pages 2441--2451, Curran Associates, Inc
-
[72]
Belinkov, Yonatan and James Glass. 2019. Analysis methods in neural language processing: A survey. Transactions of the Association for Computational Linguistics, 7:49--72
-
[73]
Belinkov, Yonatan, Lluís Màrquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2017b. Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1--10, Asian ...
-
[74]
Cao, Qingqing, Harsh Trivedi, Aruna Balasubramanian, and Niranjan Balasubramanian. 2020. DeFormer: Decomposing pre-trained transformers for faster question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4487--4497, Association for Computational Linguistics, Online
-
[75]
Cao, Steven, Victor Sanh, and Alexander Rush. 2021. Low-complexity probing via finding subnetworks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 960--966, Association for Computational Linguistics, Online
-
[76]
Chrupała, Grzegorz and Afra Alishahi. 2019. Correlating neural and symbolic representations of language. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2952--2962, Association for Computational Linguistics, Florence, Italy
-
[77]
Chrupała, Grzegorz, Bertrand Higy, and Afra Alishahi. 2020. Analyzing analytical methods: The case of phonology in neural models of spoken language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4146--4156, Association for Computational Linguistics, Online
-
[78]
Chu, Y. 1965. On the shortest arborescence of a directed graph. Scientia Sinica, 14:1396--1400
-
[79]
Clark, Kevin, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. What does BERT look at? An analysis of BERT's attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 276--286, Association for Computational Linguistics, Florence, Italy
-
[80]
Conneau, Alexis, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. 2018. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126--2136, Association for Compu...
-
[81]
Dalvi, Fahim, Hassan Sajjad, Nadir Durrani, and Yonatan Belinkov. 2020. Analyzing redundancy in pretrained transformer models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4908--4926, Association for Computational Linguistics, Online
-
[82]
Danilevsky, Marina, Kun Qian, Ranit Aharonov, Yannis Katsis, Ban Kawas, and Prithviraj Sen. 2020. A survey of the state of explainable AI for natural language processing. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processi...