Recognition: 2 theorem links
Probing Classifiers: Promises, Shortcomings, and Advances
Pith reviewed 2026-05-14 21:04 UTC · model grok-4.3
The pith
Probing classifiers can detect linguistic properties in neural model representations but often fail to isolate what the models themselves have learned.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Probing classifiers have emerged as one of the prominent methodologies for interpreting and analyzing deep neural network models of natural language processing. The basic idea is simple -- a classifier is trained to predict some linguistic property from a model's representations -- and has been used to examine a wide variety of models and properties. However, recent studies have demonstrated various methodological limitations of this approach. This article critically reviews the probing classifiers framework, highlighting their promises, shortcomings, and advances.
What carries the argument
A probing classifier: a separate supervised model trained to predict a linguistic property from the frozen representations produced by a neural network.
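As a concrete illustration (not taken from the paper), the pipeline can be sketched in a few lines of numpy. The "frozen model" here is just a fixed random projection standing in for a real network layer, and the property z is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained network layer: a fixed random projection.
# In a real probing study this would be, e.g., one layer of BERT.
D_IN, D_REP = 8, 16
W_frozen = 0.3 * rng.normal(size=(D_IN, D_REP))

def frozen_representations(x):
    """The model's representations f(x); never updated while probing."""
    return np.tanh(x @ W_frozen)

# Synthetic corpus: the property z depends only on the first input dim.
n = 2000
X = rng.normal(size=(n, D_IN))
z = (X[:, 0] > 0).astype(float)
H = frozen_representations(X)

# The probe g: logistic regression trained on H alone, by gradient descent.
w, b = np.zeros(D_REP), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
    w -= 0.5 * H.T @ (p - z) / n
    b -= 0.5 * float(np.mean(p - z))

acc = float(np.mean(((H @ w + b) > 0) == z))
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy shows only that the property is decodable from the frozen representations, which is a weaker claim than the model itself using that property.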
If this is right
- High accuracy from a probing classifier does not necessarily mean the original model has encoded the linguistic property in its representations.
- Control tasks and baselines that measure what a probe can achieve independently of the model's learned encoding, for example by randomizing labels or swapping in random representations, are needed to validate probing results.
- Advances such as capacity-limited probes and adversarial training can reduce the gap between probe performance and model knowledge.
- Caution is needed when drawing conclusions about model understanding from probing studies alone.
- The framework's reliability improves when combined with methods that constrain what the probe can discover on its own.
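To make the control-task idea concrete, here is a minimal numpy sketch in the spirit of Hewitt and Liang's (2019) selectivity measure. The representations, labels, and probe trainer are toy stand-ins, and per-example label permutation simplifies their word-type control tasks:

```python
import numpy as np

rng = np.random.default_rng(1)

def probe_accuracy(H, y, steps=400, lr=0.5):
    """Train a logistic-regression probe on representations H; return accuracy."""
    n, d = H.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        w -= lr * H.T @ (p - y) / n
        b -= lr * float(np.mean(p - y))
    return float(np.mean(((H @ w + b) > 0) == y))

# Toy representations that partly encode a binary property y.
n, d = 1000, 12
H = rng.normal(size=(n, d))
y = (H[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)

# Control task: same probe, same representations, randomly reassigned
# labels. Accuracy above chance here reflects probe capacity
# (memorization), not information encoded in the representations.
y_control = rng.permutation(y)

acc_real = probe_accuracy(H, y)
acc_control = probe_accuracy(H, y_control)
selectivity = acc_real - acc_control
print(f"real {acc_real:.2f}  control {acc_control:.2f}  selectivity {selectivity:.2f}")
```

A high-selectivity probe (large gap between real and control accuracy) is better evidence that the signal lives in the representations rather than in the probe.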
Where Pith is reading between the lines
- If the identified shortcomings are addressed through better controls, probing could become a routine part of model development and debugging pipelines.
- The same probing logic might be adapted to non-language domains such as vision or audio to check for analogous interpretability issues.
- A natural extension would be to compare probing results before and after fine-tuning to measure how much linguistic structure is preserved or altered.
- Researchers could test whether certain architectures produce representations that are inherently easier or harder for probes to read out accurately.
Load-bearing premise
The review assumes that the cited studies on probing limitations and advances are representative of the broader literature and that a synthesis without new experiments can accurately capture the framework's overall status.
What would settle it
A new large-scale experiment that applies probing classifiers to many different models and properties while controlling for probe capacity and finds that accuracy reliably tracks the model's internal knowledge rather than probe artifacts would challenge the paper's emphasis on methodological shortcomings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a critical survey of probing classifiers as a methodology for interpreting and analyzing deep neural network representations in natural language processing. It describes the core approach of training a classifier to predict a linguistic property from a model's hidden representations, reviews its widespread use across models and properties, synthesizes recent demonstrations of methodological limitations (such as issues with control tasks, probe complexity, and causal claims), and discusses proposed advances to mitigate these shortcomings.
Significance. If the synthesis accurately represents the cited literature, the paper is significant as a consolidation of a now-standard interpretability technique in NLP. It explicitly credits foundational probing work and the studies that identified limitations, while outlining constructive advances; this can help the community refine practices without requiring new experiments, which is appropriate for a review format.
major comments (2)
- [§4] §4 (Shortcomings): The discussion of control tasks as a diagnostic for what probing actually measures is central to the paper's critique, yet it does not fully address how the choice of control task interacts with the linguistic property under study; this weakens the claim that controls fully isolate the probed information.
- [§5] §5 (Advances): The review of alternative methods (e.g., causal interventions) is presented as progress, but lacks a direct comparison table or quantitative summary of how much these methods reduce the identified shortcomings relative to standard probing; this is load-bearing for the 'advances' section of the central claim.
minor comments (3)
- [Abstract/Introduction] The abstract and introduction use 'recent studies' without naming the key papers in the first paragraph; adding 1-2 citations here would improve immediate clarity.
- [§2] Notation for probe models (e.g., linear vs. MLP) is introduced inconsistently across sections; a short notation table in §2 would help.
- [References] A few citations appear to be from preprints; confirming journal versions or DOIs where available would strengthen the reference list.
Simulated Author's Rebuttal
We thank the referee for their positive assessment and constructive feedback on our survey of probing classifiers. We address each major comment below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [§4] §4 (Shortcomings): The discussion of control tasks as a diagnostic for what probing actually measures is central to the paper's critique, yet it does not fully address how the choice of control task interacts with the linguistic property under study; this weakens the claim that controls fully isolate the probed information.
Authors: We agree that the interaction between control task design and the specific linguistic property deserves more explicit treatment. In the revised version we will expand the relevant subsection of §4 with additional discussion and citations illustrating cases where control tasks may not fully decouple the probed signal (e.g., syntactic vs. semantic properties), thereby clarifying rather than overstating the diagnostic value of controls. revision: yes
-
Referee: [§5] §5 (Advances): The review of alternative methods (e.g., causal interventions) is presented as progress, but lacks a direct comparison table or quantitative summary of how much these methods reduce the identified shortcomings relative to standard probing; this is load-bearing for the 'advances' section of the central claim.
Authors: A quantitative meta-analysis is outside the scope of a survey that synthesizes existing literature. However, we will add a qualitative comparison table to §5 that systematically maps each advance to the shortcomings enumerated in §4, drawing directly on the claims and evidence reported in the cited works. This will make the progress more transparent while remaining faithful to the review format. revision: partial
Circularity Check
No significant circularity in this literature review
full rationale
This paper is a critical survey synthesizing existing literature on probing classifiers rather than advancing new empirical claims, derivations, or quantitative predictions. No mathematical equations, fitted parameters, self-definitional constructs, or load-bearing self-citations that reduce the central thesis to its own inputs appear in the abstract or described content. The review format relies on external cited studies for its synthesis of promises and shortcomings, with no internal reduction of results to fitted inputs or author-specific uniqueness theorems. The central claim rests on representation of prior work, which is independent of the present paper's structure.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear. Matched passage:
The basic idea is simple — a classifier is trained to predict some linguistic property from a model's representations
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear. Matched passage:
training the probing classifier g can be seen as estimating the mutual information between the intermediate representations fl(x) and the property z
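The passage above frames probing information-theoretically. A standard way to make that precise, following the information-theoretic probing literature (Pimentel et al. 2020; Voita and Titov 2020), is the variational bound I(f(x); z) >= H(z) - CE, where CE is the trained probe's cross-entropy in bits. A toy numpy sketch with synthetic representations rather than real model states:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setting: fair binary property z, representations H carrying a
# noisy signal about it in one direction.
n, d = 4000, 10
z = rng.integers(0, 2, size=n).astype(float)
H = rng.normal(size=(n, d))
H[:, 0] += 1.5 * (2 * z - 1)

# Train a logistic probe g(h) approximating p(z | h).
w, b = np.zeros(d), 0.0
for _ in range(400):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
    w -= 0.5 * H.T @ (p - z) / n
    b -= 0.5 * float(np.mean(p - z))

p = np.clip(1.0 / (1.0 + np.exp(-(H @ w + b))), 1e-9, 1 - 1e-9)
ce = float(-np.mean(z * np.log2(p) + (1 - z) * np.log2(1 - p)))  # bits

# Variational lower bound: I(f(x); z) >= H(z) - CE(probe).
h_z = 1.0  # entropy of a fair binary z, in bits
mi_lower_bound = h_z - ce
print(f"cross-entropy: {ce:.2f} bits, MI lower bound: {mi_lower_bound:.2f} bits")
```

One caveat this sketch glosses over: evaluating CE on the training set slightly inflates the bound; minimum-description-length probing (Voita and Titov) addresses this with online coding.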
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 25 Pith papers
-
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% str...
-
Architecture Determines Observability of Transformers
Certain transformer architectures lose internal linear signals for decision quality during training, making observability an architecture-dependent property rather than a universal one.
-
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
QAOD projects away question-aligned directions from answer representations to isolate domain-agnostic factuality signals, enabling efficient hallucination detection with top in-domain AUROC and up to 21% better OOD transfer.
-
KamonBench: A Grammar-Based Dataset for Evaluating Compositional Factor Recovery in Vision-Language Models
KamonBench is a grammar-generated synthetic dataset of compositional kamon crests with explicit factor annotations to evaluate factor recovery in vision-language models.
-
Deep Minds and Shallow Probes
Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.
-
What Do EEG Foundation Models Capture from Human Brain Signals?
EEG foundation models encode 68.6% of a 63-feature clinical lexicon in a representation-causal way, with frequency-domain features dominant; these recover 79.3% of the models' advantage over random baselines on average.
-
What Do EEG Foundation Models Capture from Human Brain Signals?
EEG foundation models encode many traditional hand-crafted features like frequency power, recovering on average 79% of their advantage over random baselines on clinical tasks while leaving residuals on harder ones.
-
Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations
Behavioral directions from one LLM family transfer to others via projection into a shared anchor coordinate space, yielding 0.83 ten-way detection accuracy and steering effects up to 0.46% on held-out models.
-
Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models
Tabular foundation models show substantial depthwise redundancy, so a looped single-layer version achieves comparable results with 20% of the original parameters.
-
When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment
Finite-answer projections of continuation probabilities stabilize before the answer is parseable, showing 17-31 token mean lead in delayed-verdict tasks with Qwen3-4B-Instruct.
-
Latent Space Probing for Adult Content Detection in Video Generative Models
Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
-
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
Sparse autoencoders on EEG transformers identify three regimes of clinical concept encoding and reveal entanglements such as age-pathology confounding via a new steering selectivity metric.
-
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
-
Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling
A tabular foundation model with LLM-as-Observer features predicts AI agent decisions in controlled games, outperforming baselines by 4 AUC points and 14% lower error at K=16 interactions.
-
Instructions Shape Production of Language, not Processing
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
-
Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces
Chemically meaningful steering for properties like cLogP and TPSA emerges in entangled Transformer-VAE latent spaces only after controlling for SELFIES representation confounds through residualization and decoded traversals.
-
Conceptors for Semantic Steering
Conceptors as soft projection matrices from bipolar activations offer a multidimensional, compositional, and geometrically principled method for semantic steering in LLMs that outperforms single-vector baselines in mu...
-
Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe
An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.
-
Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions
LLMs encode accurate but brittle internal beliefs about latent game states and convert them poorly into actions, creating systematic gaps that explain strategic failures.
-
Architecture Determines Observability of Transformers
Architecture and training determine whether transformers retain a readable internal signal that lets activation monitors catch errors missed by output confidence.
-
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
At sufficient scale, LLMs linearly represent the truth value of factual statements, as shown by visualizations, cross-dataset generalization, and causal interventions that flip truth judgments.
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
-
Instructions Shape Production of Language, not Processing
Instructions primarily shape the production stage of language models rather than the processing stage, with task-specific information and causal effects stronger in output tokens than input tokens.
-
Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
Overthinking in medical QA is linearly decodable at 71.6% accuracy yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.
-
Functional Emotions or Situational Contexts? A Discriminating Test from the Mythos Preview System Card
The note proposes applying emotion probes to SAE-analyzed strategic concealment episodes to test if emotion vectors capture causal emotions or situational projections in AI models.
Reference graph
Works this paper leans on
-
[1]
Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?
Ravichander, Abhilasha and Belinkov, Yonatan and Hovy, Eduard. Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021
work page 2021
-
[2]
Designing and Interpreting Probes with Control Tasks
Hewitt, John and Liang, Percy. Designing and Interpreting Probes with Control Tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1275
-
[3]
Zhang, Kelly and Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5448
-
[4]
Analyzing analytical methods: The case of phonology in neural models of spoken language
Chrupała, Grzegorz and Higy, Bertrand and Alishahi, Afra. Analyzing analytical methods: The case of phonology in neural models of spoken language. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.381
-
[5]
Information-Theoretic Probing for Linguistic Structure
Pimentel, Tiago and Valvoda, Josef and Hall Maudslay, Rowan and Zmigrod, Ran and Williams, Adina and Cotterell, Ryan. Information-Theoretic Probing for Linguistic Structure. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.420
-
[6]
Information-Theoretic Probing with Minimum Description Length
Voita, Elena and Titov, Ivan. Information-Theoretic Probing with Minimum Description Length. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.14
-
[7]
Pareto Probing: Trading Off Accuracy for Complexity
Pimentel, Tiago and Saphra, Naomi and Williams, Adina and Cotterell, Ryan. Pareto Probing: Trading Off Accuracy for Complexity. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.254
-
[8]
Giulianelli, Mario and Harding, Jack and Mohnert, Florian and Hupkes, Dieuwke and Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5426
-
[9]
Transactions of the Association for Computational Linguistics
Yanai Elazar and Shauli Ravfogel and Alon Jacovi and Yoav Goldberg. Transactions of the Association for Computational Linguistics. 2021.
work page 2021
-
[10]
Identifying and Controlling Important Neurons in Neural Machine Translation
Identifying and Controlling Important Neurons in Neural Machine Translation. International Conference on Learning Representations.
-
[11]
Investigating Gender Bias in Language Models Using Causal Mediation Analysis
Vig, Jesse and Gehrmann, Sebastian and Belinkov, Yonatan and Qian, Sharon and Nevo, Daniel and Singer, Yaron and Shieber, Stuart. Investigating Gender Bias in Language Models Using Causal Mediation Analysis.
-
[12]
Computational Linguistics
Feder, Amir and Oved, Nadav and Shalit, Uri and Reichart, Roi. Computational Linguistics. 2021. doi:10.1162/coli_a_00404
-
[13]
Linguistic Knowledge and Transferability of Contextual Representations
Liu, Nelson F. and Gardner, Matt and Belinkov, Yonatan and Peters, Matthew E. and Smith, Noah A. Linguistic Knowledge and Transferability of Contextual Representations. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi...
-
[14]
What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation
Köhn, Arne. What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015. doi:10.18653/v1/D15-1246
-
[15]
Distributional vectors encode referential attributes
Gupta, Abhijeet and Boleda, Gemma and Baroni, Marco and Padó, Sebastian. Distributional vectors encode referential attributes. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015. doi:10.18653/v1/D15-1002
-
[17]
Does String-Based Neural MT Learn Source Syntax?
Shi, Xing and Padhi, Inkit and Knight, Kevin. Does String-Based Neural MT Learn Source Syntax?. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016. doi:10.18653/v1/D16-1159
-
[18]
Probing for semantic evidence of composition by means of simple classification tasks
Ettinger, Allyson and Elgohary, Ahmed and Resnik, Philip. Probing for semantic evidence of composition by means of simple classification tasks. Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP. 2016. doi:10.18653/v1/W16-2524
-
[19]
What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties
Conneau, Alexis and Kruszewski, German and Lample, Guillaume and Barrault, Loïc. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. doi:10.18653/v1/P18-1198
-
[20]
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks. International Conference on Learning Representations (ICLR).
-
[21]
Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure
Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research.
-
[22]
What do Neural Machine Translation Models Learn about Morphology?
Belinkov, Yonatan and Durrani, Nadir and Dalvi, Fahim and Sajjad, Hassan and Glass, James. What do Neural Machine Translation Models Learn about Morphology?. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1080
-
[23]
Analysis Methods in Neural Language Processing: A Survey
Belinkov, Yonatan and Glass, James. Analysis Methods in Neural Language Processing: A Survey. Transactions of the Association for Computational Linguistics. 2019. doi:10.1162/tacl_a_00254
-
[24]
An information theoretic view on selecting linguistic probes
Zhu, Zining and Rudzicz, Frank. An information theoretic view on selecting linguistic probes. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.744
-
[25]
Interpretability and Analysis in Neural NLP
Belinkov, Yonatan and Gehrmann, Sebastian and Pavlick, Ellie. Interpretability and Analysis in Neural NLP. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts. 2020. doi:10.18653/v1/2020.acl-tutorials.1
-
[26]
Interpreting Predictions of NLP Models
Wallace, Eric and Gardner, Matt and Singh, Sameer. Interpreting Predictions of NLP Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts. 2020. doi:10.18653/v1/2020.emnlp-tutorials.3
-
[27]
A Survey of the State of Explainable AI for Natural Language Processing
Danilevsky, Marina and Qian, Kun and Aharonov, Ranit and Katsis, Yannis and Kawas, Ban and Sen, Prithviraj. A Survey of the State of Explainable AI for Natural Language Processing. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing.
work page 2020
-
[28]
On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition. 2018.
work page 2018
-
[29]
Belinkov, Yonatan and Màrquez, Lluís and Sajjad, Hassan and Durrani, Nadir and Dalvi, Fahim and Glass, James. Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2017
work page 2017
-
[30]
Computational Linguistics in the Netherlands Journal
Investigating `Aspect' in. Computational Linguistics in the Netherlands Journal. 2017.
work page 2017
-
[31]
The emergence of number and syntax units in LSTM language models
Lakretz, Yair and Kruszewski, German and Desbordes, Theo and Hupkes, Dieuwke and Dehaene, Stanislas and Baroni, Marco. The emergence of number and syntax units in LSTM language models. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).
-
[32]
Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence
Pearl, Judea. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. 2001.
work page 2001
-
[33]
A Structural Probe for Finding Syntax in Word Representations
Hewitt, John and Manning, Christopher D. A Structural Probe for Finding Syntax in Word Representations. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v1/N19-1419
-
[34]
A Tale of a Probe and a Parser
Hall Maudslay, Rowan and Valvoda, Josef and Pimentel, Tiago and Williams, Adina and Cotterell, Ryan. A Tale of a Probe and a Parser. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.659
-
[35]
Michael, Julian and Botha, Jan A. and Tenney, Ian. Asking without Telling: Exploring Latent Ontologies in Contextual Representations. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.552
-
[36]
On the Linguistic Representational Power of Neural Machine Translation Models
Belinkov, Yonatan and Durrani, Nadir and Dalvi, Fahim and Sajjad, Hassan and Glass, James. On the Linguistic Representational Power of Neural Machine Translation Models. Computational Linguistics. 2020. doi:10.1162/coli_a_00367
-
[37]
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Cao, Qingqing and Trivedi, Harsh and Balasubramanian, Aruna and Balasubramanian, Niranjan. DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.411
-
[38]
Analyzing Redundancy in Pretrained Transformer Models
Dalvi, Fahim and Sajjad, Hassan and Durrani, Nadir and Belinkov, Yonatan. Analyzing Redundancy in Pretrained Transformer Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.398
-
[39]
Hierarchical Multitask Learning for
Krishna, Kalpesh and Toshniwal, Shubham and Livescu, Karen. Hierarchical Multitask Learning for
-
[40]
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Belinkov, Yonatan and Glass, James. Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems.
-
[41]
What Does BERT Look at? An Analysis of BERT's Attention
Clark, Kevin and Khandelwal, Urvashi and Levy, Omer and Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2019. doi:10.18653/v1/W19-4828
-
[42]
Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT
Wu, Zhiyong and Chen, Yun and Kao, Ben and Liu, Qun. Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.383
-
[43]
On the shortest arborescence of a directed graph
Chu, Y. On the shortest arborescence of a directed graph. Scientia Sinica. 1965
work page 1965
-
[44]
Optimum branchings
Optimum branchings. Journal of Research of the National Bureau of Standards B.
-
[45]
Lepori, Michael and McCoy, R. Thomas. Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis. Proceedings of the 28th International Conference on Computational Linguistics. 2020. doi:10.18653/v1/2020.coling-main.325
-
[46]
Correlating Neural and Symbolic Representations of Language
Chrupała, Grzegorz and Alishahi, Afra. Correlating Neural and Symbolic Representations of Language. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1283
-
[47]
Frontiers in Systems Neuroscience
Kriegeskorte, Nikolaus and Mur, Marieke and Bandettini, Peter. Frontiers in Systems Neuroscience. 2008. doi:10.3389/neuro.06.004.2008
-
[48]
An Analysis of Encoder Representations in Transformer-Based Machine Translation
Raganato, Alessandro and Tiedemann, J. An Analysis of Encoder Representations in Transformer-Based Machine Translation. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5431
-
[49]
From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions
Mareček, David and Rosa, Rudolf. From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2019. doi:10.18653/v1/W19-4827
-
[51]
Investigating Transferability in Pretrained Language Models
Tamkin, Alex and Singh, Trisha and Giovanardi, Davide and Goodman, Noah. Investigating Transferability in Pretrained Language Models. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.125
-
[52]
Exploring Semantic Properties of Sentence Embeddings
Zhu, Xunjie and Li, Tingfeng and de Melo, Gerard. Exploring Semantic Properties of Sentence Embeddings. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018. doi:10.18653/v1/P18-2100
-
[53]
Low-Complexity Probing via Finding Subnetworks
Cao, Steven and Sanh, Victor and Rush, Alexander. Low-Complexity Probing via Finding Subnetworks. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.74
-
[54]
DirectProbe: Studying Representations without Classifiers
Zhou, Yichu and Srikumar, Vivek. DirectProbe: Studying Representations without Classifiers. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.401
-
[55]
What if This Modified That? Syntactic Interventions with Counterfactual Embeddings
Tucker, Mycal and Qian, Peng and Levy, Roger. What if This Modified That? Syntactic Interventions with Counterfactual Embeddings. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.76
-
[56]
Diagnostic Classifiers: Revealing How Neural Networks Process Hierarchical Structure
Veldhoen, Sara and Hupkes, Dieuwke and Zuidema, Willem. Diagnostic classifiers: Revealing how neural networks process hierarchical structure. CoCo@NIPS. 2016.
-
[57]
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
Adi, Yossi and Kermany, Einat and Belinkov, Yonatan and Lavi, Ofer and Goldberg, Yoav. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. CoRR, abs/1608.04207. 2016.
-
[58]
A Primer in BERTology: What We Know About How BERT Works
Rogers, Anna and Kovaleva, Olga and Rumshisky, Anna. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics. 2020. doi:10.1162/tacl_a_00349
-
[59]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v...
-
[60]
What do you learn from context? Probing for sentence structure in contextualized word representations
Tenney, Ian and Xia, Patrick and Chen, Berlin and Wang, Alex and Poliak, Adam and McCoy, R. Thomas and Kim, Najoung and Van Durme, Benjamin and Bowman, Samuel R. and Das, Dipanjan and Pavlick, Ellie. What do you learn from context? Probing for sentence structure in contextualized word representations. International Conference on Learning Representations. 2019.
-
[61]
Predicting Inductive Biases of Pre-Trained Models
Lovering, Charles and Jha, Rohan and Linzen, Tal and Pavlick, Ellie. Predicting Inductive Biases of Pre-Trained Models. International Conference on Learning Representations. 2021.
-
[62]
When Do You Need Billions of Words of Pretraining Data?
Zhang, Yian and Warstadt, Alex and Li, Xiaocheng and Bowman, Samuel R. When Do You Need Billions of Words of Pretraining Data?. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021. doi:10.18653/v1/2021.acl-long.90
-
[63]
Adi, Yossi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2016. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. CoRR, abs/1608.04207
-
[64]
Adi, Yossi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In International Conference on Learning Representations (ICLR)
-
[65]
Alain, Guillaume and Yoshua Bengio. 2016. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644v3
-
[66]
Bau, Anthony, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2019. Identifying and controlling important neurons in neural machine translation. In International Conference on Learning Representations
-
[67]
Belinkov, Yonatan. 2018. On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology
-
[68]
Belinkov, Yonatan, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. 2017a. What do neural machine translation models learn about morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 861--872, Association for Computational Linguistics, Vancouver, Canada
-
[69]
Belinkov, Yonatan, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. 2020. On the linguistic representational power of neural machine translation models. Computational Linguistics, 46(1):1--52
-
[70]
Belinkov, Yonatan, Sebastian Gehrmann, and Ellie Pavlick. 2020. Interpretability and analysis in neural NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 1--5, Association for Computational Linguistics, Online
-
[71]
Belinkov, Yonatan and James Glass. 2017. Analyzing hidden representations in end-to-end automatic speech recognition systems. In Advances in Neural Information Processing Systems, volume 30, pages 2441--2451, Curran Associates, Inc
-
[72]
Belinkov, Yonatan and James Glass. 2019. Analysis methods in neural language processing: A survey. Transactions of the Association for Computational Linguistics, 7:49--72
-
[73]
Belinkov, Yonatan, Lluís Màrquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2017b. Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1--10, Asian ...
-
[74]
Cao, Qingqing, Harsh Trivedi, Aruna Balasubramanian, and Niranjan Balasubramanian. 2020. DeFormer: Decomposing pre-trained transformers for faster question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4487--4497, Association for Computational Linguistics, Online
-
[75]
Cao, Steven, Victor Sanh, and Alexander Rush. 2021. Low-complexity probing via finding subnetworks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 960--966, Association for Computational Linguistics, Online
-
[76]
Chrupała, Grzegorz and Afra Alishahi. 2019. Correlating neural and symbolic representations of language. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2952--2962, Association for Computational Linguistics, Florence, Italy
-
[77]
Chrupała, Grzegorz, Bertrand Higy, and Afra Alishahi. 2020. Analyzing analytical methods: The case of phonology in neural models of spoken language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4146--4156, Association for Computational Linguistics, Online
-
[78]
Chu, Y. 1965. On the shortest arborescence of a directed graph. Scientia Sinica, 14:1396--1400
-
[79]
Clark, Kevin, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. What does BERT look at? An analysis of BERT's attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 276--286, Association for Computational Linguistics, Florence, Italy
-
[80]
Conneau, Alexis, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. 2018. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126--2136, Association for Compu...
-
[81]
Dalvi, Fahim, Hassan Sajjad, Nadir Durrani, and Yonatan Belinkov. 2020. Analyzing redundancy in pretrained transformer models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4908--4926, Association for Computational Linguistics, Online
-
[82]
Danilevsky, Marina, Kun Qian, Ranit Aharonov, Yannis Katsis, Ban Kawas, and Prithviraj Sen. 2020. A survey of the state of explainable AI for natural language processing. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processi...