Pith · machine review for the scientific record

arxiv: 2102.12452 · v4 · submitted 2021-02-24 · 💻 cs.CL

Recognition: 2 Lean theorem links

Probing Classifiers: Promises, Shortcomings, and Advances

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 21:04 UTC · model grok-4.3

classification 💻 cs.CL
keywords: probing classifiers · neural network interpretability · natural language processing · linguistic properties · model representations · methodological limitations · control tasks

The pith

Probing classifiers can detect linguistic properties in neural model representations but often fail to isolate what the models themselves have learned.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review examines the probing classifiers methodology used to interpret deep neural networks for natural language processing. The core technique trains a separate classifier on a model's internal representations to predict properties such as syntax or semantics. While this has been applied across many models to suggest what linguistic knowledge they encode, recent work has exposed problems, including probes that learn the property on their own rather than reading it out of the representations, and results driven by spurious correlations. The paper brings together the initial promises of the approach with these documented limitations and the methods proposed to improve it.

Core claim

Probing classifiers have emerged as one of the prominent methodologies for interpreting and analyzing deep neural network models of natural language processing. The basic idea is simple -- a classifier is trained to predict some linguistic property from a model's representations -- and has been used to examine a wide variety of models and properties. However, recent studies have demonstrated various methodological limitations of this approach. This article critically reviews the probing classifiers framework, highlighting their promises, shortcomings, and advances.

What carries the argument

A probing classifier: a separate supervised model trained to predict a linguistic property from the frozen representations produced by a neural network.
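A minimal sketch of this setup, with random synthetic vectors standing in for a real model's frozen hidden states (all data and names here are illustrative, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for frozen representations: 1000 tokens x 64 dimensions.
# In a real probing study these would be hidden states extracted
# from a pretrained model, which is never updated.
reps = rng.normal(size=(1000, 64))

# A toy "linguistic property" (e.g. a binary POS distinction),
# partly encoded along one direction of the representation space.
labels = (reps[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=0)

# The probe: a separate supervised classifier trained on top.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```

High held-out accuracy is then read as evidence that the property is encoded in the representations, which is exactly the inference the shortcomings discussed below put under pressure.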

If this is right

  • High accuracy from a probing classifier does not necessarily mean the original model has encoded the linguistic property in its representations.
  • Control tasks, which measure what a probe can learn when the labels carry no linguistic signal, are needed to validate probing results.
  • Advances such as capacity-limited probes and adversarial training can reduce the gap between probe performance and model knowledge.
  • Caution is needed when drawing conclusions about model understanding from probing studies alone.
  • The framework's reliability improves when combined with methods that constrain what the probe can discover on its own.
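The control-task idea can be sketched concretely: train the same probe once on the real labels and once on shuffled labels, and report the gap. This is a simplified version of Hewitt and Liang's selectivity (their control tasks assign random labels per word type; the data here is synthetic and illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
reps = rng.normal(size=(1000, 64))       # stand-in frozen representations
labels = (reps[:, 0] > 0).astype(int)    # toy linguistic property

# Control labels: the same label distribution with the linguistic
# signal destroyed, measuring what the probe can achieve on its own.
control = rng.permutation(labels)

def probe_accuracy(y):
    X_tr, X_te, y_tr, y_te = train_test_split(reps, y, random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

acc, ctrl = probe_accuracy(labels), probe_accuracy(control)
print(f"task {acc:.2f}  control {ctrl:.2f}  selectivity {acc - ctrl:.2f}")
```

High task accuracy with near-chance control accuracy (high selectivity) is the pattern that supports a claim about the representations rather than about the probe.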

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the identified shortcomings are addressed through better controls, probing could become a routine part of model development and debugging pipelines.
  • The same probing logic might be adapted to non-language domains such as vision or audio to check for analogous interpretability issues.
  • A natural extension would be to compare probing results before and after fine-tuning to measure how much linguistic structure is preserved or altered.
  • Researchers could test whether certain architectures produce representations that are inherently easier or harder for probes to read out accurately.

Load-bearing premise

The review assumes that the cited studies on probing limitations and advances are representative of the broader literature and that a synthesis without new experiments can accurately capture the framework's overall status.

What would settle it

A new large-scale experiment that applied probing classifiers to many different models and properties while controlling for probe capacity, and found that probe accuracy reliably tracks the model's internal knowledge rather than probe artifacts, would challenge the paper's emphasis on methodological shortcomings.
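Why probe capacity must be controlled in such an experiment can be seen in a toy case (synthetic data, illustrative only): a property invisible to a linear probe may still be read out by a higher-capacity one, so raw accuracy conflates what the model encodes with what the probe can compute.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
reps = rng.normal(size=(2000, 16))
# A nonlinearly encoded toy property: XOR of two coordinate signs.
labels = ((reps[:, 0] > 0) ^ (reps[:, 1] > 0)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=0)

linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)

# The linear probe stays near chance; the MLP probe reads the
# property out, despite both seeing identical representations.
print(f"linear {linear.score(X_te, y_te):.2f}  "
      f"MLP {mlp.score(X_te, y_te):.2f}")
```

Whether the nonlinear probe's advantage reflects model knowledge or probe computation is precisely the question capacity controls are meant to settle.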


Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript is a critical survey of probing classifiers as a methodology for interpreting and analyzing deep neural network representations in natural language processing. It describes the core approach of training a classifier to predict a linguistic property from a model's hidden representations, reviews its widespread use across models and properties, synthesizes recent demonstrations of methodological limitations (such as issues with control tasks, probe complexity, and causal claims), and discusses proposed advances to mitigate these shortcomings.

Significance. If the synthesis accurately represents the cited literature, the paper is significant as a consolidation of a now-standard interpretability technique in NLP. It explicitly credits foundational probing work and the studies that identified limitations, while outlining constructive advances; this can help the community refine practices without requiring new experiments, which is appropriate for a review format.

major comments (2)
  1. [§4] §4 (Shortcomings): The discussion of control tasks as a diagnostic for what probing actually measures is central to the paper's critique, yet it does not fully address how the choice of control task interacts with the linguistic property under study; this weakens the claim that controls fully isolate the probed information.
  2. [§5] §5 (Advances): The review of alternative methods (e.g., causal interventions) is presented as progress, but lacks a direct comparison table or quantitative summary of how much these methods reduce the identified shortcomings relative to standard probing; this is load-bearing for the 'advances' section of the central claim.
minor comments (3)
  1. [Abstract/Introduction] The abstract and introduction use 'recent studies' without naming the key papers in the first paragraph; adding 1-2 citations here would improve immediate clarity.
  2. [§2] Notation for probe models (e.g., linear vs. MLP) is introduced inconsistently across sections; a short notation table in §2 would help.
  3. [References] A few citations appear to be from preprints; confirming journal versions or DOIs where available would strengthen the reference list.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment and constructive feedback on our survey of probing classifiers. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [§4] §4 (Shortcomings): The discussion of control tasks as a diagnostic for what probing actually measures is central to the paper's critique, yet it does not fully address how the choice of control task interacts with the linguistic property under study; this weakens the claim that controls fully isolate the probed information.

    Authors: We agree that the interaction between control task design and the specific linguistic property deserves more explicit treatment. In the revised version we will expand the relevant subsection of §4 with additional discussion and citations illustrating cases where control tasks may not fully decouple the probed signal (e.g., syntactic vs. semantic properties), thereby clarifying rather than overstating the diagnostic value of controls. revision: yes

  2. Referee: [§5] §5 (Advances): The review of alternative methods (e.g., causal interventions) is presented as progress, but lacks a direct comparison table or quantitative summary of how much these methods reduce the identified shortcomings relative to standard probing; this is load-bearing for the 'advances' section of the central claim.

    Authors: A quantitative meta-analysis is outside the scope of a survey that synthesizes existing literature. However, we will add a qualitative comparison table to §5 that systematically maps each advance to the shortcomings enumerated in §4, drawing directly on the claims and evidence reported in the cited works. This will make the progress more transparent while remaining faithful to the review format. revision: partial

Circularity Check

0 steps flagged

No significant circularity in this literature review

full rationale

This paper is a critical survey synthesizing existing literature on probing classifiers rather than advancing new empirical claims, derivations, or quantitative predictions. No mathematical equations, fitted parameters, self-definitional constructs, or load-bearing self-citations that reduce the central thesis to its own inputs appear in the abstract or described content. The review format relies on external cited studies for its synthesis of promises and shortcomings, with no internal reduction of results to fitted inputs or author-specific uniqueness theorems. The central claim rests on representation of prior work, which is independent of the present paper's structure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a review paper with no new free parameters, axioms, or invented entities; it relies on the existing body of work on probing classifiers without introducing novel postulates.

pith-pipeline@v0.9.0 · 5354 in / 1021 out tokens · 35666 ms · 2026-05-14T21:04:41.751446+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 25 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

    cs.LG 2026-05 unverdicted novelty 8.0

    Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% str...

  2. Architecture Determines Observability of Transformers

    cs.LG 2026-04 unverdicted novelty 8.0

    Certain transformer architectures lose internal linear signals for decision quality during training, making observability an architecture-dependent property rather than a universal one.

  3. When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition

    cs.LG 2026-05 unverdicted novelty 7.0

    QAOD projects away question-aligned directions from answer representations to isolate domain-agnostic factuality signals, enabling efficient hallucination detection with top in-domain AUROC and up to 21% better OOD transfer.

  4. KamonBench: A Grammar-Based Dataset for Evaluating Compositional Factor Recovery in Vision-Language Models

    cs.CV 2026-05 unverdicted novelty 7.0

    KamonBench is a grammar-generated synthetic dataset of compositional kamon crests with explicit factor annotations to evaluate factor recovery in vision-language models.

  5. Deep Minds and Shallow Probes

    cs.LG 2026-05 unverdicted novelty 7.0

    Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.

  6. What Do EEG Foundation Models Capture from Human Brain Signals?

    cs.AI 2026-05 unverdicted novelty 7.0

    EEG foundation models encode 68.6% of a 63-feature clinical lexicon in a representation-causal way, with frequency-domain features dominant; these recover 79.3% of the models' advantage over random baselines on average.

  7. What Do EEG Foundation Models Capture from Human Brain Signals?

    cs.AI 2026-05 unverdicted novelty 7.0

    EEG foundation models encode many traditional hand-crafted features like frequency power, recovering on average 79% of their advantage over random baselines on clinical tasks while leaving residuals on harder ones.

  8. Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations

    cs.AI 2026-05 unverdicted novelty 7.0

    Behavioral directions from one LLM family transfer to others via projection into a shared anchor coordinate space, yielding 0.83 ten-way detection accuracy and steering effects up to 0.46% on held-out models.

  9. Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models

    cs.LG 2026-05 unverdicted novelty 7.0

    Tabular foundation models show substantial depthwise redundancy, so a looped single-layer version achieves comparable results with 20% of the original parameters.

  10. When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

    cs.AI 2026-05 unverdicted novelty 7.0

    Finite-answer projections of continuation probabilities stabilize before the answer is parseable, showing 17-31 token mean lead in delayed-verdict tasks with Qwen3-4B-Instruct.

  11. Latent Space Probing for Adult Content Detection in Video Generative Models

    cs.CV 2026-04 unverdicted novelty 7.0

    Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.

  12. Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders

    cs.LG 2026-05 unverdicted novelty 6.0

    Sparse autoencoders on EEG transformers identify three regimes of clinical concept encoding and reveal entanglements such as age-pathology confounding via a new steering selectivity metric.

  13. Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

    cs.LG 2026-05 unverdicted novelty 6.0

    A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

  14. Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling

    cs.LG 2026-05 unverdicted novelty 6.0

    A tabular foundation model with LLM-as-Observer features predicts AI agent decisions in controlled games, outperforming baselines by 4 AUC points and 14% lower error at K=16 interactions.

  15. Instructions Shape Production of Language, not Processing

    cs.CL 2026-05 unverdicted novelty 6.0

    Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.

  16. Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces

    cs.LG 2026-05 unverdicted novelty 6.0

    Chemically meaningful steering for properties like cLogP and TPSA emerges in entangled Transformer-VAE latent spaces only after controlling for SELFIES representation confounds through residualization and decoded traversals.

  17. Conceptors for Semantic Steering

    cs.LG 2026-05 unverdicted novelty 6.0

    Conceptors as soft projection matrices from bipolar activations offer a multidimensional, compositional, and geometrically principled method for semantic steering in LLMs that outperforms single-vector baselines in mu...

  18. Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe

    cs.CL 2026-05 unverdicted novelty 6.0

    An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.

  19. Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions

    cs.CL 2026-04 unverdicted novelty 6.0

    LLMs encode accurate but brittle internal beliefs about latent game states and convert them poorly into actions, creating systematic gaps that explain strategic failures.

  20. Architecture Determines Observability of Transformers

    cs.LG 2026-04 unverdicted novelty 6.0

    Architecture and training determine whether transformers retain a readable internal signal that lets activation monitors catch errors missed by output confidence.

  21. The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

    cs.AI 2023-10 unverdicted novelty 6.0

    At sufficient scale, LLMs linearly represent the truth value of factual statements, as shown by visualizations, cross-dataset generalization, and causal interventions that flip truth judgments.

  22. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    cs.CL 2022-11 unverdicted novelty 6.0

    BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

  23. Instructions Shape Production of Language, not Processing

    cs.CL 2026-05 unverdicted novelty 5.0

    Instructions primarily shape the production stage of language models rather than the processing stage, with task-specific information and causal effects stronger in output tokens than input tokens.

  24. Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes

    cs.AI 2026-05 unverdicted novelty 5.0

    Overthinking in medical QA is linearly decodable at 71.6% accuracy yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.

  25. Functional Emotions or Situational Contexts? A Discriminating Test from the Mythos Preview System Card

    cs.HC 2026-04 unverdicted novelty 4.0

    The note proposes applying emotion probes to SAE-analyzed strategic concealment episodes to test if emotion vectors capture causal emotions or situational projections in AI models.

Reference graph

Works this paper leans on

119 extracted references · 119 canonical work pages · cited by 22 Pith papers · 3 internal anchors

  1. [1]

    Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?

    Ravichander, Abhilasha and Belinkov, Yonatan and Hovy, Eduard. Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021

  2. [2]

    Designing and Interpreting Probes with Control Tasks

    Hewitt, John and Liang, Percy. Designing and Interpreting Probes with Control Tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1275

  3. [3]

    Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis

    Zhang, Kelly and Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5448

  4. [4]

    Analyzing analytical methods: The case of phonology in neural models of spoken language

    Chrupała, Grzegorz and Higy, Bertrand and Alishahi, Afra. Analyzing analytical methods: The case of phonology in neural models of spoken language. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.381

  5. [5]

    Information-Theoretic Probing for Linguistic Structure

    Pimentel, Tiago and Valvoda, Josef and Hall Maudslay, Rowan and Zmigrod, Ran and Williams, Adina and Cotterell, Ryan. Information-Theoretic Probing for Linguistic Structure. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.420

  6. [6]

    Information-Theoretic Probing with Minimum Description Length

    Voita, Elena and Titov, Ivan. Information-Theoretic Probing with Minimum Description Length. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.14

  7. [7]

    Pareto Probing: Trading Off Accuracy for Complexity

    Pimentel, Tiago and Saphra, Naomi and Williams, Adina and Cotterell, Ryan. Pareto Probing: Trading Off Accuracy for Complexity. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.254

  8. [8]

    Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information

    Giulianelli, Mario and Harding, Jack and Mohnert, Florian and Hupkes, Dieuwke and Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5426

  9. [9]

    Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

    Elazar, Yanai and Ravfogel, Shauli and Jacovi, Alon and Goldberg, Yoav. Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals. Transactions of the Association for Computational Linguistics. 2021

  10. [10]

    Identifying and Controlling Important Neurons in Neural Machine Translation

    Bau, Anthony and Belinkov, Yonatan and Sajjad, Hassan and Durrani, Nadir and Dalvi, Fahim and Glass, James. Identifying and Controlling Important Neurons in Neural Machine Translation. International Conference on Learning Representations. 2019

  11. [11]

    Investigating Gender Bias in Language Models Using Causal Mediation Analysis

    Vig, Jesse and Gehrmann, Sebastian and Belinkov, Yonatan and Qian, Sharon and Nevo, Daniel and Singer, Yaron and Shieber, Stuart. Investigating Gender Bias in Language Models Using Causal Mediation Analysis. Advances in Neural Information Processing Systems. 2020

  12. [12]

    CausaLM: Causal Model Explanation Through Counterfactual Language Models

    Feder, Amir and Oved, Nadav and Shalit, Uri and Reichart, Roi. CausaLM: Causal Model Explanation Through Counterfactual Language Models. Computational Linguistics. 2021. doi:10.1162/coli_a_00404

  13. [13]

    Linguistic Knowledge and Transferability of Contextual Representations

    Liu, Nelson F. and Gardner, Matt and Belinkov, Yonatan and Peters, Matthew E. and Smith, Noah A. Linguistic Knowledge and Transferability of Contextual Representations. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v1/N19-1112

  14. [14]

    What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation

    Köhn, Arne. What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015. doi:10.18653/v1/D15-1246

  15. [15]

    Distributional vectors encode referential attributes

    Gupta, Abhijeet and Boleda, Gemma and Baroni, Marco and Padó, Sebastian. Distributional vectors encode referential attributes. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015. doi:10.18653/v1/D15-1002

  16. [17]

    Does String-Based Neural MT Learn Source Syntax?

    Shi, Xing and Padhi, Inkit and Knight, Kevin. Does String-Based Neural MT Learn Source Syntax?. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016. doi:10.18653/v1/D16-1159

  17. [18]

    Probing for semantic evidence of composition by means of simple classification tasks

    Ettinger, Allyson and Elgohary, Ahmed and Resnik, Philip. Probing for semantic evidence of composition by means of simple classification tasks. Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP. 2016. doi:10.18653/v1/W16-2524

  18. [19]

    What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties

    Conneau, Alexis and Kruszewski, German and Lample, Guillaume and Barrault, Loïc and Baroni, Marco. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. doi:10.18653/v1/P18-1198

  19. [20]

    Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

    Adi, Yossi and Kermany, Einat and Belinkov, Yonatan and Lavi, Ofer and Goldberg, Yoav. Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks. International Conference on Learning Representations (ICLR). 2017

  20. [21]

    Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure

    Hupkes, Dieuwke and Veldhoen, Sara and Zuidema, Willem. Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research. 2018

  21. [22]

    What do Neural Machine Translation Models Learn about Morphology?

    Belinkov, Yonatan and Durrani, Nadir and Dalvi, Fahim and Sajjad, Hassan and Glass, James. What do Neural Machine Translation Models Learn about Morphology?. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1080

  22. [23]

    Analysis Methods in Neural Language Processing: A Survey

    Belinkov, Yonatan and Glass, James. Analysis Methods in Neural Language Processing: A Survey. Transactions of the Association for Computational Linguistics. 2019. doi:10.1162/tacl_a_00254

  23. [24]

    An information theoretic view on selecting linguistic probes

    Zhu, Zining and Rudzicz, Frank. An information theoretic view on selecting linguistic probes. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.744

  24. [25]

    Interpretability and Analysis in Neural NLP

    Belinkov, Yonatan and Gehrmann, Sebastian and Pavlick, Ellie. Interpretability and Analysis in Neural NLP. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts. 2020. doi:10.18653/v1/2020.acl-tutorials.1

  25. [26]

    Interpreting Predictions of NLP Models

    Wallace, Eric and Gardner, Matt and Singh, Sameer. Interpreting Predictions of NLP Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts. 2020. doi:10.18653/v1/2020.emnlp-tutorials.3

  26. [27]

    A Survey of the State of Explainable AI for Natural Language Processing

    Danilevsky, Marina and Qian, Kun and Aharonov, Ranit and Katsis, Yannis and Kawas, Ban and Sen, Prithviraj. A Survey of the State of Explainable AI for Natural Language Processing. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language P...

  27. [28]

    On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition

    Belinkov, Yonatan. On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology. 2018

  28. [29]

    Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks

    Belinkov, Yonatan and Màrquez, Lluís and Sajjad, Hassan and Durrani, Nadir and Dalvi, Fahim and Glass, James. Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2017

  29. [30]

    Investigating 'Aspect' in NMT and SMT: Translating the English Simple Past and Present Perfect

    Vanmassenhove, Eva and Du, Jinhua and Way, Andy. Investigating 'Aspect' in NMT and SMT: Translating the English Simple Past and Present Perfect. Computational Linguistics in the Netherlands Journal. 2017

  30. [31]

    The emergence of number and syntax units in LSTM language models

    Lakretz, Yair and Kruszewski, German and Desbordes, Theo and Hupkes, Dieuwke and Dehaene, Stanislas and Baroni, Marco. The emergence of number and syntax units in LSTM language models. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019

  31. [32]

    Direct and Indirect Effects

    Pearl, Judea. Direct and Indirect Effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. 2001

  32. [33]

    A Structural Probe for Finding Syntax in Word Representations

    Hewitt, John and Manning, Christopher D. A Structural Probe for Finding Syntax in Word Representations. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v1/N19-1419

  33. [34]

    A Tale of a Probe and a Parser

    Hall Maudslay, Rowan and Valvoda, Josef and Pimentel, Tiago and Williams, Adina and Cotterell, Ryan. A Tale of a Probe and a Parser. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.659

  34. [35]

    Asking without Telling: Exploring Latent Ontologies in Contextual Representations

    Michael, Julian and Botha, Jan A. and Tenney, Ian. Asking without Telling: Exploring Latent Ontologies in Contextual Representations. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.552

  35. [36]

    On the Linguistic Representational Power of Neural Machine Translation Models

    Belinkov, Yonatan and Durrani, Nadir and Dalvi, Fahim and Sajjad, Hassan and Glass, James. On the Linguistic Representational Power of Neural Machine Translation Models. Computational Linguistics. 2020. doi:10.1162/coli_a_00367

  36. [37]

    DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

    Cao, Qingqing and Trivedi, Harsh and Balasubramanian, Aruna and Balasubramanian, Niranjan. DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.411

  37. [38]

    Analyzing Redundancy in Pretrained Transformer Models

    Dalvi, Fahim and Sajjad, Hassan and Durrani, Nadir and Belinkov, Yonatan. Analyzing Redundancy in Pretrained Transformer Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.398

  38. [39]

    Hierarchical Multitask Learning for

    Krishna, Kalpesh and Toshniwal, Shubham and Livescu, Karen. Hierarchical Multitask Learning for

  39. [40]

    Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

    Belinkov, Yonatan and Glass, James. Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems. Advances in Neural Information Processing Systems. 2017

  40. [41]

    What Does BERT Look at? An Analysis of BERT's Attention

    Clark, Kevin and Khandelwal, Urvashi and Levy, Omer and Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2019. doi:10.18653/v1/W19-4828

  41. [42]

    Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT

    Wu, Zhiyong and Chen, Yun and Kao, Ben and Liu, Qun. Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.383

  42. [43]

    On the shortest arborescence of a directed graph

    CHU, Y. On the shortest arborescence of a directed graph. Scientia Sinica. 1965

  43. [44]

    Optimum branchings

    Optimum branchings. Journal of Research of the National Bureau of Standards B

  44. [45]

    Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis

    Lepori, Michael and McCoy, R. Thomas. Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis. Proceedings of the 28th International Conference on Computational Linguistics. 2020. doi:10.18653/v1/2020.coling-main.325

  45. [46]

    Correlating Neural and Symbolic Representations of Language

    Chrupała, Grzegorz and Alishahi, Afra. Correlating Neural and Symbolic Representations of Language. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1283

  46. [47]

    Representational Similarity Analysis - Connecting the Branches of Systems Neuroscience

    Kriegeskorte, Nikolaus and Mur, Marieke and Bandettini, Peter. Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. 2008. doi:10.3389/neuro.06.004.2008

  47. [48]

    An Analysis of Encoder Representations in Transformer-Based Machine Translation

    Raganato, Alessandro and Tiedemann, J. An Analysis of Encoder Representations in Transformer-Based Machine Translation. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5431

  48. [49]

    From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions

    Mareček, David and Rosa, Rudolf. From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2019. doi:10.18653/v1/W19-4827

  49. [51]

    Investigating Transferability in Pretrained Language Models

    Tamkin, Alex and Singh, Trisha and Giovanardi, Davide and Goodman, Noah. Investigating Transferability in Pretrained Language Models. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.125

  50. [52]

    Exploring Semantic Properties of Sentence Embeddings

    Zhu, Xunjie and Li, Tingfeng and de Melo, Gerard. Exploring Semantic Properties of Sentence Embeddings. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018. doi:10.18653/v1/P18-2100

  51. [53]

    Low-Complexity Probing via Finding Subnetworks

    Cao, Steven and Sanh, Victor and Rush, Alexander. Low-Complexity Probing via Finding Subnetworks. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.74

  52. [54]

    DirectProbe: Studying Representations without Classifiers

    Zhou, Yichu and Srikumar, Vivek. DirectProbe: Studying Representations without Classifiers. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.401

  53. [55]

    What if This Modified That? Syntactic Interventions with Counterfactual Embeddings

    Tucker, Mycal and Qian, Peng and Levy, Roger. What if This Modified That? Syntactic Interventions with Counterfactual Embeddings. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.76

  54. [56]

    Diagnostic classifiers revealing how neural networks process hierarchical structure

    Diagnostic classifiers revealing how neural networks process hierarchical structure. CoCo@NIPS

  55. [57]

    Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

    Adi, Yossi and Kermany, Einat and Belinkov, Yonatan and Lavi, Ofer and Goldberg, Yoav. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. CoRR, abs/1608.04207. 2016

  56. [58]

    A Primer in BERTology: What We Know About How BERT Works

    Rogers, Anna and Kovaleva, Olga and Rumshisky, Anna. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics. 2020. doi:10.1162/tacl_a_00349

  57. [59]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v...

  58. [60]

    What do you learn from context? Probing for sentence structure in contextualized word representations

    What do you learn from context? Probing for sentence structure in contextualized word representations. International Conference on Learning Representations

  59. [61]

    Predicting Inductive Biases of Pre-Trained Models

    Predicting Inductive Biases of Pre-Trained Models. International Conference on Learning Representations

  60. [62]

    When Do You Need Billions of Words of Pretraining Data?

    Zhang, Yian and Warstadt, Alex and Li, Xiaocheng and Bowman, Samuel R. When Do You Need Billions of Words of Pretraining Data?. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021. doi:10.18653/v1/2021.acl-long.90

  61. [63]

    Adi, Yossi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2016. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. CoRR, abs/1608.04207

  62. [64]

    Adi, Yossi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In International Conference on Learning Representations (ICLR)

  63. [65]

    Alain, Guillaume and Yoshua Bengio. 2016. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644v3

  64. [66]

    Bau, Anthony, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2019. Identifying and controlling important neurons in neural machine translation. In International Conference on Learning Representations

  65. [67]

    Belinkov, Yonatan. 2018. On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology

  66. [68]

    Belinkov, Yonatan, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. 2017a. What do neural machine translation models learn about morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 861--872, Association for Computational Linguistics, Vancouver, Canada

  67. [69]

    Belinkov, Yonatan, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. 2020. On the linguistic representational power of neural machine translation models. Computational Linguistics, 46(1):1--52

  68. [70]

    Belinkov, Yonatan, Sebastian Gehrmann, and Ellie Pavlick. 2020. Interpretability and analysis in neural NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 1--5, Association for Computational Linguistics, Online

  69. [71]

    Belinkov, Yonatan and James Glass. 2017. Analyzing hidden representations in end-to-end automatic speech recognition systems. In Advances in Neural Information Processing Systems, volume 30, pages 2441--2451, Curran Associates, Inc

  70. [72]

    Belinkov, Yonatan and James Glass. 2019. Analysis methods in neural language processing: A survey. Transactions of the Association for Computational Linguistics, 7:49--72

  71. [73]

    Belinkov, Yonatan, Lluís Màrquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2017b. Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1--10, Asian ...

  72. [74]

    Cao, Qingqing, Harsh Trivedi, Aruna Balasubramanian, and Niranjan Balasubramanian. 2020. DeFormer: Decomposing pre-trained transformers for faster question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4487--4497, Association for Computational Linguistics, Online

  73. [75]

    Cao, Steven, Victor Sanh, and Alexander Rush. 2021. Low-complexity probing via finding subnetworks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 960--966, Association for Computational Linguistics, Online

  74. [76]

    Chrupała, Grzegorz and Afra Alishahi. 2019. Correlating neural and symbolic representations of language. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2952--2962, Association for Computational Linguistics, Florence, Italy

  75. [77]

    Chrupała, Grzegorz, Bertrand Higy, and Afra Alishahi. 2020. Analyzing analytical methods: The case of phonology in neural models of spoken language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4146--4156, Association for Computational Linguistics, Online

  76. [78]

    CHU, Y. 1965. On the shortest arborescence of a directed graph. Scientia Sinica, 14:1396--1400

  77. [79]

    Clark, Kevin, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. What does BERT look at? An analysis of BERT's attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 276--286, Association for Computational Linguistics, Florence, Italy

  78. [80]

    Conneau, Alexis, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. 2018. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126--2136, Association for Compu...

  79. [81]

    Dalvi, Fahim, Hassan Sajjad, Nadir Durrani, and Yonatan Belinkov. 2020. Analyzing redundancy in pretrained transformer models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4908--4926, Association for Computational Linguistics, Online

  80. [82]

    Danilevsky, Marina, Kun Qian, Ranit Aharonov, Yannis Katsis, Ban Kawas, and Prithviraj Sen. 2020. A survey of the state of explainable AI for natural language processing. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processi...

Showing first 80 references.