Auto-interpretation labels for SAE features generalize poorly across languages and scripts, missing the same semantic content up to 4x more often in Serbian than English and more in Cyrillic than Latin despite deterministic transliteration.
hub
What Does BERT Learn about the Structure of Language?
16 Pith papers cite this work, alongside 643 external citations. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
LLMs show scaling and training-dependent alignment with human brain responses in creativity-related networks during divergent thinking tasks, measured via RSA on fMRI data.
Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.
LLMs represent semantic relations geometrically via embedding distance and direction; a linear Polar Probe decodes these structures from middle-layer activations and generalizes to new entities.
SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.
Entity representations learned from text via link prediction generalize to unseen entities and transfer to classification and retrieval with reported gains of 22% MRR, 16% accuracy, and 8.8% NDCG@10.
Language models show idiom decomposability correlates weakly with human judgments, negatively with syntactic flexibility, and contributes most strongly to representation stabilization during training alongside surprisal and frequency.
Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.
Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at small sample sizes typical in applied testing.
Different types of syntactic agreement recruit overlapping units within LLMs, indicating that agreement forms a meaningful functional category across English, Russian, Chinese, and structurally similar languages.
Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.
HydraQE is a new end-to-end speech translation QE system using Qwen3-ASR backbone, sparsemax layer mixing, bidirectional Transformer, and multi-task curriculum training on human and pseudo labels that outperforms cascaded baselines.
Fine-tuning on annotated English and Japanese dialogues improves clustering of backchannels and fillers and makes generated utterances closer to human ones.
citing papers explorer
-
How Far Do Auto-Interpretation Labels Generalize: A Controlled Study Across Languages, Scripts, and Rewordings
Auto-interpretation labels for SAE features generalize poorly across languages and scripts, missing the same semantic content up to 4x more often in Serbian than English and more in Cyrillic than Latin despite deterministic transliteration.
-
Is She Even Relevant? When BERT Ignores Explicit Gender Cues
A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
-
Large Language Models Align with the Human Brain during Creative Thinking
LLMs show scaling and training-dependent alignment with human brain responses in creativity-related networks during divergent thinking tasks, measured via RSA on fMRI data.
-
Deep Pre-Alignment for VLMs
Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.
-
Polar probe linearly decodes semantic structures from LLMs
LLMs represent semantic relations geometrically via embedding distance and direction; a linear Polar Probe decodes these structures from middle-layer activations and generalizes to new entities.
-
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.
-
Inductive Entity Representations from Text via Link Prediction
Entity representations learned from text via link prediction generalize to unseen entities and transfer to classification and retrieval with reported gains of 22% MRR, 16% accuracy, and 8.8% NDCG@10.
-
Rethinking the Idiomaticity Decomposability Hypothesis: Evidence from Distributional Learning
Language models show idiom decomposability correlates weakly with human judgments, negatively with syntactic flexibility, and contributes most strongly to representation stabilization during training alongside surprisal and frequency.
-
Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt
Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.
-
Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning
Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at small sample sizes typical in applied testing.
-
Different types of syntactic agreement recruit the same units within large language models
Different types of syntactic agreement recruit overlapping units within LLMs, indicating that agreement forms a meaningful functional category across English, Russian, Chinese, and structurally similar languages.
-
Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models
Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.
-
HydraQE: OSU's Submission for the IWSLT 2026 Speech Translation Metrics Shared Task
HydraQE is a new end-to-end speech translation QE system using Qwen3-ASR backbone, sparsemax layer mixing, bidirectional Transformer, and multi-task curriculum training on human and pseudo labels that outperforms cascaded baselines.
-
Investigating the Representation of Backchannels and Fillers in Fine-tuned Language Models
Fine-tuning on annotated English and Japanese dialogues improves clustering of backchannels and fillers and makes generated utterances closer to human ones.
- Where Does Authorship Signal Emerge in Encoder-Based Language Models?
- MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining