A new permutation test uses a Householder reflection to align word-embedding clouds before testing dispersion differences, cutting Type-I error by 32.5% and running 23x faster on GPU.
Title resolution pending
10 Pith papers cite this work. Citation polarity classification is still indexing.
citing papers explorer
-
Accurate and Efficient Statistical Testing for Word Semantic Breadth
A new permutation test uses a Householder reflection to align word-embedding clouds before testing dispersion differences, cutting Type-I error by 32.5% and running 23x faster on GPU (sketch below).
-
Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL
Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages (sketch below).
-
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability
LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with robustness varying with model scale and embedding geometry (sketch below).
-
Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization
Spectral analysis of activations and gradients provides new diagnostics that link batch size to representation geometry, early covariance tails to token efficiency, and spectral shifts to learning dynamics in decoder-only LLMs, backed by a mechanistic model (sketch below).
-
Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing
Nonlinear polynomial models fit local paraphrase embedding clouds more accurately than linear ones and support geometrically consistent synthetic point generation, yet this geometric fidelity does not improve classification performance (sketch below).
-
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus
Machine translation preserves embedding similarity structure for ten languages in the Manifesto Corpus but distorts it for four, as shown by a new non-inferiority testing framework (sketch below).
-
Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs
LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data (sketch below).
-
Analyzing the Effect of Noise in LLM Fine-tuning
Label noise hurts fine-tuning performance most, while grammatical and typographical noise sometimes act as mild regularizers, with changes concentrated in task-specific layers (sketch below).
-
Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction
BERT embeddings encode narrative dimensions of time, space, causality, and character at the token level, as a linear probe achieves 94% accuracy versus 47% on variance-matched random embeddings, though unsupervised clusters do not align with these categories (sketch below).
-
Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models
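
The sketches below are minimal reconstructions from the one-line summaries above, not the cited papers' implementations; all names, rates, and data are illustrative assumptions. First, the alignment idea from "Accurate and Efficient Statistical Testing for Word Semantic Breadth": reflect one cloud's mean direction onto the other's with a Householder matrix before pooling, so direction differences do not inflate the permutation null for a dispersion statistic.

```python
import numpy as np

def householder(u, v):
    """Reflection H with H @ u = v for unit vectors u and v."""
    w = u - v
    n = np.linalg.norm(w)
    if n < 1e-12:
        return np.eye(len(u))
    w = w / n
    return np.eye(len(u)) - 2.0 * np.outer(w, w)

def dispersion(X):
    """Mean distance to the centroid -- one simple notion of semantic breadth."""
    return np.mean(np.linalg.norm(X - X.mean(axis=0), axis=1))

def permutation_test(A, B, n_perm=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Align B's mean direction onto A's. The reflection is an isometry, so it
    # leaves each cloud's own dispersion intact, but it keeps direction
    # differences from contaminating the pooled permutation null below.
    a, b = A.mean(axis=0), B.mean(axis=0)
    H = householder(b / np.linalg.norm(b), a / np.linalg.norm(a))
    B = B @ H.T
    obs = abs(dispersion(A) - dispersion(B))
    pooled, n = np.vstack([A, B]), len(A)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if abs(dispersion(pooled[idx[:n]]) - dispersion(pooled[idx[n:]])) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 50))
B = rng.normal(size=(200, 50)) * 1.3  # genuinely broader cloud
print("p =", permutation_test(A, B))
```

The 23x GPU speedup in the summary would come from batching the permuted dispersion computations on-device; none of that is shown here.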
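For Parallel-SFT, a sketch of the data-mixing step only: each SFT stream interleaves parallel implementations of the same task across programming languages. The field names and round-robin scheme are assumptions.

```python
import random

parallel_corpus = [
    {"task": "reverse a string",
     "python": "def rev(s):\n    return s[::-1]",
     "rust":   "fn rev(s: &str) -> String { s.chars().rev().collect() }",
     "go":     "func Rev(s string) string {\n    r := []rune(s)\n    for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 { r[i], r[j] = r[j], r[i] }\n    return string(r)\n}"},
    # ... more tasks with aligned solutions ...
]

def mixed_sft_examples(corpus, langs=("python", "rust", "go"), seed=0):
    rng = random.Random(seed)
    examples = []
    for item in corpus:
        # One (prompt, completion) pair per language, so the model sees the
        # same semantics expressed in several surface languages.
        for lang in langs:
            examples.append({"prompt": f"Write a {lang} solution: {item['task']}",
                             "completion": item[lang]})
    rng.shuffle(examples)  # interleave languages within the SFT stream
    return examples

for ex in mixed_sft_examples(parallel_corpus):
    print(ex["prompt"])
```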
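For the retriever-robustness paper, a sketch of one probe it implies: perturb queries with character-level typos and measure how often the clean top-1 document survives. The TF-IDF encoder is a lightweight stand-in for an LLM-based dense retriever, and the corpus is a toy.

```python
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "stock markets fell sharply",
        "transformers encode token context", "rain is expected tomorrow"]
queries = ["where did the cat sit", "why did markets fall"]

def add_typo(text, rng):
    chars = list(text)
    i = rng.randrange(len(chars) - 1)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]  # adjacent-character swap
    return "".join(chars)

vec = TfidfVectorizer().fit(docs)
D = vec.transform(docs)
rng = random.Random(0)
kept, trials = 0, 100
for _ in range(trials):
    q = rng.choice(queries)
    top_clean = cosine_similarity(vec.transform([q]), D).argmax()
    top_noisy = cosine_similarity(vec.transform([add_typo(q, rng)]), D).argmax()
    kept += int(top_clean == top_noisy)
print(f"top-1 agreement under typos: {kept / trials:.2f}")
```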
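For the spectral-diagnostics paper, a sketch of the kind of measurement it describes: the eigenspectrum of an activation covariance matrix and a crude power-law tail exponent. Synthetic activations stand in for a real decoder-only LLM's hidden states, and the tail window is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "activations": 4096 tokens x 512 hidden dims with a power-law
# variance profile across dimensions.
acts = rng.standard_normal((4096, 512)) @ np.diag(1.0 / np.arange(1, 513) ** 0.7)

cov = np.cov(acts, rowvar=False)
eigs = np.sort(np.linalg.eigvalsh(cov))[::-1]
eigs = eigs[eigs > 1e-12]

# Fit log(eig_k) ~ -alpha * log(k) over a tail window; alpha summarizes how
# heavy-tailed the representation spectrum is.
k = np.arange(1, len(eigs) + 1)
tail = slice(10, 300)
alpha = -np.polyfit(np.log(k[tail]), np.log(eigs[tail]), 1)[0]
print(f"spectral tail exponent alpha ~= {alpha:.2f}")
```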
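For the paraphrase-geometry paper, a sketch of local manifold modeling as the summary describes it: project a paraphrase cloud onto its top PCA directions, fit a degree-2 polynomial map into the remaining directions, and sample geometry-consistent synthetic points. The dimensions and toy cloud are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Toy "paraphrase cloud": points near a curved 2-D sheet embedded in 10-D.
t = rng.normal(size=(300, 2))
cloud = np.hstack([t, t[:, :1] ** 2 + t[:, 1:] ** 2]) @ rng.normal(size=(3, 10))
cloud += 0.01 * rng.normal(size=cloud.shape)

pca = PCA(n_components=10).fit(cloud)
Z = pca.transform(cloud)
tangent, normal = Z[:, :2], Z[:, 2:]

# A quadratic model of the sheet's curvature beats a purely linear (PCA) fit.
quad = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
quad.fit(tangent, normal)

# Synthetic points: resample tangent coordinates, predict the normal part.
new_t = rng.normal(size=(5, 2)) * tangent.std(axis=0)
synthetic = pca.inverse_transform(np.hstack([new_t, quad.predict(new_t)]))
print(synthetic.shape)  # (5, 10) points lying near the fitted sheet
```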
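For the translation-invariance paper, a generic non-inferiority check in the spirit of the summary: correlate the pairwise-similarity structures of original and machine-translated sentence embeddings and ask whether a one-sided bootstrap lower bound clears a pre-registered margin. The margin (0.90), the synthetic embeddings, and the pair bootstrap (which ignores dependence among similarity entries) are assumptions, not the cited framework.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
orig = rng.normal(size=(80, 64))  # stand-in source-language embeddings
Q = np.linalg.qr(rng.normal(size=(64, 64)))[0]
trans = orig @ Q + 0.05 * rng.normal(size=(80, 64))  # "translated" embeddings

def upper_sims(X):
    """Flattened upper triangle of the cosine-similarity matrix."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    return S[np.triu_indices(len(X), k=1)]

s_orig, s_trans = upper_sims(orig), upper_sims(trans)
margin = 0.90  # pre-registered non-inferiority margin (illustrative)
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(s_orig), len(s_orig))
    boot.append(spearmanr(s_orig[idx], s_trans[idx])[0])
lower = np.percentile(boot, 5)  # one-sided 95% lower bound
print(f"rho lower bound = {lower:.3f}; non-inferior: {lower > margin}")
```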
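For the cluster-refinement paper, a sketch of the three-step loop the summary names, each step phrased as a prompt to an assumed `llm(prompt) -> str` callable. The stub below is canned so the sketch runs end to end; swap in a real client.

```python
def llm(prompt: str) -> str:
    # Canned stand-in so the sketch is runnable; replace with a real client.
    if "coherent" in prompt:
        return "yes"
    if "same topic" in prompt:
        return "no"
    return "air travel complaints"

clusters = {0: ["flight was delayed", "lost my luggage"],
            1: ["card was declined", "refund never arrived"]}

labels = {}
for cid, texts in clusters.items():
    # 1) Coherence check: drop clusters the model judges incoherent.
    if llm(f"Are these texts topically coherent? Answer yes or no: {texts}") != "yes":
        continue
    # 2) Redundancy removal: skip clusters duplicating an accepted label.
    if any(llm(f"Do these cover the same topic? Answer yes or no: {texts} | {kept}") == "yes"
           for kept in labels.values()):
        continue
    # 3) Label grounding: request a short label supported by the texts.
    labels[cid] = llm(f"Give a short label grounded in these texts: {texts}")

print(labels)
```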
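For the fine-tuning-noise paper, a sketch of the corruption step its setup implies: inject label flips and typographical noise into a labeled dataset, producing variants to fine-tune on and compare. Rates and data are illustrative; the fine-tuning itself is not shown.

```python
import random

def flip_labels(dataset, label_set, rate, rng):
    """Label noise: reassign a fraction of labels uniformly at random."""
    out = []
    for text, label in dataset:
        if rng.random() < rate:
            label = rng.choice([l for l in label_set if l != label])
        out.append((text, label))
    return out

def add_typos(dataset, rate, rng):
    """Typographical noise: swap adjacent characters in a fraction of texts."""
    out = []
    for text, label in dataset:
        if rng.random() < rate and len(text) > 1:
            i = rng.randrange(len(text) - 1)
            text = text[:i] + text[i + 1] + text[i] + text[i + 2:]
        out.append((text, label))
    return out

rng = random.Random(0)
data = [("great movie", "pos"), ("terrible plot", "neg"), ("loved it", "pos")]
print(flip_labels(data, {"pos", "neg"}, rate=0.3, rng=rng))
print(add_typos(data, rate=0.5, rng=rng))
```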
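Finally, for the narrative-dimensions paper, a sketch of its probing protocol: train a linear probe and compare against a variance-matched random baseline. Synthetic features stand in for real BERT token embeddings, and the four classes mirror time/space/causality/character; the 94%-vs-47% figures come from the paper, not this toy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d = 2000, 64
y = rng.integers(0, 4, size=n)                     # four narrative dimensions
class_means = rng.normal(size=(4, d))
X_real = class_means[y] + rng.normal(size=(n, d))  # informative "embeddings"

# Variance-matched control: same per-dimension mean/std, labels unrelated.
X_rand = rng.normal(X_real.mean(0), X_real.std(0), size=(n, d))

probe = LogisticRegression(max_iter=2000)
for name, X in [("real", X_real), ("variance-matched random", X_rand)]:
    acc = cross_val_score(probe, X, y, cv=5).mean()
    print(f"{name:>25}: {acc:.2f}")
```

The variance-matched control matters: it rules out the probe exploiting raw feature scale rather than genuinely encoded structure.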