Symmetry in language statistics shapes the geometry of model representations

URL https://arxiv · 2026 · arXiv 2602.15029

9 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

Hierarchical concept geometry in embeddings emerges from the spectral properties of word co-occurrence statistics mirroring WordNet hypernym trees.

Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization

math.OC · 2026-05-12 · conditional · novelty 7.0

Symmetries in next-token prediction targets induce corresponding geometric symmetries such as circulant matrices and equiangular tight frames in the optimal weights and embeddings of a layer-peeled LLM surrogate model.

ToxiREX: A Dataset on Toxic REasoning in ConteXt

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

A framework quantifies hyperparameter transfer via scaling-law fit quality, extrapolation robustness, and loss penalty, with ablations showing that μP's advantage over standard parameterization stems from maximizing the embedding layer learning rate to avoid bottlenecks and instabilities in AdamW.

Convergent Evolution: How Different Language Models Learn Similar Number Representations

cs.CL · 2026-04-22 · unverdicted · novelty 6.0

Diverse language models converge on similar periodic number features with a two-tier hierarchy of Fourier sparsity and geometric separability, acquired via language co-occurrences or multi-token arithmetic.

Probing for Representation Manifolds in Superposition

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.

Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations

cs.AI · 2026-05-27 · unverdicted · novelty 4.0

Perceptual geometry for color, pitch, emotion and taste emerges transiently in intermediate layers of transformer LLMs despite purely textual training.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

RSD: Moving Local Triangular Charts for Auditing Language-Model Hidden States

cs.CL · 2026-05-17

citing papers explorer

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

cs.LG · 2026-05-20 · unverdicted · none · ref 26

A framework quantifies hyperparameter transfer via scaling-law fit quality, extrapolation robustness, and loss penalty, with ablations showing that μP's advantage over standard parameterization stems from maximizing the embedding layer learning rate to avoid bottlenecks and instabilities in AdamW.

Probing for Representation Manifolds in Superposition

cs.LG · 2026-05-18 · unverdicted · none · ref 104

Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.
