Title resolution pending
24 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 3 cs.CL 2 cs.IT 2 math.ST 2 quant-ph 2 stat.ML 2 cond-mat.stat-mech 1 cs.AI 1 cs.CC 1 cs.DC 1
years
2026 24
roles
background 1
polarities
background 1
representative citing papers
citing papers explorer
-
The Geometric Wall: Manifold Structure Predicts Layerwise Sparse Autoencoder Scaling Laws
Manifold curvature and intrinsic dimension predict layerwise SAE width exponents and asymptotic floors across Gemma models, with cross-model transfer of the geometric regression, establishing a transferable geometric law instead of a universal scaling law.
-
Steered LLM Activations are Non-Surjective
Steered LLM activations are non-surjective: under practical assumptions, they lie outside the set of states reachable from any discrete prompt.
-
Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation
The test error of random-feature ridge regression with arbitrary data augmentation admits a closed-form asymptotic characterization in the proportional regime that depends only on population covariances and augmentation statistics.
-
Entropic Reciprocity in Time-Reversed Young Interferometry
Time-reversed Young interferometry acts as a source-space information processor where mutual information is the reciprocal invariant and source-label entropy can decrease near destructive interference while Fisher information rises.
-
Fast and Exact: Asymptotically Linear KL-Optimal Frequency Normalization
Three new provably KL-optimal frequency normalization algorithms are presented, one running in linear time in the number of symbols.
-
Profile Likelihood Inference for Anisotropic Hyperbolic Wrapped Normal Models on Hyperbolic Space
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
-
How to quantify direct correlations between variables
Jensen-Shannon regularized analogues of KL-based direct-correlation measures are introduced, taking values in [0,1] and accompanied by alphabet-size-dependent upper bounds under the observed marginal p(x,z).
-
Complexity Guarantees for Zeroth-order Methods via Exponentially-shifted Gaussian Smoothing: Mitigating Dimension-dependence and Incorporating Decision-dependence
Exponentially-shifted Gaussian smoothing yields zeroth-order gradient estimators with linear dimension dependence, enabling improved complexity bounds for stochastic optimization including decision-dependent regimes.
-
Reinforcement Learning via Value Gradient Flow
VGF solves behavior-regularized RL by transporting particles from a reference distribution to the value-induced optimal policy via discrete value-guided gradient flow.
-
Stability of the Shannon–McMillan–Breiman Theorem under Sublinear Parsings
The normalized sum of negative log-likelihoods under sublinear parsings converges almost surely and in L1 to the entropy rate h_P for any shift-invariant measure on a finite shift space.
-
Obtaining Partition Crossover masks using Statistical Linkage Learning for solving noised optimization problems with hidden variable dependency structure
Statistical Linkage Learning enables a new mask construction algorithm for Partition Crossover that maintains effectiveness on noisy problems with hidden dependencies and matches noise-free performance when decomposition quality is high.
-
Many-Tier Instruction Hierarchy in LLM Agents
ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.
-
Scale selection for geometric medians on product manifolds
Joint location-scale minimization for geometric medians on product manifolds degenerates to marginal medians, and three new scale-selection methods restore identifiability with asymptotic guarantees.
-
The Endogeneity of Miscalibration: Impossibility and Escape in Scored Reporting
Non-affine approval functions create unavoidable miscalibration in proper scoring rules for strategic agents, but step-function thresholds enable first-best screening without it, uniquely for the Brier score.
-
Emergence of Tsallis Statistics from a Self-Referential Nonlinear Operator: A Variational Framework
Tsallis q-exponential distributions arise by minimizing a free energy built from a self-consistency entropy defined via a nonlinear operator Omega, with q = alpha + beta obtained directly from the operator's fixed-point structure.
-
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
-
Particle transformers for identifying Lorentz-boosted Higgs bosons decaying to a pair of W bosons
PaRT achieves >50% tagging efficiency for boosted H->WW jets at 1% background efficiency, decorrelated from jet mass, with data-to-simulation scale factors of 0.9-1.0 on 138 fb^{-1} of 13 TeV collisions.
-
Sensor Placement for Tsunami Early Warning via Large-Scale Bayesian Optimal Experimental Design
A reformulation of Bayesian OED as dense matrix subset selection plus a pipelined Schur-complement greedy algorithm on hundreds of GPUs enables optimization of 175-sensor networks for billion-degree-of-freedom tsunami models with near-perfect scaling.
-
Niching Importance Sampling for Multi-modal Rare-event Simulation
Niching importance sampling yields a robust probability-of-failure estimator that avoids degeneracy on multi-modal performance functions by integrating evolutionary niching with importance sampling.
-
QuantumXCT: Learning Interaction-Induced State Transformation in Cell-Cell Communication via Quantum Entanglement and Generative Modeling
QuantumXCT learns parameterized quantum circuits to model interaction-induced unitary transformations between non-interacting and interacting cellular state distributions from transcriptomic profiles.
-
Quantum f-divergences via Nussbaum-Szkoła Distributions in Semifinite von Neumann Algebras
Quantum f-divergence equals classical f-divergence of Nussbaum-Szkoła distributions for normal states on semifinite von Neumann algebras.
-
Resonance Statistics-Informed Fitting Applied to Automated Cross Section Evaluation
Resonance statistics-informed methods in automated fitting reduce spin group bias, enhance Wigner statistics consistency, and stabilize resonance density with minimal impact on cross section fit quality.
-
InsightFlow: LLM-Driven Synthesis of Patient Narratives for Mental Health into Causal Models
LLMs generate 5P causal graphs from 46 psychotherapy intake transcripts that match human expert graphs in structure and meaning, with moderate clinical usefulness ratings.
-
Exact Structural Abstraction and Tractability Limits
Orbit gaps prevent exact classification of tractable problems by closure-invariant structural predicates on the full binary pairwise domain, blocking universal exact-certification characterizations.