Manifold curvature and intrinsic dimension predict layerwise SAE width exponents and asymptotic floors across Gemma models, with cross-model transfer of the geometric regression, establishing a transferable geometric law instead of a universal scaling law.
Mixed citations
Mixed citation behavior. Most common role is background (58%).
Representative citing papers
Steered LLM activations are non-surjective: under practical assumptions, they lie outside the set of states reachable from any discrete prompt.
Proposes a scale-calibrated median-of-means estimator for robust aggregation of distributed PCA estimates on the product of Euclidean space and Grassmann manifold.
Derives the asymptotic distribution of the spatial Cramér-von Mises independence statistic under β-mixing on R² and implements it in Python with eigenvalue-based critical values.
A 10-qubit convolutional quantum graph neural network fed by autoencoder-compressed jet data achieves performance comparable to classical graph networks in distinguishing boosted Z jets from gluon jets.
The test error of random-feature ridge regression with arbitrary data augmentation admits a closed-form asymptotic characterization in the proportional regime that depends only on population covariances and augmentation statistics.
Time-reversed Young interferometry acts as a source-space information processor where mutual information is the reciprocal invariant and source-label entropy can decrease near destructive interference while Fisher information rises.
Three new provably KL-optimal frequency normalization algorithms are presented, one running in linear time in the number of symbols.
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
Jensen-Shannon regularized analogues of KL-based direct-correlation measures are introduced, taking values in [0,1] and accompanied by alphabet-size-dependent upper bounds under the observed marginal p(x,z).
Exponentially-shifted Gaussian smoothing yields zeroth-order gradient estimators with linear dimension dependence, enabling improved complexity bounds for stochastic optimization including decision-dependent regimes.
VGF solves behavior-regularized RL by transporting particles from a reference distribution to the value-induced optimal policy via discrete value-guided gradient flow.
The normalized sum of negative log-likelihoods under sublinear parsings converges almost surely and in L1 to the entropy rate h_P for any shift-invariant measure on a finite shift space.
Statistical Linkage Learning enables a new mask construction algorithm for Partition Crossover that maintains effectiveness on noisy problems with hidden dependencies and matches noise-free performance when decomposition quality is high.
ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing that frontier models achieve only ~40% accuracy.
A meta-learning method identifies the conditional mean of task-specific causal demand parameters by conditioning on all prices while masking two demand outcomes, assuming at least two locally exogenous prices per task.
Inflating the min-norm interpolator by a factor >1 reduces generalization error in linear regression with anisotropic covariances when d/n diverges to infinity.
Conditional normalizing flows approximate intractable likelihoods arising from cell division history to conclude that glc3 is mostly inactive under nutrient stress in yeast, with brief transient expression.
Proves cutoff at entropic time log n/h for reversible mixtures of permuted Markov chains under mild assumptions on the base chains.
Hyper-V2X uses a Bayesian hypernetwork with partial weight generation and V2X context embedding to produce calibrated epistemic and aleatoric uncertainty estimates for multi-agent BEV segmentation on the OPV2V benchmark.
SMA-DP-SGD augments DP-SGD with a spectral memory-aware fractional branch from prior privatized updates to improve accuracy on CIFAR and MNIST while preserving conditional differential privacy.
The paper introduces discipline stability, a trace-based evaluation paradigm for checking whether RL agents maintain behavioral discipline, as rule-based competitors do, in hidden-state competitive settings such as hotel pricing and bidding.
Derives time-dependent voltage protocols that eliminate an arbitrary number of relaxation modes to accelerate charging and discharging of planar EDLCs in finite time shorter than intrinsic relaxation timescales.
The Adam-SGD gap in large-batch LLM pre-training arises mainly from SGD's restricted effective learning rates caused by small gradients and output-layer spikes; clipping lets SGD recover nearly all of Adam's performance.
Citing papers explorer
Farm-wide virtual load monitoring for offshore wind structures via Bayesian neural networks
Bayesian neural networks enable farm-wide virtual load monitoring by predicting structural loads on non-instrumented offshore wind turbines from a fleet-leader's data while quantifying prediction uncertainty.