
The universal weight subspace hypothesis · 2025 · arXiv preprint arXiv:2512.05117

12 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary: background 4

citation-polarity summary: background 3, support 1

representative citing papers

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

cs.LG · 2026-06-04 · conditional · novelty 7.0

SASA replaces single-vector decoders in SAEs with learned subspaces plus block sparsity and nuclear-norm regularization, proving that a single group becomes the global minimizer once block size meets intrinsic dimension and yielding polynomial rather than exponential sample complexity.

Beyond Structural Symmetries: Linear Mode Connectivity via Neuron Identifiability

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Neural networks admit large families of approximately equivalent solutions via neuron identifiability even without structural symmetry, enabling linear low-loss merging paths without prior alignment.

Spectral phase transitions and trainability in neural network learning dynamics

cond-mat.dis-nn · 2026-06-26 · unverdicted · novelty 6.0

SGD on neural network weights induces a BBP phase transition that detaches signal eigenvalues from the random bulk, yielding an analytically solvable phase diagram for trainability in a linear teacher-student model.

Black-box model classification under the discriminative factorization

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Discriminative factorization distinguishes high-quality query sets for black-box model classification, with chance-level error decaying exponentially in query budget and parameters predicting empirical decay rates on auditing tasks.

Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

Stylistic rewrites of harmful prompts raise attack success rates from 3.84% to between 36.8% and 65% across 31 frontier models, indicating weak generalization in safety refusals.

ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

ResBM achieves 128x activation compression in pipeline-parallel transformer training by adding a residual bottleneck module that preserves a low-rank identity path, with no major loss in convergence or added overhead.

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.

SHARe-KAN: Post-Training Vector Quantization for Cache-Resident KAN Inference

cs.LG · 2025-12-10 · conditional · novelty 6.0

SHARe-KAN compresses KAN prediction-head storage by 9.3x via post-training vector quantization at a 2-point mAP cost on PASCAL VOC detection, with no retraining and good zero-shot transfer.

Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

Zeroth-order optimization is underexplored rather than underpowered in deep learning, with limitations stemming from full-space designs that can be addressed via subspace, spectral, and systems-aware approaches.

Metaphor Is Not All Attention Needs

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Poetic jailbreaks succeed because they induce distinct attention patterns in LLMs that are independent of harmful-content detection, not because models fail to recognize literary formatting.

On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

RLVR exhibits implicit reward overfitting to the training data and optimizes heavy-tailed singular spectra, with a rank-1 focus on reasoning capability.

A Limit Theory of Foundation Models: A Mathematical Approach to Understanding Emergent Intelligence and Scaling Laws

cs.LG · 2026-04-27 · unverdicted · novelty 3.0 · 2 refs

Formalizes emergent intelligence in foundation models as the limit of E(N,P,K) as N,P,K approach infinity, proves existence conditions via nonlinear Lipschitz operators, and derives scaling laws from covering numbers.

citing papers explorer

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability cs.LG · 2026-06-04 · conditional · none · ref 42
Beyond Structural Symmetries: Linear Mode Connectivity via Neuron Identifiability cs.LG · 2026-06-03 · unverdicted · none · ref 115
Black-box model classification under the discriminative factorization cs.LG · 2026-05-08 · unverdicted · none · ref 20
ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism cs.LG · 2026-04-13 · unverdicted · none · ref 3
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment cs.LG · 2026-04-07 · unverdicted · none · ref 32
SHARe-KAN: Post-Training Vector Quantization for Cache-Resident KAN Inference cs.LG · 2025-12-10 · conditional · none · ref 12
Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered cs.LG · 2026-05-15 · unverdicted · none · ref 83
On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR cs.LG · 2026-05-07 · unverdicted · none · ref 17
A Limit Theory of Foundation Models: A Mathematical Approach to Understanding Emergent Intelligence and Scaling Laws cs.LG · 2026-04-27 · unverdicted · none · ref 3 · 2 links
