Advances in Neural Information Processing Systems

The emergence of essential sparsity in large pre-trained models: The weights that matter

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Pretraining and alignment induce asymmetric geometric traces in transformer weights because alignment updates concentrate in read pathways due to activation covariance, while write pathways inherit less structure from alignment losses.

TIDE: Every Layer Knows the Token Beneath the Context

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

citing papers explorer

Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space

cs.LG · 2026-05-15 · unverdicted · none · ref 16

TIDE: Every Layer Knows the Token Beneath the Context

cs.CL · 2026-05-07 · unverdicted · none · ref 117
