Tracing representation progression: Analyzing and enhancing layer-wise similarity
6 Pith papers cite this work.
citing papers explorer
- A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs
Diffusion language models form more global representations with early-layer redundancy compared to autoregressive models, allowing layer skipping for up to 18.75% FLOP savings while maintaining over 90% performance.
- Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs
LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.
- CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
CR-Net uses cross-layer low-rank residuals in a dual-path network plus specialized recomputation to outperform prior low-rank methods on 60M-7B model pre-training while using less compute and memory.
- Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding
Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.
- Rethinking Cross-Layer Information Routing in Diffusion Transformers
- Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale