Reddi, Stefanie Jegelka, and Sanjiv Kumar

Khashayar Gatmiry, Nikunj Saunshi, Sashank J · 2024 · arXiv 2410.08292

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

cs.LG · 2026-05-07 · conditional · novelty 7.0

Multi-layer transformers can implement in-context logistic regression by performing normalized gradient descent steps layer by layer, obtained via supervised training of a single attention layer followed by recurrent application with convergence and OOD guarantees.

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

cs.LG · 2025-02-07 · unverdicted · novelty 7.0

A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.

ELT: Elastic Looped Transformers for Visual Generation

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.

citing papers explorer

Showing 3 of 3 citing papers.

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent cs.LG · 2026-05-07 · conditional · none · ref 16
Multi-layer transformers can implement in-context logistic regression by performing normalized gradient descent steps layer by layer, obtained via supervised training of a single attention layer followed by recurrent application with convergence and OOD guarantees.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach cs.LG · 2025-02-07 · unverdicted · none · ref 58
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
ELT: Elastic Looped Transformers for Visual Generation cs.CV · 2026-04-10 · unverdicted · none · ref 20
Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.

Reddi, Stefanie Jegelka, and Sanjiv Kumar

fields

years

verdicts

representative citing papers

citing papers explorer