Head Forcing assigns tailored KV cache strategies to local, anchor, and memory attention heads plus head-wise RoPE re-encoding to extend autoregressive video generation from seconds to minutes without training.
In: The Thirty- ninth Annual Conference on Neural Information Processing Systems (2025)
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
EmbodiedHead introduces a Rectified-Flow Diffusion Transformer with differentiable renderer and single-stream listening-speaking conditioning to achieve real-time high-fidelity conversational avatars.
citing papers explorer
-
Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity
Head Forcing assigns tailored KV cache strategies to local, anchor, and memory attention heads plus head-wise RoPE re-encoding to extend autoregressive video generation from seconds to minutes without training.
-
EmbodiedHead: Real-Time Listening and Speaking Avatar for Conversational Agents
EmbodiedHead introduces a Rectified-Flow Diffusion Transformer with differentiable renderer and single-stream listening-speaking conditioning to achieve real-time high-fidelity conversational avatars.