EmbodiedHead introduces a Rectified-Flow Diffusion Transformer with differentiable renderer and single-stream listening-speaking conditioning to achieve real-time high-fidelity conversational avatars.
Towards seamless interaction: Causal turn-level modeling of inter- active 3d conversational head dynamics
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
A framework that combines MLLM-based image enhancement with a medium-aware 3D Gaussian Splatting model to reconstruct and render smoke scenes.
citing papers explorer
-
EmbodiedHead: Real-Time Listening and Speaking Avatar for Conversational Agents
EmbodiedHead introduces a Rectified-Flow Diffusion Transformer with differentiable renderer and single-stream listening-speaking conditioning to achieve real-time high-fidelity conversational avatars.
-
3D Smoke Scene Reconstruction Guided by Vision Priors from Multimodal Large Language Models
A framework that combines MLLM-based image enhancement with a medium-aware 3D Gaussian Splatting model to reconstruct and render smoke scenes.