Megadance: Mixture-of-experts architecture for genre-aware 3D dance generation
5 Pith papers cite this work.
Years: 2026 (5)
Verdicts: UNVERDICTED (5)
Citing papers
- Interactive Multi-Turn Retrieval for Health Videos
  DATR combines coarse CLIP-based retrieval with multi-turn query fusion and cross-encoder re-ranking to improve health video retrieval, supported by the new MHVRC corpus.
- CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval
  CustomDancer achieves state-of-the-art text-to-dance retrieval with 10.23% Recall@1 on the new TD-Data dataset by aligning text, music, and motion features through a CLIP-based framework.
- PianoFlow: Music-Aware Streaming Piano Motion Generation with Bimanual Coordination
  PianoFlow generates coordinated bimanual piano motions from audio via MIDI-distilled flow matching, asymmetric role-gated interaction, and autoregressive streaming continuation, outperforming prior methods with 9x faster inference.
- BiTDiff: Fine-Grained 3D Conducting Motion Generation via BiMamba-Transformer Diffusion
  BiTDiff combines a BiMamba-Transformer architecture with diffusion and human-kinematic decomposition to generate high-quality 3D conducting motions from music, achieving state-of-the-art results on the new CM-Data dataset.
- MG-Former: A Transformer-Based Framework for Music-Driven 3D Conducting Gesture Generation
  TransConductor generates 3D conducting gestures from music via a Trans-Temporal Music Encoder and Gesture Decoder, outperforming baselines on retrieval-based alignment metrics with a new ConductorMotion dataset.