DanceDuo: Bridging Human Movement and AI Choreography

Gia-Cat Bui-Le; Hai-Dang Nguyen; Trung-Nghia Le; Tuong-Vy Truong-Thuy

arxiv: 2606.26507 · v1 · pith:W65ZW733new · submitted 2026-06-25 · 💻 cs.HC · cs.CV

Pith reviewed 2026-06-26 04:15 UTC · model grok-4.3

keywords dance generation · diffusion models · human pose estimation · AI choreography · music synchronization · user interface · user study · performance comparison

The pith

DanceDuo generates AI-choreographed dance sequences from music via diffusion models and compares them to user performances using pose estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DanceDuo as a platform that applies diffusion models to create dance sequences matched to different music genres. Users can choose tracks and models, upload their own dance videos, and view side-by-side comparisons generated by human pose estimation. The design targets practice encouragement through direct interaction between personal movement and AI output. A user study found the interface intuitive and singled out the comparison feature for praise. The work positions this setup as a bridge between human movement and AI for both casual and professional dance applications.

Core claim

DanceDuo is a platform that leverages diffusion models to generate AI-choreographed dance sequences synchronized with a variety of music genres, integrates human pose estimation models to provide users with insightful comparisons of their own performances with AI-generated sequences, and demonstrates through a user study that the interface is intuitive with particular praise for the dance comparison feature.

What carries the argument

The DanceDuo platform, which combines diffusion-based generation of music-synchronized dances with human pose estimation for direct user-AI movement comparisons.
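
The paper does not say how the user-AI comparison is scored, so the following is only a minimal sketch of one plausible approach, assuming both the user video and the generated dance have been reduced to per-frame lists of 2D joint coordinates. The function names (`normalize_pose`, `pose_similarity`, `sequence_similarity`) and the choice of cosine similarity are illustrative assumptions, not DanceDuo's method.

```python
import math


def normalize_pose(pose):
    """Center a pose on its mean joint position and scale it to unit size.

    `pose` is a list of (x, y) joint coordinates (assumed input format).
    """
    n = len(pose)
    cx = sum(x for x, _ in pose) / n
    cy = sum(y for _, y in pose) / n
    centered = [(x - cx, y - cy) for x, y in pose]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered))
    if scale == 0.0:
        return centered  # degenerate pose: all joints coincide
    return [(x / scale, y / scale) for x, y in centered]


def pose_similarity(pose_a, pose_b):
    """Cosine similarity of two normalized poses, between -1 and 1."""
    a = normalize_pose(pose_a)
    b = normalize_pose(pose_b)
    return sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))


def sequence_similarity(seq_a, seq_b):
    """Mean per-frame similarity over the frames both sequences share."""
    frames = min(len(seq_a), len(seq_b))
    if frames == 0:
        raise ValueError("both sequences need at least one frame")
    return sum(pose_similarity(seq_a[i], seq_b[i]) for i in range(frames)) / frames


# Hypothetical three-joint poses: a translated, rescaled copy and a mirror image.
pose = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
moved = [(10.0 + 2.0 * x, 5.0 + 2.0 * y) for x, y in pose]
mirrored = [(-x, y) for x, y in pose]
```

Normalizing before comparing makes the score insensitive to where the dancer stands and how large they appear in frame, which matters when a phone video is compared against a rendered humanoid. A real system would also need temporal alignment between the two sequences, which this sketch leaves out.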

If this is right

Users gain a tool to practice dancing by directly comparing their movements to AI-generated sequences.
The system supports varied experiences through choices of music tracks, humanoid models, and personal video uploads.
Human pose estimation supplies the mechanism for side-by-side performance feedback.
Positive user study results on intuitiveness and comparison value support broader recreational and professional use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the comparison loop proves effective, the same generation-plus-estimation pattern could extend to other movement-based skills such as sports drills.
Real-time variants of the platform might shorten the gap between generation and feedback for live practice sessions.
The approach could serve as a template for applying generative models to other creative physical domains beyond dance.

Load-bearing premise

The diffusion-generated dance sequences are synchronized and high-quality enough to support meaningful performance comparisons that encourage practice.

What would settle it

A controlled test in which users rate the AI sequences as poorly matched to the music or show no measurable change in their own dance accuracy after repeated comparisons.

Figures

Figures reproduced from arXiv: 2606.26507 by Gia-Cat Bui-Le, Hai-Dang Nguyen, Trung-Nghia Le, Tuong-Vy Truong-Thuy.

**Figure 2.** Rotation-based representation of human pose and shape.

**Figure 3.** Flowchart illustrating the user's process through the application.

**Figure 4.** UI where the user chooses music and a humanoid model to showcase the dance.

**Figure 5.** The user can replay the dance performance alongside their own dance.

Original abstract

In recent years, advancements in deep learning and generative models have revolutionized music-driven dance generation. This paper introduces a novel platform, namely DanceDuo, leveraging diffusion models to generate AI-choreographed dance sequences synchronized with a variety of music genres, to encourage dancing practice. The system allows users to interact with AI by selecting music tracks, humanoid models, and importing personal dance videos for comparison, fostering a rich and engaging user experience. DanceDuo not only offers dance generation but also integrates human pose estimation models to provide users with insightful comparisons of their own performances with AI-generated sequences. We conducted a comprehensive user study, revealing that users found the interface intuitive, with particular praise for the dance comparison feature. Our DanceDuo contributes significantly to the integration of AI in dance choreography, offering novel avenues for both recreational and professional applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DanceDuo wires existing diffusion dance generation and pose estimation into a UI but gives no numbers on synchronization or quality to support the comparison claims.

DanceDuo combines music-driven dance generation via diffusion models with human pose estimation for user comparison in one platform. The main new element is the integrated system that lets users pick tracks, choose models, generate sequences, and upload their own videos for side-by-side feedback, plus a user study on the interface.

The work does a reasonable job describing a practical flow for dance practice and reports that participants found the UI intuitive with specific praise for the comparison feature. That feedback is concrete and points to one part of the design that lands with users.

The soft spot is the missing evidence on the generative side. The paper states that the diffusion outputs are synchronized across music genres and high-quality enough for meaningful comparisons, but it supplies no beat-alignment numbers, no quality metrics, no baselines against earlier music-to-dance models, and no ratings of the generated dances themselves. The user study only measures interface reactions, not whether the AI sequences actually help practice or produce useful comparisons. That leaves the central assumption untested.

This is the kind of paper that might interest HCI readers who build creative tools and want an example of how generation and feedback can be packaged together. It does not offer new technical results or strong empirical grounding, so most readers outside that narrow applied niche would not get much from it.

I would not send it to peer review without the quantitative checks on the generated dances and some baseline comparisons.

Referee Report

2 major / 1 minor

Summary. The paper introduces DanceDuo, a platform leveraging diffusion models to generate AI-choreographed dance sequences synchronized with music genres, integrating human pose estimation for user performance comparisons, and reporting a user study on interface usability with praise for the comparison feature.

Significance. If the diffusion-generated dances prove sufficiently synchronized and high-quality, the system could provide a useful tool for dance practice and AI-choreography integration. The user study offers limited evidence of usability, but the absence of quantitative validation for the generative component limits assessment of broader impact.

major comments (2)

[Abstract] Abstract: the central claims that diffusion models produce synchronized sequences enabling 'insightful comparisons' and that the system encourages dancing practice are unsupported, as no quantitative metrics (beat-alignment error, distribution metrics such as FID, or baseline comparisons to prior music-to-dance models) are supplied.
[User Study] User study description: the study measures only interface intuitiveness and praise for the comparison UI, without testing whether the generated dances themselves are adequate for meaningful comparisons, leaving the core assumption about generative quality untested.
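
For reference, the beat-alignment measurement requested in the first major comment could be computed along these lines. This is a hedged sketch of one formulation used in the music-to-dance literature: each motion beat is scored by a Gaussian of its distance to the nearest music beat. The paper reports no such metric, and the function names, the tolerance `sigma`, and the use of speed minima as motion beats are assumptions made here for illustration.

```python
import math


def kinematic_beats(speeds, fps):
    """Times in seconds of local minima in a per-frame mean joint speed curve."""
    return [
        i / fps
        for i in range(1, len(speeds) - 1)
        if speeds[i] < speeds[i - 1] and speeds[i] <= speeds[i + 1]
    ]


def beat_alignment_score(motion_beats, music_beats, sigma=0.1):
    """Mean Gaussian-weighted closeness of each motion beat to its nearest music beat.

    Times are in seconds; `sigma` (seconds) is an assumed tolerance.
    The result lies between 0 and 1, and 1 means every motion beat
    coincides with a music beat.
    """
    if not motion_beats or not music_beats:
        raise ValueError("both beat lists must be non-empty")
    total = 0.0
    for t in motion_beats:
        nearest = min(abs(t - m) for m in music_beats)
        total += math.exp(-(nearest ** 2) / (2 * sigma ** 2))
    return total / len(motion_beats)


# Hypothetical speed curve sampled at 2 frames per second.
beats = kinematic_beats([3.0, 1.0, 3.0, 1.0, 3.0], fps=2)
```

Reporting this score for the generated dances, alongside the same score for a prior music-to-dance baseline, is the kind of evidence the comment says is missing.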

minor comments (1)

The manuscript would benefit from explicit details on the diffusion model architecture, training procedure, and pose estimation pipeline to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the need for quantitative support of the generative claims. We respond to each major comment below.

Referee: [Abstract] Abstract: the central claims that diffusion models produce synchronized sequences enabling 'insightful comparisons' and that the system encourages dancing practice are unsupported, as no quantitative metrics (beat-alignment error, distribution metrics such as FID, or baseline comparisons to prior music-to-dance models) are supplied.

Authors: We agree the abstract makes claims about synchronization and comparison value without supporting quantitative evidence such as beat-alignment error, FID, or baselines. The manuscript centers on platform integration and usability feedback rather than generative model benchmarking. We will revise the abstract to remove these unsupported claims and clarify the scope as a user-facing system. revision: yes

Referee: [User Study] User study description: the study measures only interface intuitiveness and praise for the comparison UI, without testing whether the generated dances themselves are adequate for meaningful comparisons, leaving the core assumption about generative quality untested.

Authors: The user study was limited to interface usability and perceived value of the comparison feature. It did not assess objective quality of the generated dances or their suitability for meaningful comparisons. We will add an explicit limitations paragraph noting this gap and that generative adequacy remains untested. revision: yes

Circularity Check

0 steps flagged

No circularity: system description paper contains no derivations, equations, or fitted predictions

The paper is a description of a user-facing platform (DanceDuo) that integrates existing diffusion models for music-to-dance generation and pose-estimation tools for comparison. The abstract and provided text contain no equations, no parameter-fitting steps, no 'predictions' derived from fitted inputs, and no load-bearing self-citations or uniqueness theorems. The central claims concern system features and a user study on interface usability; these do not reduce to any self-referential construction. No derivation chain exists to inspect, so the circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters, axioms, or invented entities; none are extractable from the provided text.

[1] [1]

Abouaf, J.: "biped": a dance with virtual and company dancers. 1. IEEE Multi- Media6(3), 4–7 (1999)

1999

[2] [2]

In: Pro- ceedings of the 19th ACM international conference on Multimedia

Alexiadis,D.S.,Kelly,P.,Daras,P.,O’Connor,N.E.,Boubekeur,T.,Moussa,M.B.: Evaluating a dancer’s performance using kinect-based skeleton tracking. In: Pro- ceedings of the 19th ACM international conference on Multimedia. pp. 659–662 (2011)

2011

[3] [3]

Journal on Computing and Cultural Heritage (JOCCH)8(4), 1–19 (2015)

Aristidou, A., Stavrakis, E., Charalambous, P., Chrysanthou, Y., Himona, S.L.: Folk dance evaluation using laban movement analysis. Journal on Computing and Cultural Heritage (JOCCH)8(4), 1–19 (2015)

2015

[4] [4]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Benzine, A., Chabot, F., Luvison, B., Pham, Q.C., Achard, C.: Pandanet: Anchor-based single-shot multi-person 3d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6856– 6865 (2020)

2020

[5] [5]

arXiv preprint arXiv:2207.08089 (2022)

Blau, T., Ganz, R., Kawar, B., Bronstein, A., Elad, M.: Threat model-agnostic adversarial defense using diffusion models. arXiv preprint arXiv:2207.08089 (2022)

work page arXiv 2022

[6] [6]

In: Proceedings of the Seventh International Conference on Computational Creativity

Brockhoeft, T., Petuch, J., Bach, J., Djerekarov, E., Ackerman, M., Tyson, G.: In- teractive augmented reality for dance. In: Proceedings of the Seventh International Conference on Computational Creativity. pp. 396–403 (2016)

2016

[7] [7]

Advances in neural information processing systems33, 1877–1901 (2020)

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Nee- lakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in neural information processing systems33, 1877–1901 (2020)

1901

[8] Burton, S.J., Samadani, A.A., Gorbet, R., Kulić, D.: Laban movement analysis and affective movement generation for robots and other near-living creatures. Dance notations and robot motion, pp. 25–48 (2016)

[9] Chan, J.C., Leung, H., Tang, J.K., Komura, T.: A virtual reality dance training system using motion capture technology. IEEE transactions on learning technologies 4(2), 187–195 (2010)

[10] Chen, K., Tan, Z., Lei, J., Zhang, S.H., Guo, Y.C., Zhang, W., Hu, S.M.: Choreomaster: choreography-oriented music-driven dance synthesis. ACM Transactions on Graphics (TOG) 40(4), 1–13 (2021)

[11] Dabral, R., Mughal, M.H., Golyanik, V., Theobalt, C.: Mofusion: A framework for denoising-diffusion-based motion synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9760–9770 (2023)

[12] Fan, R., Xu, S., Geng, W.: Example-based automatic music-driven conventional dance motion synthesis. IEEE transactions on visualization and computer graphics 18(3), 501–515 (2011)

[13] Ho, J., Chan, W., Saharia, C., Whang, J., Gao, R., Gritsenko, A., Kingma, D.P., Poole, B., Norouzi, M., Fleet, D.J., et al.: Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303 (2022)

[14] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems 33, 6840–6851 (2020)

[15] Ho, J., Salimans, T., Gritsenko, A., Chan, W., Norouzi, M., Fleet, D.J.: Video diffusion models. Advances in Neural Information Processing Systems 35, 8633–8646 (2022)

[16] Holden, D., Saito, J., Komura, T.: A deep learning framework for character motion synthesis and editing. ACM Transactions on Graphics (TOG) 35(4), 1–11 (2016)

[17] Huang, R., Hu, H., Wu, W., Sawada, K., Zhang, M., Jiang, D.: Dance revolution: Long-term dance generation with music via curriculum learning. arXiv preprint arXiv:2006.06119 (2020)

[18] Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 7122–7131 (2018)

[19] Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4401–4410 (2019)

[20] Kim, J., Kim, J., Choi, S.: Flame: Free-form language-based motion synthesis & editing. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 8255–8263 (2023)

[21] Kitsikidis, A., Dimitropoulos, K., Yilmaz, E., Douka, S., Grammalidis, N.: Multi-sensor technology and fuzzy logic for dancer’s motion analysis and performance evaluation within a 3d virtual environment. In: Universal Access in Human-Computer Interaction. Design and Development Methods for Universal Access: 8th International Conference, UAHCI 2014, Held...

[22] Kong, Z., Ping, W., Huang, J., Zhao, K., Catanzaro, B.: Diffwave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761 (2020)

[23] Kritsis, K., Gkiokas, A., Pikrakis, A., Katsouros, V.: Danceconv: Dance motion generation with convolutional networks. IEEE Access 10, 44982–45000 (2022)

[24] Kyan, M., Sun, G., Li, H., Zhong, L., Muneesawang, P., Dong, N., Elder, B., Guan, L.: An approach to ballet dance training through ms kinect and visualization in a cave virtual reality environment. ACM Transactions on Intelligent Systems and Technology (TIST) 6(2), 1–37 (2015)

[25] Lee, H.Y., Yang, X., Liu, M.Y., Wang, T.C., Lu, Y.D., Yang, M.H., Kautz, J.: Dancing to music. Advances in neural information processing systems 32 (2019)

[26] Li, B., Zhao, Y., Zhelun, S., Sheng, L.: Danceformer: Music conditioned 3d dance generation with parametric motion transformer. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 1272–1279 (2022)

[27] Li, H., Yang, Y., Chang, M., Chen, S., Feng, H., Xu, Z., Li, Q., Chen, Y.: Srdiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing 479, 47–59 (2022)

[28] Li, J., Yin, Y., Chu, H., Zhou, Y., Wang, T., Fidler, S., Li, H.: Learning to generate diverse dance motions with transformer. arXiv preprint arXiv:2008.08171 (2020)

[29] Li, R., Zhang, Y., Zhang, Y., Zhang, H., Guo, J., Zhang, Y., Liu, Y., Li, X.: Lodge: A coarse to fine diffusion network for long dance generation guided by the characteristic dance primitives. arXiv preprint arXiv:2403.10518 (2024)

[30] Li, R., Yang, S., Ross, D.A., Kanazawa, A.: Ai choreographer: Music conditioned 3d dance generation with aist++. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 13401–13412 (2021)

[31] Li, X., Thickstun, J., Gulrajani, I., Liang, P.S., Hashimoto, T.B.: Diffusion-lm improves controllable text generation. Advances in Neural Information Processing Systems 35, 4328–4343 (2022)

[32] Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: Smpl: A skinned multi-person linear model. In: Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pp. 851–866 (2023)

[33] Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Sridhar, S., Pons-Moll, G., Theobalt, C.: Single-shot multi-person 3d pose estimation from monocular rgb. In: 2018 International Conference on 3D Vision (3DV). pp. 120–130. IEEE (2018)

[34] Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A.A., Tzionas, D., Black, M.J.: Expressive body capture: 3d hands, face, and body from a single image. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10975–10985 (2019)

[35] Pavlakos, G., Kolotouros, N., Daniilidis, K.: Texturepose: Supervising human mesh estimation with texture consistency. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 803–812 (2019)

[36] Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988 (2022)

[37] Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., Chen, M.: Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022)

[38] Rempe, D., Birdal, T., Hertzmann, A., Yang, J., Sridhar, S., Guibas, L.J.: Humor: 3d human motion model for robust pose estimation. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 11488–11499 (2021)

[39] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10684–10695 (2022)

[40] Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., Aberman, K.: Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 22500–22510 (2023)

[41] Saharia, C., Chan, W., Chang, H., Lee, C., Ho, J., Salimans, T., Fleet, D., Norouzi, M.: Palette: Image-to-image diffusion models. In: ACM SIGGRAPH 2022 conference proceedings. pp. 1–10 (2022)

[42] Sheppard, R.M., Kamali, M., Rivas, R., Tamai, M., Yang, Z., Wu, W., Nahrstedt, K.: Advancing interactive collaborative mediums through tele-immersive dance (ted): a symbiotic creativity and design environment for art and computer science. In: Proceedings of the 16th ACM international conference on Multimedia. pp. 579–588 (2008)

[43] Shiratori, T., Nakazawa, A., Ikeuchi, K.: Dancing-to-music character animation. In: Computer Graphics Forum. vol. 25, pp. 449–458. Wiley Online Library (2006)

[44] Singer, U., Polyak, A., Hayes, T., Yin, X., An, J., Zhang, S., Hu, Q., Yang, H., Ashual, O., Gafni, O., et al.: Make-a-video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792 (2022)

[45] Siyao, L., Yu, W., Gu, T., Lin, C., Wang, Q., Qian, C., Loy, C.C., Liu, Z.: Bailando: 3d dance generation by actor-critic gpt with choreographic memory. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11050–11059 (2022)

[46] Sun, G., Wong, Y., Cheng, Z., Kankanhalli, M.S., Geng, W., Li, X.: Deepdance: music-to-dance motion choreography with adversarial learning. IEEE Transactions on Multimedia 23, 497–509 (2020)

[47] Sun, J., Wang, C., Hu, H., Lai, H., Jin, Z., Hu, J.F.: You never stop dancing: Non-freezing dance generation via bank-constrained manifold projection. Advances in Neural Information Processing Systems 35, 9995–10007 (2022)

[48] Sun, Y., Bao, Q., Liu, W., Fu, Y., Black, M.J., Mei, T.: Monocular, one-stage, regression of multiple 3d people. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 11179–11188 (2021)

[49] Tang, T., Jia, J., Mao, H.: Dance with melody: An lstm-autoencoder approach to music-oriented dance synthesis. In: Proceedings of the 26th ACM international conference on Multimedia. pp. 1598–1606 (2018)

[50] Trajkova, M., Cafaro, F.: E-ballet: designing for remote ballet learning. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct. pp. 213–216 (2016)

[51] Trajkova, M., Cafaro, F.: Takes tutu to ballet: designing visual and verbal feedback for augmented mirrors. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2(1), 1–30 (2018)

[52] Tsampounaris, G., El Raheb, K., Katifori, V., Ioannidis, Y.: Exploring visualizations in real-time motion capture for dance education. In: Proceedings of the 20th Pan-Hellenic Conference on Informatics. pp. 1–6 (2016)

[53] Tseng, J., Castellon, R., Liu, K.: Edge: Editable dance generation from music. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 448–458 (2023)

[54] Tuong-Vy, T.T., Gia-Cat, B.L., Hai-Dang, N., Trung-Nghia, L.: Rethinking sampling for music-driven long-term dance generation. In: Proceedings of the Asian Conference on Computer Vision (2024)

[55] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017)

[56] Yang, L., Yu, Z., Meng, C., Xu, M., Ermon, S., Bin, C.: Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal llms. In: Forty-first International Conference on Machine Learning (2024)

[57] Yang, L., Zeng, B., Liu, J., Li, H., Xu, M., Zhang, W., Yan, S.: Editworld: Simulating world dynamics for instruction-following image editing. arXiv preprint arXiv:2405.14785 (2024)

[58] Yang, R., Srivastava, P., Mandt, S.: Diffusion probabilistic modeling for video generation. Entropy 25(10), 1469 (2023)

[59] Yang, S., Yang, Z., Wang, Z.: Longdancediff: Long-term dance generation with conditional diffusion model. arXiv preprint arXiv:2308.11945 (2023)

[60] Yoon, J., Hwang, S.J., Lee, J.: Adversarial purification with score-based generative models. In: International Conference on Machine Learning. pp. 12062–12072. PMLR (2021)

[61] Yuan, Y., Iqbal, U., Molchanov, P., Kitani, K., Kautz, J.: Glamr: Global occlusion-aware human mesh recovery with dynamic cameras. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11038–11049 (2022)

[62] Zeng, B., Li, S., Feng, Y., Li, H., Gao, S., Liu, J., Li, H., Tang, X., Liu, J., Zhang, B.: Ipdreamer: Appearance-controllable 3d object generation with image prompts. arXiv preprint arXiv:2310.05375 (2023)

[63] Zhu, H., Yang, L., Yong, J.H., Zhang, W., Wang, B.: Distribution-aware data expansion with diffusion models. arXiv preprint arXiv:2403.06741 (2024)

[64] Zhuang, H., Lei, S., Xiao, L., Li, W., Chen, L., Yang, S., Wu, Z., Kang, S., Meng, H.: Gtn-bailando: Genre consistent long-term 3d dance generation based on pre-trained genre token network. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1–5. IEEE (2023)