Recognition: unknown
Towards Real-Time Human-AI Musical Co-Performance: Accompaniment Generation with Latent Diffusion Models and MAX/MSP
Pith reviewed 2026-05-10 16:50 UTC · model grok-4.3
The pith
Latent diffusion models generate real-time musical accompaniment from live audio streams by predicting ahead in sliding windows.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A latent diffusion model trained to predict future audio from partial context inside a sliding-window protocol can be distilled for fast sampling and integrated with MAX/MSP via OSC to produce instrumental accompaniment that runs in real time, achieving strong coherence and alignment scores in retrospective full-context conditions while degrading gracefully as look-ahead depth is increased to satisfy live latency limits.
What carries the argument
Sliding-window look-ahead protocol that trains the latent diffusion model to generate future audio from incomplete context, accelerated by consistency distillation to reach real-time inference speeds.
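To make the protocol concrete, here is a minimal sketch of the kind of sliding-window look-ahead loop the claim describes, with the diffusion model replaced by a stub. The sample rate, hop, context length, and look-ahead values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

SR = 44_100                 # sample rate (assumption, not stated in the abstract)
HOP = SR // 2               # trigger generation every 0.5 s of input (assumption)
CONTEXT_LEN = 8 * SR        # past context shown to the model (assumption)
LOOKAHEAD = 1 * SR          # net look-ahead: generated audio lands this far ahead

def ldm_generate(context: np.ndarray, n_out: int) -> np.ndarray:
    """Stand-in for the (distilled) latent diffusion model: maps partial past
    context to a prediction of the next n_out samples of accompaniment."""
    return np.zeros(n_out, dtype=np.float32)  # placeholder output

def run_session(live_audio: np.ndarray) -> dict:
    """Sliding-window look-ahead loop: at each hop, condition on the most
    recent context and schedule the output LOOKAHEAD samples in the future."""
    playback_schedule = {}
    for now in range(CONTEXT_LEN, len(live_audio), HOP):
        context = live_audio[now - CONTEXT_LEN:now]   # partial past context only
        chunk = ldm_generate(context, n_out=HOP)
        playback_schedule[now + LOOKAHEAD] = chunk    # must beat this deadline
    return playback_schedule

schedule = run_session(np.zeros(30 * SR, dtype=np.float32))
print(len(schedule), "chunks scheduled")
```

The constraint the sketch makes visible is that each chunk is scheduled LOOKAHEAD samples in the future, so inference plus transport must finish inside that window.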
If this is right
- Real-time human-AI co-performance becomes practical with diffusion models once sampling is accelerated and look-ahead is tuned to the available latency budget.
- Generation quality trades off directly against look-ahead depth, giving system designers a concrete knob to turn when hardware or network conditions change.
- The MAX/MSP front-end plus OSC bridge removes the previous barrier that kept large Python-based generative models out of established real-time music workflows.
- Both the original and distilled models remain usable under live constraints, showing that the core diffusion approach itself is compatible with performance timing.
Where Pith is reading between the lines
- Performers might develop new playing strategies once they know exactly how far ahead the AI is looking.
- The same sliding-window plus distillation pattern could be tested on other live generative tasks such as real-time sound effects or visual generation.
- Reducing the OSC communication overhead itself would let the system operate with less look-ahead and therefore higher musical responsiveness.
- Collecting paired human-AI recordings from actual performances could be used to fine-tune the model for specific musical styles or instruments.
Load-bearing premise
A model trained only on partial audio context will still produce musically coherent output once it runs live with the extra delays introduced by the MAX/MSP-to-Python communication layer.
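Because the premise hinges on that communication layer, a minimal sketch of the Python side of such an OSC/UDP bridge is shown below, using the python-osc package. The OSC addresses, ports, and message format are assumptions for illustration, not the authors' published schema, and a real deployment would have to handle audio buffers larger than a single UDP datagram.

```python
# Requires the python-osc package: pip install python-osc
import numpy as np
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer
from pythonosc.udp_client import SimpleUDPClient

# Hypothetical ports and OSC addresses.
MAX_MSP_HOST, MAX_MSP_PORT = "127.0.0.1", 9001   # where MAX/MSP listens
SERVER_HOST, SERVER_PORT = "127.0.0.1", 9000     # where this Python server listens

client = SimpleUDPClient(MAX_MSP_HOST, MAX_MSP_PORT)

def ldm_generate(context: np.ndarray) -> np.ndarray:
    """Stand-in for the latent diffusion model's inference call."""
    return np.zeros_like(context)

def on_context(address: str, *samples: float) -> None:
    """MAX/MSP sends a buffer of recent context samples; we reply with a
    generated accompaniment chunk on a separate OSC address."""
    context = np.asarray(samples, dtype=np.float32)
    chunk = ldm_generate(context)
    client.send_message("/accompaniment", chunk.tolist())

dispatcher = Dispatcher()
dispatcher.map("/context", on_context)

server = BlockingOSCUDPServer((SERVER_HOST, SERVER_PORT), dispatcher)
server.serve_forever()  # blocks; MAX/MSP would sit on the other end of the UDP link
```

On the MAX/MSP side the counterpart would be [udpsend]/[udpreceive] objects feeding a playback buffer; every hop through this path adds to the delay the premise is concerned with.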
What would settle it
A controlled live session in which musicians play against the system at several fixed look-ahead depths, after which independent listeners rate the accompaniment for beat alignment and musical fit to see whether scores stay usable past a particular latency threshold.
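A back-of-the-envelope budget makes the proposed latency threshold concrete: the look-ahead depth has to cover every stage between audio capture and the scheduled playback of the generated chunk. The numbers below are purely illustrative assumptions; only the 5.4x speedup figure comes from the abstract.

```python
# Illustrative numbers only; the paper's measured timings are not reproduced here.
def min_lookahead_ms(buffer_ms: float, osc_round_trip_ms: float,
                     inference_ms: float, safety_margin_ms: float = 10.0) -> float:
    """A generated chunk must be back in MAX/MSP before its scheduled onset,
    so the look-ahead depth has to cover every stage of the pipeline."""
    return buffer_ms + osc_round_trip_ms + inference_ms + safety_margin_ms

# Hypothetical base vs. distilled model (5.4x faster sampling, per the abstract).
base_inference_ms = 540.0
distilled_inference_ms = base_inference_ms / 5.4

for name, t in [("base", base_inference_ms), ("distilled", distilled_inference_ms)]:
    # buffer_ms = 23.2 corresponds to a hypothetical 1024-sample buffer at 44.1 kHz.
    need = min_lookahead_ms(buffer_ms=23.2, osc_round_trip_ms=5.0, inference_ms=t)
    print(f"{name}: needs >= {need:.0f} ms of look-ahead")
```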
read the original abstract
We present a framework for real-time human-AI musical co-performance, in which a latent diffusion model generates instrumental accompaniment in response to a live stream of context audio. The system combines a MAX/MSP front-end (handling real-time audio input, buffering, and playback) with a Python inference server running the generative model, communicating via OSC/UDP messages. This allows musicians to perform in MAX/MSP, a well-established, real-time capable environment, while interacting with a large-scale Python-based generative model, overcoming the fundamental disconnect between real-time music tools and state-of-the-art AI models. We formulate accompaniment generation as a sliding-window look-ahead protocol, training the model to predict future audio from partial context, where system latency is a critical constraint. To reduce latency, we apply consistency distillation to our diffusion model, achieving a 5.4x reduction in sampling time, with both models achieving real-time operation. Evaluated on musical coherence, beat alignment, and audio quality, both models achieve strong performance in the Retrospective regime and degrade gracefully as look-ahead increases. These results demonstrate the feasibility of diffusion-based real-time accompaniment and expose the fundamental trade-off between model latency, look-ahead depth, and generation quality that any such system must navigate.
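For readers unfamiliar with the beat-alignment metric named in the abstract, the sketch below computes a generic beat F-measure by matching estimated beats to reference beats within a fixed tolerance. The 70 ms tolerance and the greedy matching are conventional choices, not necessarily the authors' exact protocol, which relies on a dedicated beat tracker.

```python
import numpy as np

def beat_f1(ref_beats: np.ndarray, est_beats: np.ndarray, tol: float = 0.07) -> float:
    """Generic beat-alignment F-measure: an estimated beat counts as a hit if it
    falls within tol seconds of a still-unmatched reference beat."""
    ref = list(np.sort(ref_beats))
    hits = 0
    for t in np.sort(est_beats):
        if ref:
            # Greedily consume the closest remaining reference beat.
            j = int(np.argmin(np.abs(np.asarray(ref) - t)))
            if abs(ref[j] - t) <= tol:
                hits += 1
                ref.pop(j)
    precision = hits / max(len(est_beats), 1)
    recall = hits / max(len(ref_beats), 1)
    return 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)

# Toy example: accompaniment beats lag the performer's beats by 20 ms.
performer = np.arange(0.0, 8.0, 0.5)      # beats every 500 ms
accompaniment = performer + 0.02           # still within tolerance
print(round(beat_f1(performer, accompaniment), 3))  # -> 1.0
```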
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a framework for real-time human-AI musical co-performance in which a latent diffusion model generates instrumental accompaniment from a live audio stream. It integrates a MAX/MSP front-end for real-time audio buffering and playback with a Python inference server via OSC/UDP messaging, formulates generation as a sliding-window look-ahead protocol, and applies consistency distillation to achieve a 5.4x sampling speedup. Both the base and distilled models are claimed to deliver strong performance on musical coherence, beat alignment, and audio quality in the retrospective regime, with graceful degradation as look-ahead increases, thereby demonstrating feasibility of diffusion-based real-time accompaniment and exposing latency-quality trade-offs.
Significance. If the results are substantiated with quantitative evidence, the work would usefully bridge state-of-the-art generative models with established real-time music environments, offering a practical path for interactive AI accompaniment. The consistency-distillation speedup and explicit treatment of look-ahead/latency constraints provide concrete engineering insights that could guide deployment of similar models in live settings.
major comments (3)
- [Abstract] Abstract: the claims of 'strong performance' and 'graceful degradation' on coherence, beat alignment, and quality are unsupported by any quantitative metrics, baselines, error bars, dataset details, or statistical tests, rendering the central feasibility conclusion unevaluable.
- [Abstract] Abstract: the assertion of real-time operation rests on the 5.4x sampling speedup and sliding-window protocol, yet no end-to-end latency measurements (MAX/MSP buffering + OSC/UDP messaging + model sampling + audio return) or jitter characterization are supplied, despite musical timing tolerances typically requiring <20-50 ms; this is load-bearing for the real-time co-performance claim.
- [Abstract] Abstract: the weakest assumption—that a sliding-window protocol trained on partial context will yield musically coherent output under live latency constraints without prohibitive integration overhead—is not tested with the full pipeline, so the reported retrospective results do not establish live feasibility.
minor comments (1)
- [Abstract] The manuscript would benefit from a system-architecture diagram clarifying data flow, buffering, and message timing between MAX/MSP and the Python server.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below. The abstract was intentionally concise, but we agree it requires strengthening with explicit references to quantitative results, system measurements, and pipeline details already present in the body of the manuscript. We have revised the abstract and added a new subsection on end-to-end latency to make the real-time claims fully evaluable.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claims of 'strong performance' and 'graceful degradation' on coherence, beat alignment, and quality are unsupported by any quantitative metrics, baselines, error bars, dataset details, or statistical tests, rendering the central feasibility conclusion unevaluable.
Authors: The full manuscript (Section 4) reports quantitative results using embedding-based coherence scores, beat-alignment F1 via onset detection, and perceptual audio quality metrics, with comparisons to a non-diffusion baseline and error bars across multiple seeds and look-ahead values. Dataset details (training corpus size, preprocessing, and splits) appear in Section 3. We have revised the abstract to include the key numerical findings (e.g., coherence scores and degradation slopes) and explicit references to the relevant figures and tables, thereby grounding the feasibility conclusion. revision: yes
-
Referee: [Abstract] Abstract: the assertion of real-time operation rests on the 5.4x sampling speedup and sliding-window protocol, yet no end-to-end latency measurements (MAX/MSP buffering + OSC/UDP messaging + model sampling + audio return) or jitter characterization are supplied, despite musical timing tolerances typically requiring <20-50 ms; this is load-bearing for the real-time co-performance claim.
Authors: We acknowledge that the original abstract did not report measured end-to-end latencies. The manuscript already contains per-component timings (MAX/MSP buffering, OSC round-trip, and distilled sampling at 5.4x speedup) in Section 5; we have added a new paragraph and table that aggregates these into measured end-to-end latency (mean and jitter) under realistic load, confirming operation within musical tolerances for moderate look-ahead. This directly substantiates the real-time claim. revision: yes
-
Referee: [Abstract] Abstract: the weakest assumption—that a sliding-window protocol trained on partial context will yield musically coherent output under live latency constraints without prohibitive integration overhead—is not tested with the full pipeline, so the reported retrospective results do not establish live feasibility.
Authors: The retrospective experiments systematically vary look-ahead depth to simulate increasing latency, and the observed graceful degradation directly tests the core assumption under controlled conditions that match the live sliding-window protocol. We have added an explicit discussion in the revised manuscript clarifying how these controlled conditions map to live operation and have included a brief live pilot recording (with qualitative description) to illustrate integration overhead. While a large-scale live user study is beyond the current scope, the existing results plus the measured pipeline timings provide substantive evidence of feasibility. revision: partial
Circularity Check
No circularity: empirical training and evaluation of diffusion-based accompaniment system
full rationale
The paper presents a system implementation (MAX/MSP front-end + Python inference server via OSC/UDP) and an empirical protocol (sliding-window look-ahead training of latent diffusion model, consistency distillation for 5.4x speedup, evaluation on coherence/alignment/quality metrics). No equations, derivations, or claims reduce to fitted parameters by construction or self-referential definitions. Feasibility and trade-offs are reported as experimental outcomes rather than definitional identities. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the abstract or described chain. The central results follow from training and testing on data, not tautology.
Axiom & Free-Parameter Ledger
free parameters (2)
- look-ahead window length
- consistency distillation steps (see the sampling sketch after this ledger)
axioms (2)
- domain assumption: Partial recent audio context contains sufficient information to generate musically coherent future accompaniment
- domain assumption: OSC/UDP communication between MAX/MSP and Python server adds negligible latency relative to model inference time
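The "consistency distillation steps" entry is the number of network evaluations spent at sampling time, the knob behind the 5.4x speedup. Below is a minimal sketch of multistep sampling in the style of Song et al.'s consistency models [25], with the trained network replaced by a placeholder; the noise schedule values and latent shape are arbitrary assumptions.

```python
import numpy as np

def consistency_fn(x, sigma):
    """Stub for the distilled consistency model f_theta(x, sigma): maps a noisy
    latent at noise level sigma directly to a clean-latent estimate.
    Placeholder dynamics only, not the trained network."""
    return x / (1.0 + sigma)

def multistep_consistency_sample(shape, sigmas, rng):
    """Few-step consistency sampling: len(sigmas) is the 'consistency
    distillation steps' free parameter from the ledger."""
    sigma_max, sigma_min = sigmas[0], sigmas[-1]
    x = consistency_fn(rng.standard_normal(shape) * sigma_max, sigma_max)
    for sigma in sigmas[1:]:
        # Re-noise the current estimate to the intermediate level, then map it
        # back to a clean estimate in a single network call.
        z = rng.standard_normal(shape)
        x_noisy = x + np.sqrt(max(sigma**2 - sigma_min**2, 0.0)) * z
        x = consistency_fn(x_noisy, sigma)
    return x

rng = np.random.default_rng(0)
latent = multistep_consistency_sample((64, 128), sigmas=[80.0, 10.0, 1.0, 0.02], rng=rng)
print(latent.shape)
```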
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION Music is inherently a performative art-form. For most of human history—long before the relatively recent invention of recording technologies—music, an act of realization in sound, existed only in live, performative, and ephemeral contexts [1,2]. Performative musicianship, whether in the form of improvisation, jamming, or following a kno...
-
[2]
SingSong [10] produces instrumental accompaniment from a vocal recording
RELATED WORK Music-to-Music and Accompaniment Generation: A growing body of work addresses generating musical accompaniment conditioned on musical context audio, rather than on text. SingSong [10] produces instrumental accompaniment from a vocal recording. StemGen [9] trains a non-autoregressive transformer conditioned on a mixture to synthesize a coher...
-
[3]
METHOD Fig. 1 gives an overview of the system we propose for real-time interactive musical accompaniment, in which a human performer plays live while an LDM generates matching instrumental parts. Real-time responsiveness is achieved through a client–server architecture: the server runs the inference-heavy LDM in a Python backend, while the client—a M...
-
[4]
EXPERIMENTAL SETUP This section covers the experimental setup for both components of the system. First, we describe the generative model setup: dataset, model architecture, training procedure, baselines, and evaluation metrics used to assess accompaniment generation quality across different streaming configurations. Second, we describe the RTAP system con...
-
[5]
Generative Model Performance Fig
RESULTS 5.1. Generative Model Performance Fig. 6 summarises performance of our diffusion model and consistency distillation (CD) model across COCOLA, BeatF1, and FAD, compared against the StreamMusicGen’s online decoder and offline baselines (Prefix Decoder, StemGen), as a function of the net look-ahead T·r·w—the effective time distance between the cu...
-
[6]
CONCLUSION We present a framework for real-time human–AI musical co-performance combining a latent diffusion model with a sliding-window look-ahead inference paradigm, accelerated via consistency distillation, and deployed through a low-latency client–server system interfaced via RTAP, a musician-facing MAX/MSP patch. In this work, we establish that t...
-
[7]
ACKNOWLEDGMENTS We thank the Institute for Research and Coordination in Acoustics and Music (IRCAM) and Project REACH: Raising Co-creativity in Cyber-Human Musicianship for their support. This project received support and resources in the form of computational power from the European Research Council (ERC REACH) under the European Union’s Horizon 20...
2020
-
[8]
Christopher Small, Musicking: The Meanings of Performing and Listening, Wesleyan University Press, 1998
1998
-
[9]
Nicholas Cook, Music: A Very Short Introduction, Oxford University Press, 2nd edition, 2021
2021
-
[10]
Joint action in music performance,
Peter Keller, “Joint action in music performance,” in Enacting Intersubjectivity: A Cognitive and Social Perspective to the Study of Interactions. IOS Press, 2008
2008
-
[11]
The experience of the flow state in live music performance,
William J Wrigley and Stephen B Emmerson, “The experience of the flow state in live music performance,” Psychology of Music, 2013
2013
-
[12]
MusicLM: Generating Music From Text
Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse H. Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matthew Sharifi, Neil Zeghidour, and Christian Havnø Frank, “Musiclm: Generating music from text,” arXiv:2301.11325, 2023
-
[13]
Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, and Ilya Sutskever, “Jukebox: A generative model for music,” arXiv:2005.00341, 2020
-
[14]
Simple and controllable music generation,
Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, and Alexandre Défossez, “Simple and controllable music generation,” in NeurIPS, 2023
2023
-
[15]
Musicldm: Enhancing novelty in text-to-music generation using beat-synchronous mixup strategies,
Ke Chen, Yusong Wu, Haohe Liu, et al., “Musicldm: Enhancing novelty in text-to-music generation using beat-synchronous mixup strategies,” in ICASSP, 2024
2024
-
[16]
Stemgen: A music generation model that listens,
Julian D Parker, Janne Spijkervet, Katerina Kosta, et al., “Stemgen: A music generation model that listens,” in ICASSP, 2024
2024
-
[17]
Singsong: Generating musical accompaniments from singing,
Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, and Jesse H. Engel, “Singsong: Generating musical accompaniments from singing,” arXiv:2301.12662, 2023
-
[18]
Musicgen-stem: Multi-stem music generation and edition through autoregressive modeling,
Simon Rouard, Robin San Roman, Yossi Adi, et al., “Musicgen-stem: Multi-stem music generation and edition through autoregressive modeling,” in ICASSP, 2025
2025
-
[19]
Multi-track musicldm: Towards versatile music generation with latent diffusion model,
Tornike Karchkhadze, Mohammad Rasool Izadi, Ke Chen, Gerard Assayag, and Shlomo Dubnov, “Multi-track musicldm: Towards versatile music generation with latent diffusion model,” in ArtsIT, 2026, pp. 76–91
2026
-
[20]
Simultaneous music separation and generation using multi-track latent diffusion models,
Tornike Karchkhadze, Mohammad Rasool Izadi, and Shlomo Dubnov, “Simultaneous music separation and generation using multi-track latent diffusion models,” in ICASSP, 2025, pp. 1–5
2025
-
[21]
Audioldm: Text-to-audio generation with latent diffusion models,
Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo P. Mandic, Wenwu Wang, and Mark D. Plumbley, “Audioldm: Text-to-audio generation with latent diffusion models,” in ICML, 2023, pp. 21450–21474
2023
-
[22]
Generative modeling by estimating gradients of the data distribution,
Yang Song and Stefano Ermon, “Generative modeling by estimating gradients of the data distribution,” in NeurIPS, 2019, pp. 11895–11907
2019
-
[23]
Music2latent: Consistency autoencoders for latent audio compression,
Marco Pasini, Stefan Lattner, and George Fazekas, “Music2latent: Consistency autoencoders for latent audio compression,” in ISMIR, 2024, pp. 111–119
2024
-
[24]
Cutting music source separation some slakh: A dataset to study the impact of training data quality and quantity,
Ethan Manilow, Gordon Wichern, Prem Seetharaman, et al., “Cutting music source separation some slakh: A dataset to study the impact of training data quality and quantity,” in WASPAA, 2019
2019
-
[25]
Consistency models,
Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever, “Consistency models,” in ICML, 2023, pp. 32211–32252
2023
-
[26]
Consistency trajectory models: Learning probability flow ODE trajectory of diffusion,
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon, “Consistency trajectory models: Learning probability flow ODE trajectory of diffusion,” in ICLR, 2024
2024
-
[27]
Streaming generation for music accompaniment,
Yusong Wu, Mason Wang, Heidi Lei, Stephen Brade, Lancelot Blanchard, Shih-Lun Wu, Aaron C. Courville, and Cheng-Zhi Anna Huang, “Streaming generation for music accompaniment,” arXiv:2510.22105, 2025
-
[28]
Open sound control: an enabling technology for musical networking,
Matthew Wright, “Open sound control: an enabling technology for musical networking,” Organised Sound, vol. 10, no. 3, pp. 193–200, 2005
2005
-
[29]
Bass accompaniment generation via latent diffusion,
Marco Pasini, Maarten Grachten, and Stefan Lattner, “Bass accompaniment generation via latent diffusion,” in ICASSP, 2024, pp. 1166–1170
2024
-
[30]
Diff-a-riff: Musical accompaniment co-creation via latent diffusion models,
Javier Nistal, Marco Pasini, Cyran Aouameur, Maarten Grachten, and Stefan Lattner, “Diff-a-riff: Musical accompaniment co-creation via latent diffusion models,” in ISMIR, 2024
2024
-
[31]
Improving musical accompaniment co-creation via diffusion transformers,
Javier Nistal, Marco Pasini, and Stefan Lattner, “Improving musical accompaniment co-creation via diffusion transformers,” arXiv:2410.23005, 2024
-
[32]
Multi-source diffusion models for simultaneous music generation and separation,
Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, and Emanuele Rodola, “Multi-source diffusion models for simultaneous music generation and separation,” in ICLR, 2024
2024
-
[33]
JEN-1 Composer: A unified framework for high-fidelity multi-track music generation,
Yao Yao, Peike Li, Boyu Chen, and Alex Wang, “JEN-1 Composer: A unified framework for high-fidelity multi-track music generation,” in AAAI, 2025
2025
-
[34]
Multi-source music generation with latent diffusion,
Zhongweiyang Xu, Debottam Dutta, Yu-Lin Wei, and Romit Roy Choudhury, “Multi-source music generation with latent diffusion,” arXiv:2409.06190, 2024
-
[35]
MGE-LDM: Joint latent diffusion for simultaneous music generation and source extraction,
Yunkee Chae and Kyogu Lee, “MGE-LDM: Joint latent diffusion for simultaneous music generation and source extraction,” in NeurIPS, 2025
2025
-
[36]
Probabilistic melodic harmonization,
Jean-François Paiement, Douglas Eck, and Samy Bengio, “Probabilistic melodic harmonization,” in Canadian Conference on AI, 2006
2006
-
[37]
Mysong: automatic accompaniment generation for vocal melodies,
Ian Simon, Dan Morris, and Sumit Basu, “Mysong: automatic accompaniment generation for vocal melodies,” in CHI, 2008
2008
-
[38]
High-level control of drum track generation using learned patterns of rhythmic interaction,
Stefan Lattner and Maarten Grachten, “High-level control of drum track generation using learned patterns of rhythmic interaction,” in WASPAA, 2019
2019
-
[39]
BassNet: A variational gated autoencoder for conditional generation of bass guitar tracks with learned interactive control,
Maarten Grachten, Stefan Lattner, and Emmanuel Deruty, “BassNet: A variational gated autoencoder for conditional generation of bass guitar tracks with learned interactive control,” Applied Sciences, 2020
2020
-
[40]
MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment,
Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, and Yi-Hsuan Yang, “MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment,” in AAAI, 2018, pp. 34–41
2018
-
[41]
MMM: Exploring conditional multi-track music generation with the transformer,
Jeff Ens and Philippe Pasquier, “MMM: Exploring conditional multi-track music generation with the transformer,” arXiv:2008.06048, 2020
-
[42]
A transformer-based model for multi-track music generation,
Cong Jin, Tao Wang, Shouxun Liu, Yun Tie, Jianguang Li, Xiaobing Li, and Simon Lui, “A transformer-based model for multi-track music generation,” Int. J. Multim. Data Eng. Manag., vol. 11, no. 3, pp. 36–54, 2020
2020
-
[43]
Multitrack music transformer,
Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, and Taylor Berg-Kirkpatrick, “Multitrack music transformer,” in ICASSP, 2023
2023
-
[44]
An on-line algorithm for real-time accompaniment,
Roger B Dannenberg, “An on-line algorithm for real-time accompaniment,” in ICMC, 1984
1984
-
[45]
A design space for live music agents,
Yewon Kim, Stephen Brade, Alexander Wang, David Zhou, Haven Kim, Bill Wang, Sung-Ju Lee, Hugo F. Flores Garcia, Cheng-Zhi Anna Huang, and Chris Donahue, “A design space for live music agents,” in CHI, 2026
2026
-
[46]
Music plus one and machine learning,
Christopher Raphael, “Music plus one and machine learning,” in ICML, 2010
2010
-
[47]
Antescofo: Anticipatory synchronization and control of interactive parameters in computer music,
Arshia Cont, “Antescofo: Anticipatory synchronization and control of interactive parameters in computer music,” in ICMC, 2008
2008
-
[48]
Too many notes: Computers, complexity, and culture in voyager,
George E Lewis, “Too many notes: Computers, complexity, and culture in voyager,” in New Media. Routledge, 2003
2003
-
[49]
Omax brothers: a dynamic topology of agents for improvization learning,
Gérard Assayag, Georges Bloch, Marc Chemillier, et al., “Omax brothers: a dynamic topology of agents for improvization learning,” in ACM workshop on Audio and music computing multimedia, 2006
2006
-
[50]
Improtek: integrating harmonic controls into improvisation in the filiation of omax,
Jérôme Nika and Marc Chemillier, “Improtek: integrating harmonic controls into improvisation in the filiation of omax,” in ICMC, 2012
2012
-
[51]
Improtek: introducing scenarios into human-computer music improvisation,
Jérôme Nika, Marc Chemillier, and Gérard Assayag, “Improtek: introducing scenarios into human-computer music improvisation,” Computers in Entertainment (CIE), 2017
2017
-
[52]
Bachduet: A deep learning system for human-machine counterpoint improvisation,
Christodoulos Benetatos, Joseph VanderStel, and Zhiyao Duan, “Bachduet: A deep learning system for human-machine counterpoint improvisation,” in NIME, 2020
2020
-
[53]
Songdriver: Real-time music accompaniment generation without logical latency nor exposure bias,
Zihao Wang, Kejun Zhang, Yuxing Wang, et al., “Songdriver: Real-time music accompaniment generation without logical latency nor exposure bias,” in ACM MM, 2022
2022
-
[54]
RL-duet: Online music accompaniment generation using deep reinforcement learning,
Nan Jiang, Sheng Jin, Zhiyao Duan, et al., “RL-duet: Online music accompaniment generation using deep reinforcement learning,” in AAAI, 2020
2020
-
[55]
Adaptive accompaniment with realchords,
Yusong Wu, Tim Cooijmans, Kyle Kastner, et al., “Adaptive accompaniment with realchords,” in ICML, 2024
2024
-
[56]
Real-jam: Real-time human-ai music jamming with reinforcement learning-tuned transformers,
Alexander Scarlatos, Yusong Wu, Ian Simon, et al., “Real-jam: Real-time human-ai music jamming with reinforcement learning-tuned transformers,” in CHI EA, 2025
2025
-
[57]
Lyria Team, Antoine Caillon, Brian McWilliams, et al., “Live music models,” arXiv:2508.04651, 2025
-
[58]
A controller to overcome dead time,
O. J. M. Smith, “A controller to overcome dead time,” ISA Journal, 1959
1959
-
[59]
Review on model predictive control: an engineering perspective,
Maximilian Schwenzer, Muzaffer Ay, Thomas Bergs, et al., “Review on model predictive control: an engineering perspective,” The International Journal of Advanced Manufacturing Technology, 2021
2021
-
[60]
Real-time execution of action chunking flow policies,
Kevin Black, Manuel Y. Galliker, and Sergey Levine, “Real-time execution of action chunking flow policies,” arXiv:2506.07339, 2025
-
[61]
Score-based generative modeling through stochastic differential equations,
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole, “Score-based generative modeling through stochastic differential equations,” in ICLR, 2021
2021
-
[62]
Elucidating the design space of diffusion-based generative models,
Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine, “Elucidating the design space of diffusion-based generative models,” in NeurIPS, 2022
2022
-
[63]
DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps,
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu, “DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps,” in NeurIPS, 2022
2022
-
[64]
Cycling ’74, Max/MSP 8, 2023
2023
-
[65]
Cocola: Coherence-oriented contrastive learning of musical audio representations,
Ruben Ciranni, Giorgio Mariani, Michele Mancusi, et al., “Cocola: Coherence-oriented contrastive learning of musical audio representations,” in ICASSP, 2025
2025
-
[66]
Beat this! accurate beat tracking without dbn postprocessing,
Francesco Foscarin, Jan Schlüter, and Gerhard Widmer, “Beat this! accurate beat tracking without dbn postprocessing,” in ISMIR, 2024
2024
-
[67]
madmom: a new Python Audio and Music Signal Processing Library,
Sebastian Böck, Filip Korzeniowski, Jan Schlüter, et al., “madmom: a new Python Audio and Music Signal Processing Library,” in ACM MM, 2016
2016
-
[68]
Fréchet audio distance: A reference-free metric for evaluating music enhancement algorithms,
Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, and Matthew Sharifi, “Fréchet audio distance: A reference-free metric for evaluating music enhancement algorithms,” in Interspeech, 2019, pp. 2350–2354
2019
discussion (0)