
arxiv: 2606.09780 · v1 · pith:7TE7UZGBnew · submitted 2026-06-08 · 💻 cs.SD · cs.NE

Quality-Diversity Search in Sound Generation: Investigating Innovation Engines for Audio Exploration

Pith reviewed 2026-06-27 14:58 UTC · model grok-4.3

classification 💻 cs.SD cs.NE
keywords quality-diversity · sound synthesis · MAP-Elites · CPPN · DSP graphs · evolutionary algorithms · audio generation · innovation engine

The pith

MAP-Elites with CPPNs, DSP graphs, and a classifier produces diverse, innovative synthetic sounds across durations and contexts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Composers face difficulty manually exploring large spaces of possible sounds. The paper applies quality-diversity search to automate that exploration by maintaining an archive of high-performing solutions in different behavioral niches. It combines MAP-Elites with CPPN-based sound synthesis and a supervised deep learning model that supplies the quality signal. A variant uses separate CPPNs for different frequency ranges to reduce network size while preserving output variety. The system also tracks how lineages switch between musical and non-musical goals and how solutions specialize when sound duration is added to the behavior space.

Core claim

The authors report that CPPN and DSP graphs coupled with MAP-Elites and a deep learning classifier generate a substantial variety of synthetic sounds that are diverse and innovative across temporal and contextual dimensions.

What carries the argument

MAP-Elites algorithm that fills a multi-dimensional archive of phenotypic elites, with behavior dimensions that include sound duration and musical versus non-musical context, and quality supplied by the classifier.
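
As a rough illustration of the archive mechanics described above, the following Python sketch runs a toy MAP-Elites loop over a grid of duration bins and classifier classes. It is not the authors' implementation: the two-number genome, `toy_classifier`, `duration_bin`, the grid sizes, and the mutation scheme are all hypothetical stand-ins for the paper's CPPN and DSP-graph synthesis and its trained classifier. In the spirit of the Innovation Engine idea, each candidate is scored for every class and replaces a cell's elite only if it scores higher.

```python
import random

N_DURATIONS = 3  # hypothetical number of duration bins
N_CLASSES = 4    # hypothetical number of classifier classes


def toy_classifier(genome):
    """Stand-in for a trained classifier: one confidence per class."""
    centers = [(c + 0.5) / N_CLASSES for c in range(N_CLASSES)]
    return [1.0 - abs(genome[1] - center) for center in centers]


def duration_bin(genome):
    """Stand-in behavior descriptor: map genome[0] in [0, 1] to a bin."""
    return min(int(genome[0] * N_DURATIONS), N_DURATIONS - 1)


def mutate(genome, rng, sigma=0.1):
    """Gaussian perturbation, clamped to [0, 1]."""
    return [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in genome]


def map_elites(iterations=2000, n_random=50, seed=0):
    rng = random.Random(seed)
    archive = {}  # (duration_bin, class_index) -> (quality, genome)
    for i in range(iterations):
        if archive and i >= n_random:
            # Pick a random elite as parent and mutate it.
            _, parent = rng.choice(list(archive.values()))
            child = mutate(parent, rng)
        else:
            # Bootstrap the archive with random genomes.
            child = [rng.random(), rng.random()]
        d = duration_bin(child)
        # The child competes in every class cell of its duration bin.
        for c, score in enumerate(toy_classifier(child)):
            cell = (d, c)
            if cell not in archive or score > archive[cell][0]:
                archive[cell] = (score, child)
    return archive


archive = map_elites()
print(len(archive), "of", N_DURATIONS * N_CLASSES, "cells filled")
```

Swapping `toy_classifier` for a real model's per-class confidences, and the genome for a synthesis graph, would be the natural extension points of this sketch.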

If this is right

  • Solutions specialize in separate temporal niches when sound duration is included in the behavior space.
  • Lineages reach musical sounds by traversing non-musical stepping stones.
  • Multiple specialized CPPNs achieve performance comparable to single larger networks.
  • The generated sounds can be used directly in composition experiments across varied durations and contexts.
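
One way to make the stepping-stone reading concrete is to count goal switches along an elite's ancestry. The Python sketch below assumes a hypothetical parent map and niche labels (the identifiers, the class names, and the `MUSICAL` set are invented for illustration and do not come from the paper). It counts both class-level switches and switches between musical and non-musical contexts.

```python
def trace_lineage(elite_id, parents, niche_of):
    """Return the niche labels along an ancestry, oldest ancestor first."""
    path = []
    node = elite_id
    while node is not None:
        path.append(niche_of[node])
        node = parents.get(node)
    return path[::-1]


def count_switches(labels):
    """Count positions where consecutive labels differ."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


# Hypothetical toy data: child -> parent, and the niche each individual won.
parents = {"d": "c", "c": "b", "b": "a", "a": None}
niche_of = {"a": "noise", "b": "noise", "c": "speech", "d": "violin"}
MUSICAL = {"violin"}  # hypothetical set of musical classes

lineage = trace_lineage("d", parents, niche_of)
class_switches = count_switches(lineage)
context_switches = count_switches([label in MUSICAL for label in lineage])

print(lineage)           # ['noise', 'noise', 'speech', 'violin']
print(class_switches)    # 2
print(context_switches)  # 1
```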

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Composers could treat the resulting archive as a source of starting material rather than building every sound from scratch.
  • The observed goal-switching paths might suggest initialization strategies that accelerate discovery in other generative domains.
  • Further expansion of the behavior space could expose additional niches that current single-context searches miss.

Load-bearing premise

The supervised classifier supplies a quality signal that reliably matches human notions of musical usefulness or innovation.

What would settle it

If side-by-side listening tests show that the sounds archived by the QD system are rated no more diverse or innovative than sounds produced by the same synthesis methods without the archive or classifier, the central claim would be falsified.

Original abstract

This study addresses the challenges composers and sound designers face in creating and refining tools to achieve their musical goals. Using evolutionary processes to promote diversity and foster serendipitous discoveries, we automate the search through uncharted sonic spaces for sound discovery, arguing that diversity-promoting algorithms can bridge the gap between the theoretical realisation and practical accessibility of sounds. We describe a system for generative sound synthesis combining Quality Diversity (QD) algorithms with a supervised discriminative model, inspired by the Innovation Engine algorithm, and explore different configurations and the interplay between the chosen synthesis approach and the discriminative model. We examine the interaction between Compositional Pattern Producing Networks (CPPNs) and Digital Signal Processing (DSP) graphs, introducing a novel approach that uses multiple specialised CPPNs for different frequency ranges; this yields simpler networks while maintaining performance comparable to single-CPPN setups. We also investigate evolutionary stepping stones by analysing goal switches between musical and non-musical contexts, revealing how lineages traverse unlikely paths to current elites. Expanding the behaviour space of a previous study to include various sound durations, we uncover specialisation within temporal niches. Results indicate that CPPN and DSP graphs coupled with a Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) and a deep learning classifier can generate a substantial variety of synthetic sounds, diverse and innovative across temporal and contextual dimensions. We present the generated sound objects through an online explorer and as rendered sound files, and, in the context of music composition, an experimental application that showcases their creative potential across various durations and contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper describes a Quality-Diversity system for generative sound synthesis that couples Compositional Pattern Producing Networks (CPPNs) and DSP graphs with MAP-Elites and a supervised deep-learning classifier, inspired by Innovation Engines. It introduces a multi-CPPN architecture specialized by frequency range, analyzes evolutionary stepping-stone trajectories across musical/non-musical contexts, and demonstrates temporal niche specialization when the behaviour space is expanded to include variable sound durations. The central empirical claim is that the resulting archives produce a substantial variety of synthetic sounds that are diverse and innovative across temporal and contextual dimensions, with outputs released via an online explorer and rendered files for compositional use.

Significance. If the empirical results hold, the work supplies a practical exploration tool that automates serendipitous discovery in audio spaces while making the generated objects directly accessible. The multi-CPPN frequency specialization and the stepping-stone analysis constitute concrete methodological contributions that could be adopted in other QD audio applications. The public release of the explorer and sound files is a clear strength for reproducibility and creative uptake.

major comments (1)
  1. [Method (discriminative model) and Results] The quality signal supplied by the supervised discriminative model is load-bearing for the claim that the generated sounds are 'innovative' in a musically useful sense, yet the manuscript provides no human listening tests or correlation analysis between classifier scores and perceptual judgments of musical quality or novelty. This assumption is stated in the abstract and method description but is not empirically tested.
minor comments (2)
  1. [Abstract] The abstract refers to 'Innovation Engines' without a brief parenthetical definition or citation; this should be clarified for readers outside the QD community.
  2. [Figures and Tables] Figure captions and table headers should explicitly state the number of independent runs and any statistical tests used to support claims of 'comparable performance' between single- and multi-CPPN configurations.
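
The correlation analysis requested in the major comment could take the form of a rank correlation between classifier confidences and listener ratings for the same sounds. The Python sketch below computes a Spearman coefficient using only the standard library. The scores and ratings are made-up placeholder values, not data from the paper, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder data: classifier confidence and mean listener rating per sound.
classifier_scores = [0.91, 0.40, 0.75, 0.62, 0.88]
listener_ratings = [4, 2, 5, 3, 4]

rho = spearman(classifier_scores, listener_ratings)
print(f"Spearman rho = {rho:.3f}")
```

A coefficient near zero on real listening data would suggest that the classifier's quality signal does not track perceptual judgments; a significance test and a larger sample would be needed before drawing conclusions.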

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments. We address the single major comment below.

Point-by-point responses
  1. Referee: [Method (discriminative model) and Results] The quality signal supplied by the supervised discriminative model is load-bearing for the claim that the generated sounds are 'innovative' in a musically useful sense, yet the manuscript provides no human listening tests or correlation analysis between classifier scores and perceptual judgments of musical quality or novelty. This assumption is stated in the abstract and method description but is not empirically tested.

    Authors: We agree that the manuscript does not include human listening tests or a correlation analysis validating that classifier scores align with perceptual judgments of musical quality or novelty. The discriminative model serves as a proxy quality signal, trained on labeled musical versus non-musical audio, following the Innovation Engine approach. This constitutes an untested assumption in the current work. We will revise the manuscript to explicitly acknowledge this limitation in the method and discussion sections and identify perceptual validation as future work. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

This paper is an empirical demonstration study that applies established algorithms (MAP-Elites, CPPNs, DSP graphs, a supervised deep learning classifier) to sound generation and reports observed outcomes such as archive coverage, stepping-stone lineages, and temporal specialisation. It presents no derivation chain, equations, or predictions that reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations. The central claims rest on experimental results and external benchmarks rather than internal redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities can be extracted. The approach implicitly assumes the classifier provides an independent quality signal and that the chosen behavior descriptors capture musically relevant variation.


