A text-to-audio generative model is adapted for room impulse response (RIR) generation, using a vision-language model to label image-RIR datasets and in-context learning to handle free-form prompts.
We demonstrate for the first time that large-scale generative audio priors can be effectively leveraged for the RIR generation task.
1 Pith paper cites this work.
Fields: eess.AS (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers (1):
Adapting a Text-to-Audio Model for Room Impulse Response Generation