We present work in progress on TimbreCLIP, an audio-text cross-modal embedding trained on single-instrument notes. We evaluate the models with a cross-modal retrieval task on synth patches. Finally, we demonstrate the application of TimbreCLIP on two tasks: text-driven audio equalization and timbre-to-image generation.