We present a study in which contrastive learning is used to extract low-dimensional representations of the acoustic environment from single-channel, reverberant speech signals. Convolving room impulse responses (RIRs) with anechoic source signals serves as the data augmentation technique, which offers considerable flexibility in the design of the upstream task. We evaluate the embeddings on three downstream tasks: regression of two acoustic parameters, the reverberation time (RT60) and the clarity index (C50), and classification of rooms as small or large. We demonstrate that the learned representations generalize well to unseen data and achieve performance comparable to that of a fully supervised baseline.
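The RIR-convolution augmentation can be illustrated with a minimal sketch. The abstract does not specify how training pairs are constructed, so everything below is an assumption made for illustration: the helper names (`synthetic_rir`, `reverberate`, `positive_pair`), the toy exponentially decaying noise RIR, the 16 kHz sample rate, and the choice to form a positive pair from two different dry signals that share one RIR. White noise stands in for anechoic speech.

```python
import numpy as np

def synthetic_rir(rt60, sr=16000, length_s=0.5, seed=0):
    """Toy RIR (assumption): white noise whose amplitude decays by 60 dB at t = rt60."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * length_s)) / sr
    decay = 10.0 ** (-3.0 * t / rt60)  # amplitude factor 1e-3 (-60 dB) at t = rt60
    rir = rng.standard_normal(t.size) * decay
    return rir / np.max(np.abs(rir))

def reverberate(dry, rir):
    """Convolve a dry signal with an RIR, trim to the dry length, peak-normalize."""
    wet = np.convolve(dry, rir)[: dry.size]
    return wet / (np.max(np.abs(wet)) + 1e-12)

def positive_pair(dry_a, dry_b, rir):
    """Hypothetical pairing rule: two different sources rendered in the same room."""
    return reverberate(dry_a, rir), reverberate(dry_b, rir)

# Stand-ins for anechoic speech: one second of white noise each.
rng = np.random.default_rng(1)
dry_a = rng.standard_normal(16000)
dry_b = rng.standard_normal(16000)

rir = synthetic_rir(rt60=0.4)
view_a, view_b = positive_pair(dry_a, dry_b, rir)
```

Under this assumed pairing rule, the two views share the room but not the source content, so an encoder trained to pull them together would be encouraged to encode the acoustic environment rather than the speech itself. Other pairing rules are possible within the same augmentation scheme.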