Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, the Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine embedding models on nineteen diverse tasks. However, the effectiveness of such an evaluation depends on the variation already captured within a given dataset. Therefore, for a given data domain, it is unclear how the representations would be affected by variations caused by the wide range of microphones and acoustic conditions, commonly known as channel effects. In this work, we aim to extend HEAR to evaluate invariance to channel effects. To accomplish this, we imitate channel effects by injecting perturbations into the audio signal and measure the shift in the new (perturbed) embeddings with three distance measures, making the evaluation domain-dependent but not task-dependent. Combined with the downstream performance, this helps us make a more informed prediction of how robust the embeddings are to channel effects. We evaluate two embeddings, YAMNet and OpenL3, on monophonic (UrbanSound8K) and polyphonic (SONYC-UST) urban datasets. We show that a single distance measure does not suffice in such a task-independent evaluation. Although Fréchet Audio Distance (FAD) correlates most accurately with the trend of the performance drop in the downstream task, we show that FAD must be studied in conjunction with the other distances to get a clear understanding of the overall effect of the perturbation. In terms of embedding performance, we find OpenL3 to be more robust than YAMNet, which aligns with the HEAR evaluation.
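As a rough illustration of the perturb-and-measure idea described above, the following minimal sketch computes a Fréchet distance between Gaussians fitted to clean and perturbed embeddings, which is the quantity underlying FAD. Everything apart from the Fréchet formula is an assumption made for illustration: `toy_embed` is a hypothetical stand-in for a real model such as YAMNet or OpenL3, `low_pass` is a crude imitation of a channel effect, and the input is synthetic noise rather than UrbanSound8K or SONYC-UST audio. It is not the evaluation pipeline used in this work.

```python
import numpy as np


def frechet_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two embedding sets.

    Each input has shape (n_frames, n_dims).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def toy_embed(audio, frame=256):
    """Hypothetical embedding (assumption): log band energies per frame."""
    n = len(audio) // frame
    frames = audio[: n * frame].reshape(n, frame)
    spec = np.abs(np.fft.rfft(frames, axis=1))           # (n, 129) magnitude bins
    bands = spec[:, 1:].reshape(n, 8, -1).mean(axis=2)   # drop DC, pool into 8 bands
    return np.log(bands + 1e-8)


def low_pass(audio, width=8):
    """Crude channel-effect imitation (assumption): moving-average filter."""
    return np.convolve(audio, np.ones(width) / width, mode="same")


rng = np.random.default_rng(0)
audio = rng.standard_normal(16000 * 4)   # synthetic stand-in for a recording

clean = toy_embed(audio)
perturbed = toy_embed(low_pass(audio))

fad_self = frechet_distance(clean, clean)       # no perturbation
fad_shift = frechet_distance(clean, perturbed)  # after the simulated channel effect
print(fad_self, fad_shift)
```

A distance near zero for identical embedding sets and a clearly larger value after the perturbation is the behavior one would expect from such a measure; how well that shift tracks downstream performance is the question the evaluation itself addresses.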