In this paper, we propose SemanticAC, a semantics-assisted framework for Audio Classification that better leverages the semantic information carried by class labels. Unlike conventional audio classification methods, which treat class labels as discrete vectors, we employ a language model to extract rich semantics from the labels and optimize the semantic consistency between audio signals and their labels. We show that simple textual information from labels, combined with advanced pretrained models, provides richer semantic supervision and improves performance. Specifically, we design a text encoder to capture the semantic information from the text extension of each label. We then use an audio encoder and a similarity calculation module to map the audio signals so that they align with the semantics of their corresponding class labels, which enforces semantic consistency. Extensive experiments on two audio datasets, ESC-50 and US8K, demonstrate that our proposed method consistently outperforms the compared audio classification methods.
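The abstract does not specify how the similarity calculation module scores audio against labels or which training objective enforces semantic consistency. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes cosine similarity scaled by a temperature, a softmax cross-entropy over all class-label embeddings, and prediction by the most similar label. The function names `similarity_logits`, `semantic_consistency_loss`, and `predict`, the `temperature` value, and the toy embeddings are all hypothetical stand-ins for real encoder outputs.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def similarity_logits(audio_emb, label_emb, temperature=0.07):
    """Hypothetical similarity module: cosine similarity of every clip to every label.

    audio_emb: (batch, dim) array standing in for audio-encoder outputs (assumed).
    label_emb: (num_classes, dim) array standing in for text-encoder outputs
               on the text extension of each label (assumed).
    Returns a (batch, num_classes) array of temperature-scaled similarities.
    """
    a = l2_normalize(audio_emb)
    t = l2_normalize(label_emb)
    return a @ t.T / temperature


def semantic_consistency_loss(logits, targets):
    """Assumed objective: softmax cross-entropy over label similarities.

    Minimizing it pulls each clip toward its own label's embedding and away
    from the embeddings of the other labels.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def predict(logits):
    """Predicted class = the label whose semantics are most similar to the clip."""
    return logits.argmax(axis=1)


# Toy usage with hand-made embeddings (hypothetical, not real encoder outputs):
# 3 classes in a 4-dimensional embedding space.
label_emb = np.eye(3, 4)
audio_emb = np.array([[0.0, 0.0, 1.0, 0.1],
                      [1.0, 0.1, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])
logits = similarity_logits(audio_emb, label_emb)
print(predict(logits))  # each clip is closest to classes 2, 0, 1 respectively
```

Scoring against label embeddings rather than a fixed output layer is what lets the label text act as supervision: the same trained audio encoder can be compared with any label whose text the text encoder can embed.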