This report briefly describes our proposed solution for the Vocal Burst Type classification task of the ACII 2022 Affective Vocal Bursts (A-VB) Competition. We experimented with two approaches. The first is based on convolutional neural networks trained on Mel spectrograms, and the second is based on average pooling of deep acoustic embeddings from a pretrained wav2vec2 model. Our best performing model achieves an unweighted average recall (UAR) of 0.5190 on the test partition, compared to a chance-level UAR of 0.1250 and a challenge baseline of 0.4172. This is an absolute gain of about 0.10 UAR, or a relative improvement of roughly 24% over the challenge baseline. The results reported in this document demonstrate the efficacy of our proposed approaches for the A-VB Type classification task.
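As a minimal sketch of how the reported metric and the comparison to the baseline can be computed: UAR is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The function names, the toy labels, and the class count of eight are assumptions made for illustration (eight classes is consistent with the reported chance level of 0.1250); only the three UAR values come from the text above.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls (macro-averaged recall)."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def relative_improvement(score, baseline):
    """Gain over the baseline, expressed as a fraction of the baseline."""
    return (score - baseline) / baseline


# Toy labels, purely illustrative (not competition data).
y_true = ["laugh", "laugh", "laugh", "cry", "gasp", "gasp"]
y_pred = ["laugh", "laugh", "cry", "cry", "laugh", "gasp"]
toy_uar = unweighted_average_recall(y_true, y_pred)

# Values reported in the abstract.
best_uar, baseline_uar = 0.5190, 0.4172
absolute_gain = best_uar - baseline_uar
relative_gain = relative_improvement(best_uar, baseline_uar)

# Chance level for a balanced guess over an assumed eight classes.
num_classes = 8
chance_uar = 1 / num_classes

print(f"toy UAR: {toy_uar:.4f}")
print(f"absolute gain: {absolute_gain:.4f}, relative gain: {relative_gain:.1%}")
print(f"chance-level UAR: {chance_uar:.4f}")
```

Because UAR averages recalls rather than pooling samples, a model that ignores a rare class is penalized as heavily as one that ignores a frequent class, which is why it is a common choice for imbalanced classification tasks.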