The ACII Affective Vocal Bursts Workshop & Competition focuses on understanding multiple affective dimensions of vocal bursts: laughs, gasps, cries, screams, and many other non-linguistic vocalizations central to the expression of emotion and to human communication more generally. This year's competition comprises four tracks that use a large-scale, in-the-wild dataset of 59,299 vocalizations from 1,702 speakers. The first, the A-VB-High task, requires participants to perform multi-label regression on a novel model of emotion, using ten classes of richly annotated emotional expression intensities, including Awe, Fear, and Surprise. The second, the A-VB-Two task, uses the more conventional two-dimensional model of emotion: arousal and valence. The third, the A-VB-Culture task, requires participants to explore the cultural aspects of the dataset by training models that depend on the speaker's native country. Finally, in the fourth task, A-VB-Type, participants recognize the type of vocal burst (e.g., laughter, cry, grunt) as an 8-class classification problem. This paper describes the four tracks and the baseline systems, which use state-of-the-art machine learning methods. The baseline performance for each track is obtained with an end-to-end deep learning model and is as follows: for A-VB-High, a mean Concordance Correlation Coefficient (CCC) of 0.5687 over the 10 dimensions; for A-VB-Two, a mean CCC of 0.5084 over the 2 dimensions; for A-VB-Culture, a mean CCC of 0.4401 across the four cultures; and for A-VB-Type, an Unweighted Average Recall (UAR) of 0.4172 over the 8 classes.
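For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how they are commonly computed, using the standard textbook definitions of CCC and UAR. It is not taken from the competition's baseline code; the function names `ccc`, `mean_ccc`, and `uar` are hypothetical, and the choice of population (biased) variance is an assumption.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient for one dimension.

    Assumption: population (biased) variance and covariance are used.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    # The squared mean difference penalizes predictions that are
    # correlated with the labels but shifted away from them.
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


def mean_ccc(Y_true, Y_pred):
    """Average the per-dimension CCC over the columns of two (n, d) arrays."""
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    scores = [ccc(Y_true[:, k], Y_pred[:, k]) for k in range(Y_true.shape[1])]
    return float(np.mean(scores))


def uar(labels, preds):
    """Unweighted Average Recall: the mean of the per-class recalls."""
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    # Each class present in the labels counts equally, regardless of its size.
    recalls = [np.mean(preds[labels == c] == c) for c in np.unique(labels)]
    return float(np.mean(recalls))
```

Under these definitions, a prediction that is perfectly correlated with the labels but offset by a constant scores below 1 on CCC, and UAR gives every class equal weight, so it is not inflated by performance on a dominant class.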