The ICML Expressive Vocalization (ExVo) Competition is focused on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022 includes three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts. The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions. The third, ExVo-FewShot, requires participants to leverage few-shot learning incorporating speaker identity to train a model for the recognition of ten emotions conveyed by vocal bursts. This paper describes the three tracks and provides performance measures for baseline models using state-of-the-art machine learning strategies. The baselines for each track are as follows: for ExVo-MultiTask, a combined score ($S_{MTL}$), computed as the harmonic mean of the Concordance Correlation Coefficient (CCC), Unweighted Average Recall (UAR), and inverted Mean Absolute Error (MAE), reaches at best 0.335; for ExVo-Generate, we report Fr\'echet inception distance (FID) scores ranging from 4.81 to 8.27 (depending on the emotion) between the training set and generated samples, and combining the inverted FID with perceptual ratings of the generated samples ($S_{Gen}$) yields 0.174; and for ExVo-FewShot, a mean CCC of 0.444 is obtained.
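The $S_{MTL}$ combined score can be sketched as the harmonic mean of the three per-task metrics. A minimal illustration follows; note that the exact normalization used to invert MAE is an assumption here (the challenge definition fixes the precise form), so `1 / (1 + mae)` stands in as one common way to map an error onto a $[0, 1]$ score.

```python
from statistics import harmonic_mean

def s_mtl(ccc: float, uar: float, mae: float) -> float:
    """Combined multi-task score: harmonic mean of CCC, UAR, and
    inverted MAE. The inversion 1 / (1 + mae) is an illustrative
    assumption, not the official challenge normalization."""
    inverted_mae = 1.0 / (1.0 + mae)
    return harmonic_mean([ccc, uar, inverted_mae])

# With mae = 1.0, the inverted MAE is 0.5, so all three terms
# being 0.5 gives a combined score of 0.5.
print(s_mtl(0.5, 0.5, 1.0))
```

Because the harmonic mean is dominated by its smallest term, a model must perform reasonably on all three sub-tasks to obtain a high $S_{MTL}$.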