We present Burst2Vec, our multi-task learning approach to predict emotion, age, and origin (i.e., native country/language) from vocal bursts. Burst2Vec utilises pre-trained speech representations to capture acoustic information from raw waveforms and incorporates the concept of model debiasing via adversarial training. Our models achieve a relative 30% performance gain over baselines using pre-extracted features and score the highest amongst all participants in the ICML ExVo 2022 Multi-Task Challenge.
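To make the ingredients named above concrete, the following is a minimal sketch rather than the Burst2Vec implementation. It assumes a PyTorch setup with a pre-trained encoder that maps raw waveforms to pooled embeddings, three task heads for emotion, age, and origin, and a gradient-reversal branch over a hypothetical bias attribute to illustrate the adversarial-debiasing idea; all module names, dimensions, and the choice of debiased attribute are assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' released code), assuming a PyTorch setup:
# a pre-trained encoder producing pooled utterance embeddings, three task heads
# (emotion, age, origin), and a gradient-reversal branch over a hypothetical
# bias attribute to illustrate adversarial debiasing. Dimensions are placeholders.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class MultiTaskVocalBurstModel(nn.Module):
    def __init__(self, encoder, embed_dim=768, n_emotions=10,
                 n_countries=4, n_bias_classes=2, lambd=0.1):
        super().__init__()
        self.encoder = encoder                      # e.g. a pre-trained speech backbone (assumed)
        self.lambd = lambd
        self.emotion_head = nn.Linear(embed_dim, n_emotions)   # emotion intensities
        self.age_head = nn.Linear(embed_dim, 1)                # age regression
        self.origin_head = nn.Linear(embed_dim, n_countries)   # native-country classification
        # Adversarial head: gradient reversal encourages the shared embedding to
        # carry little information about the (hypothetical) bias attribute.
        self.bias_head = nn.Linear(embed_dim, n_bias_classes)

    def forward(self, waveforms):
        h = self.encoder(waveforms)                 # (batch, embed_dim) pooled representation
        return (self.emotion_head(h),
                self.age_head(h),
                self.origin_head(h),
                self.bias_head(GradReverse.apply(h, self.lambd)))


if __name__ == "__main__":
    # Stand-in encoder so the sketch runs end-to-end: one linear layer over 1 s of 16 kHz audio.
    dummy_encoder = nn.Sequential(nn.Linear(16000, 768), nn.ReLU())
    model = MultiTaskVocalBurstModel(dummy_encoder)
    emotion, age, origin, bias = model(torch.randn(2, 16000))
    print(emotion.shape, age.shape, origin.shape, bias.shape)
```

Gradient reversal is shown here only as one common way to realise adversarial debiasing: the shared encoder is trained to help the task heads while being penalised for making the bias attribute predictable.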