A New Amharic Speech Emotion Dataset and Classification Benchmark

In this paper we present the Amharic Speech Emotion Dataset (ASED), which covers four dialects (Gojjam, Wollo, Shewa and Gonder) and five emotions (neutral, fearful, happy, sad and angry). We believe it is the first Speech Emotion Recognition (SER) dataset for the Amharic language. Sixty-five volunteer participants, all native speakers, recorded 2,474 sound samples, each two to four seconds long. Eight judges assigned emotions to the samples with a high level of agreement (Fleiss' kappa = 0.8). The resulting dataset is freely available for download. Next, we developed a four-layer variant of the well-known VGG model, which we call VGGb. Three experiments were then carried out using VGGb for SER on ASED. First, we investigated whether Mel-spectrogram features or Mel-frequency cepstral coefficient (MFCC) features work best for Amharic. This was done by training two VGGb SER models on ASED, one using Mel-spectrograms and the other using MFCC. Four forms of training were tried: standard cross-validation, and three variants that hold out sentences, dialects and speaker groups respectively. Thus, a sentence used for training was never used for testing, and likewise for a dialect or a speaker group. The conclusion was that MFCC features are superior under all four training schemes. MFCC was therefore adopted for Experiment 2, where VGGb and three existing models (ResNet50, AlexNet and LSTM) were compared on ASED. VGGb was found to have very good accuracy (90.73%) as well as the fastest training time. In Experiment 3, the performance of VGGb was compared when trained on two existing SER datasets, RAVDESS (English) and EMO-DB (German), as well as on ASED (Amharic). Results are comparable across these languages, with ASED being the highest. This suggests that VGGb can be successfully applied to other languages. We hope that ASED will encourage researchers to experiment with other models for Amharic SER.
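The held-out training schemes described above (a sentence, dialect or speaker group seen in training never appears in testing) can be illustrated with a minimal sketch. This is not the authors' code: the function name `group_holdout_split`, the record fields (`sentence`, `dialect`, `speaker_group`, `emotion`) and the toy records are all assumptions made for illustration, and the abstract does not state the actual split proportions.

```python
import random


def group_holdout_split(samples, group_key, test_fraction=0.25, seed=0):
    """Split samples so that no value of `group_key` occurs in both parts.

    `samples` is a list of dicts; `group_key` names the field to hold out
    (hypothetical field names: "sentence", "dialect", "speaker_group").
    """
    # Collect the distinct group values and shuffle them reproducibly.
    groups = sorted({s[group_key] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)

    # Reserve at least one whole group for testing.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    train = [s for s in samples if s[group_key] not in test_groups]
    test = [s for s in samples if s[group_key] in test_groups]
    return train, test


# Toy metadata only (hypothetical, not drawn from ASED).
dialects = ["Gojjam", "Wollo", "Shewa", "Gonder"]
emotions = ["neutral", "fearful", "happy", "sad", "angry"]
samples = [
    {"sentence": f"s{i % 3}", "dialect": d, "speaker_group": f"g{i % 2}", "emotion": e}
    for i, (d, e) in enumerate((d, e) for d in dialects for e in emotions)
]

train, test = group_holdout_split(samples, "dialect")
train_dialects = {s["dialect"] for s in train}
test_dialects = {s["dialect"] for s in test}
```

Passing `"sentence"` or `"speaker_group"` as `group_key` gives the other two held-out variants. Standard cross-validation, by contrast, splits individual samples without regard to these groups.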
