The human voice effectively communicates a range of emotions through nuanced acoustic variations. Existing emotional speech corpora are limited in that they are either (a) highly curated to induce specific emotions from predefined categories that may not capture the full range of emotional experience, or (b) entangled in their semantic and prosodic cues, which makes it difficult to study these cues separately. To overcome these limitations, we propose a new approach called 'Genetic Algorithm with People' (GAP), which integrates human decision and production into a genetic algorithm. In our design, creators and raters jointly optimize emotional prosody over generations. We demonstrate that GAP can efficiently sample from the emotional speech space, captures a broad range of emotions, and achieves results comparable to state-of-the-art emotional speech corpora. GAP is language-independent and supports large-scale crowdsourcing, making it well suited to future large-scale cross-cultural research.
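To make the "human decision and production inside a genetic algorithm" idea concrete, here is a minimal sketch of such a loop. It is not the authors' implementation: the abstract does not specify GAP's selection or reproduction rules, so the top-k selection by rater score and the noisy re-production by creators below are assumptions. Humans are replaced by simulated functions, each recording is reduced to a short vector of hypothetical prosodic features, and `run_gap`, `simulated_rater`, `simulated_creator`, and `target` are illustrative names only.

```python
import math
import random
from typing import Callable, List

# Hypothetical representation: a recording summarised as a few prosodic feature values.
Stimulus = List[float]


def run_gap(
    initial: List[Stimulus],
    rate: Callable[[Stimulus], float],
    create: Callable[[Stimulus], Stimulus],
    generations: int,
    n_select: int,
) -> List[Stimulus]:
    """Sketch of a genetic algorithm whose fitness and mutation steps are done by people."""
    population = list(initial)
    size = len(population)
    for _ in range(generations):
        # Decision step (raters): score each stimulus for the target emotion, keep the best.
        parents = sorted(population, key=rate, reverse=True)[:n_select]
        # Production step (creators): re-produce the selected stimuli, introducing variation.
        population = [create(parents[i % n_select]) for i in range(size)]
    return population


# Usage with simulated participants (assumption: the target emotion is a fixed feature profile).
rng = random.Random(0)
target = [0.8, 0.2, 0.5]


def simulated_rater(s: Stimulus) -> float:
    # Higher score for stimuli closer to the hypothetical target profile.
    return -math.dist(s, target)


def simulated_creator(parent: Stimulus) -> Stimulus:
    # An imperfect imitation: the parent's features plus small Gaussian noise.
    return [x + rng.gauss(0.0, 0.05) for x in parent]


initial = [[rng.random() for _ in range(3)] for _ in range(20)]
final = run_gap(initial, simulated_rater, simulated_creator, generations=20, n_select=5)
```

Because the loop touches the stimuli only through `rate` and `create`, the same skeleton would work whether those calls are simulated, as here, or routed to crowd-sourced raters and creators.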