The increasing adoption of text-to-speech technologies has led to a growing demand for natural and emotive voices that adapt to a conversation's context and emotional tone. This need is particularly relevant for interactive narrative-driven systems such as video games, TV shows, and graphic novels. To address it, we present the Emotive Narrative Storytelling (EMNS) corpus, a dataset of high-quality British English speech with labelled utterances designed to enhance interactive experiences with dynamic and expressive language. We provide clean, high-quality audio recordings paired with natural language descriptions and transcripts, along with user-reviewed and self-reported labels for features such as word emphasis, expressiveness, and emotion. EMNS improves on existing datasets by providing cleaner, higher-quality recordings to support more natural and expressive speech synthesis for interactive narrative-driven experiences. Additionally, we release our remote and scalable data collection system to the research community.