Code-switching is a speech phenomenon in which a speaker switches languages during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data through read speech rather than spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality corpus of spontaneous, multi-turn conversational Chinese-English code-switching speech collected in Hong Kong. In this work, we report ASCEND's design and the procedure for collecting the speech data, including the annotations. ASCEND includes 23 bilingual speakers who are fluent in both Chinese and English and consists of 10.62 hours of clean speech.