This paper introduces a non-native speech corpus of narratives produced by fifty 5- to 6-year-old Chinese-English bilingual children. We present transcripts of 6.5 hours of recordings in which the children take a narrative comprehension test in English (L2), together with human-rated scores and annotations of grammatical and pronunciation errors. For reference, the children also completed the parallel MAIN (Multilingual Assessment Instrument for Narratives) tests in Chinese (L1). All tests were recorded in audio and video using remote collection methods that we developed ourselves. The video recordings help transcribers cope with the low intelligibility of young children's L2 narratives. The corpus is a resource for second language teaching and could help improve the performance of automatic speech recognition (ASR).