Sentence completion (SC) questions present a sentence with one or more blanks to be filled in, together with three to five candidate words or phrases as options. SC questions are widely used to assess students learning English as a Second Language (ESL). In this paper, we present a large-scale SC dataset, \textsc{SC-Ques}, which consists of 292,517 ESL SC questions from real-world standardized English examinations. Furthermore, we build a comprehensive benchmark for automatically solving SC questions by training large-scale pre-trained language models on the proposed \textsc{SC-Ques} dataset. We conduct a detailed analysis of the baseline models' performance, limitations, and trade-offs. The data and our code are available for research purposes at \url{https://github.com/ai4ed/SC-Ques}.