We introduce a community-sourced dataset for English Language Question Answering (ELQA), which consists of more than 180k questions and answers on numerous topics about the English language, such as grammar, meaning, fluency, and etymology. The ELQA corpus will enable new NLP applications for language learners. We define three tasks based on the ELQA corpus: 1) answer quality classification, 2) semantic search for finding similar questions, and 3) answer generation. We present baselines for each task along with analysis, showing the strengths and weaknesses of current transformer-based models. The ELQA corpus and scripts are publicly available for future studies.