Creating multiple-choice questions to assess reading comprehension of a given article involves generating question-answer pairs (QAPs) and adequate distractors. We present two methods for QAP generation. (1) TP3, a deep-learning-based end-to-end question generation system built on a T5 Transformer with Preprocessing and Postprocessing Pipelines. We use a fine-tuned T5 model for the downstream task of question generation and improve accuracy by combining various NLP tools and algorithms in preprocessing and postprocessing to select appropriate answers and filter out undesirable questions. (2) MetaQA, a sequence-learning-based scheme that generates adequate QAPs via meta-sequence representations of sentences. A meta-sequence is a sequence of vectors comprising semantic and syntactic tags. MetaQA learns meta-sequences from training data to form pairs of a meta-sequence for a declarative sentence and a corresponding interrogative sentence. TP3 works well on unseen data and is complemented by MetaQA. Both methods can generate well-formed and grammatically correct questions. Moreover, we present a novel approach to automatically generating adequate distractors for a given QAP. The method combines part-of-speech tagging, named-entity tagging, semantic-role labeling, regular expressions, domain knowledge bases, word embeddings, word edit distance, WordNet, and other algorithms.
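To make one ingredient of the distractor method concrete, the following is a minimal sketch of how word edit distance alone could rank candidate distractors by surface similarity to the correct answer. It is an illustration under stated assumptions, not the authors' implementation: the function names `edit_distance` and `rank_distractors`, the candidate pool, and the ranking rule (closest first) are all hypothetical, and the abstract's other components (tagging, embeddings, WordNet, knowledge bases) are omitted.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_distractors(answer: str, candidates: List[str], k: int = 3) -> List[str]:
    """Hypothetical helper: keep up to k candidates closest to the answer.

    Assumption: a candidate that looks similar to the answer makes a
    plausible distractor. The answer itself and duplicates are dropped.
    """
    answer_key = answer.casefold()
    seen = set()
    pool = []
    for cand in candidates:
        key = cand.casefold()
        if key == answer_key or key in seen:
            continue  # never offer the correct answer or a repeat
        seen.add(key)
        pool.append(cand)
    # Sort by distance to the answer, breaking ties alphabetically.
    pool.sort(key=lambda c: (edit_distance(answer_key, c.casefold()), c))
    return pool[:k]


if __name__ == "__main__":
    # Hypothetical candidate pool for a question whose answer is a year.
    print(rank_distractors("1945", ["1954", "1945", "1845", "2001", "1954"], k=2))
```

In a fuller pipeline, a filter of this kind would presumably be applied only after candidates of the right type have been collected, for example by named-entity or part-of-speech tags, so that surface similarity is compared among semantically compatible options.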