Current studies in extractive question answering (EQA) model the single-span extraction setting, in which the label to predict for a given question-passage pair is a single answer span. This setting is natural for general-domain EQA, since the majority of general-domain questions can be answered with a single span. Following general-domain EQA models, current biomedical EQA (BioEQA) models also use the single-span extraction setting, combined with post-processing steps. In this article, we investigate the distribution of question types across the general and biomedical domains and find that biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (a single answer). This calls for models capable of producing multiple answers for a question. Based on this preliminary study, we propose a sequence tagging approach for BioEQA, which is a multi-span extraction setting. Our approach directly handles questions whose answers consist of a variable number of phrases, and it learns from training data how many answers to produce for a question. In experiments on the BioASQ 7b and 8b list-type questions, our approach outperformed the best-performing existing models without requiring post-processing steps. Source code and resources are freely available for download at https://github.com/dmis-lab/SeqTagQA
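To illustrate how a sequence tagging formulation can yield a variable number of answers, the following is a minimal sketch rather than the implementation in the linked repository. It assumes a BIO tag scheme (B = beginning of an answer, I = inside an answer, O = outside), which the abstract does not specify, and it takes per-token tags as given instead of predicting them with a model. The function name `decode_bio_spans` and the example sentence are hypothetical.

```python
from typing import List


def decode_bio_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Turn per-token BIO tags into a list of answer phrases.

    Assumption: tags are "B", "I", or "O". Each maximal B/I run
    becomes one answer, so the number of answers is not fixed.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    answers: List[str] = []
    current: List[str] = []  # tokens of the span currently being built

    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new span starts; close any span that is still open.
            if current:
                answers.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the open span.
            current.append(token)
        else:
            # "O", or a stray "I" with no open span: close and reset.
            if current:
                answers.append(" ".join(current))
            current = []

    # Close a span that runs to the end of the passage.
    if current:
        answers.append(" ".join(current))
    return answers


# Made-up passage and tags, for illustration only.
tokens = ["Symptoms", "include", "fever", ",", "dry", "cough", "and", "fatigue", "."]
tags = ["O", "O", "B", "O", "B", "I", "O", "B", "O"]
print(decode_bio_spans(tokens, tags))  # ['fever', 'dry cough', 'fatigue']
```

In this sketch the number of answers is simply the number of tagged spans, so no separate step is needed to decide how many spans to return, in contrast to a single-span model that must rank and threshold candidate spans afterwards.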