BioADAPT-MRC: Adversarial Learning-based Domain Adaptation Improves Biomedical Machine Reading Comprehension Task

Biomedical machine reading comprehension (biomedical-MRC) aims to comprehend complex biomedical narratives and assist healthcare professionals in retrieving information from them. The high performance of modern neural network-based MRC systems depends on high-quality, large-scale, human-annotated training datasets. In the biomedical domain, a crucial challenge in creating such datasets is the requirement for domain knowledge, which makes labeled data scarce and creates the need for transfer learning from the labeled general-purpose (source) domain to the biomedical (target) domain. However, the marginal distributions of the general-purpose and biomedical domains differ because their topics differ. Therefore, directly transferring learned representations from a model trained on the general-purpose domain to the biomedical domain can hurt the model's performance. We present an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task (BioADAPT-MRC), a neural network-based method to address the discrepancies in the marginal distributions between the general and biomedical domain datasets. BioADAPT-MRC relaxes the need for generating pseudo labels for training a well-performing biomedical-MRC model. We extensively evaluate the performance of BioADAPT-MRC by comparing it with the best existing methods on three widely used benchmark biomedical-MRC datasets: BioASQ-7b, BioASQ-8b, and BioASQ-9b. Our results suggest that without using any synthetic or human-annotated data from the biomedical domain, BioADAPT-MRC can achieve state-of-the-art performance on these datasets. Availability: BioADAPT-MRC is freely available as an open-source project at https://github.com/mmahbub/BioADAPT-MRC.
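
The marginal-distribution discrepancy mentioned in the abstract can be made concrete with a toy calculation. The sketch below is only an illustration and is not part of BioADAPT-MRC: the example questions are invented, whitespace tokenization is a simplification, and the Jensen-Shannon divergence is one assumed way to quantify how far apart two token distributions are. The helper names `token_distribution` and `js_divergence` are hypothetical.

```python
import math
from collections import Counter


def token_distribution(texts):
    """Relative frequency of each lowercased, whitespace-separated token."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    distributions given as dicts; the value lies in [0, 1]."""
    vocab = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mixture(a):
        # KL(a || m); tokens with zero probability in `a` contribute nothing.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Invented toy questions standing in for the source and target domains.
general = ["who wrote the novel", "where is the river located"]
biomedical = ["which gene is mutated in the disease",
              "what protein binds the receptor"]

p_src = token_distribution(general)
p_tgt = token_distribution(biomedical)

print("source vs source:", js_divergence(p_src, p_src))
print("source vs target:", js_divergence(p_src, p_tgt))
```

A value near 0 means the two token distributions largely coincide, while a value near 1 means they share almost no vocabulary. Real domain shift is measured on learned representations rather than raw token counts, so this only conveys the intuition.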
