通过域目标合成问题生成的零射神经通过通道检索 (Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation)

A major obstacle to the wide-spread adoption of neural retrieval models is that they require large supervised training sets to surpass traditional term-based techniques, which are constructed from raw corpora. In this paper, we propose an approach to zero-shot learning for passage retrieval that uses synthetic question generation to close this gap. The question generation system is trained on general domain data, but is applied to documents in the targeted domain. This allows us to create arbitrarily large, yet noisy, question-passage relevance pairs that are domain specific. Furthermore, when this is coupled with a simple hybrid term-neural model, first-stage retrieval performance can be improved further. Empirically, we show that this is an effective strategy for building neural passage retrieval models in the absence of large training corpora. Depending on the domain, this technique can even approach the accuracy of supervised models.

翻译：广泛采用神经检索模型的一个主要障碍是,这些模型需要大型的监管培训,以超越传统术语技术,这些技术是用原始公司建造的。在本文中,我们建议采用零光的通道检索学习方法,使用合成问题生成来缩小这一差距。问题生成系统经过一般域数据培训,但适用于目标领域的文件。这使我们能够制造出专有的、但又吵闹的、有问题的对子,这是特定领域的。此外,如果结合一个简单的混合术语-神经模型,第一阶段的检索性能可以进一步改进。我们经常地表明,这是在没有大型培训公司的情况下建立神经通道检索模型的有效战略。根据领域的情况,这种技术甚至可以接近受监督模型的准确性。

相关内容

MoDELS

关注 43

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

近期必读的五篇顶会 ACL 2020【图神经网络 (GNN) 】相关论文

专知会员服务

105+阅读 · 2020年6月9日