以模型为基础的索引器对Corpus进行终极检索 (Ultron: An Ultimate Retriever on Corpus with a Model-based Indexer) - 专知论文

会员服务 ·

0

知识 (knowledge) · Extensibility · 优化器 · MoDELS · 端到端 ·

2022 年 8 月 19 日

Ultron: An Ultimate Retriever on Corpus with a Model-based Indexer

翻译：以模型为基础的索引器对Corpus进行终极检索

Yujia Zhou,Jing Yao,Zhicheng Dou,Ledell Wu,Peitian Zhang,Ji-Rong Wen

Document retrieval has been extensively studied within the index-retrieve framework for decades, which has withstood the test of time. Unfortunately, such a pipelined framework limits the optimization of the final retrieval quality, because indexing and retrieving are separated stages that can not be jointly optimized in an end-to-end manner. In order to unify these two stages, we explore a model-based indexer for document retrieval. Concretely, we propose Ultron, which encodes the knowledge of all documents into the model and aims to directly retrieve relevant documents end-to-end. For the model-based indexer, how to represent docids and how to train the model are two main issues to be explored. Existing solutions suffer from semantically deficient docids and limited supervised data. To tackle these two problems, first, we devise two types of docids that are richer in semantics and easier for model inference. In addition, we propose a three-stage training workflow to capture more knowledge contained in the corpus and associations between queries and docids. Experiments on two public datasets demonstrate the superiority of Ultron over advanced baselines for document retrieval.

翻译：几十年来,在索引检索框架内对文件检索进行了广泛的研究,这已经经受了时间的考验。不幸的是,这样一个编审框架限制了最后检索质量的优化,因为索引和检索是分化的阶段,不能以端到端的方式共同优化。为了统一这两个阶段,我们探索一个基于模型的文档检索索引器。具体地说,我们提议一个Ultron,它将所有文件的知识都编码到模型中,目的是直接检索相关文件的端到端。对于基于模型的索引器,如何代表 docid 和如何培训模型是有待探讨的两个主要问题。现有的解决方案存在语义缺陷的 docid 和受监督的数据有限的问题。为了解决这两个问题,我们首先设计了两种在语义学上更丰富、更便于模型推断的Docid 。此外,我们提议了一个三阶段的培训工作流程,以获取在查询和 docid之间数据库中包含的更多知识。关于两个公共数据库的实验显示Ultron 高级基线的检索优势。

0

相关内容

知识 (knowledge)

知识 (knowledge)

通过学习、实践或探索所获得的认识、判断或技能。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

模式参数误差对黑潮路径变异预报的影响

国家自然科学基金

0+阅读 · 2013年12月31日

Alzheimer病脑内BDNF/TrkB信号时空特异性调控LIMK1蛋白并保护突触功能的机制及应用

国家自然科学基金

0+阅读 · 2013年12月31日

基于光纤微泡结构的LSPR线宽压缩方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

间歇式公交专用道的通行能力形成机理及其影响研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于P2Y1受体介导的海马区胶质细胞神经炎性反应调控探讨电针对MCAO大鼠认知功能改善的实验研究

国家自然科学基金

0+阅读 · 2013年12月31日

飞机GLARE层板结构空气耦合超声兰姆波成像检测方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

髓系抑制性细胞（MDSC）参与鼻咽癌免疫耐受的作用和调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

靶向GRPR的铁蛋白笼状嵌合体多功能分子探针的构建及其PET/MRI/NIRF三模式显像的探索研究

国家自然科学基金

0+阅读 · 2011年12月31日

RGM与neogenin信号调控应激性精神障碍-PTSD杏仁核、海马神经细胞凋亡的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

InferES : A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples

InferES : A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples

Arxiv

0+阅读 · 2022年10月6日

Human-AI Shared Control via Policy Dissection

Arxiv

0+阅读 · 2022年10月5日

Exploiting Sentiment and Common Sense for Zero-shot Stance Detection

Arxiv

0+阅读 · 2022年10月5日

Data-Efficient Structured Pruning via Submodular Optimization

Arxiv

0+阅读 · 2022年10月4日

PlaneDepth: Plane-Based Self-Supervised Monocular Depth Estimation

Arxiv

0+阅读 · 2022年10月4日

An Efficient Cyclic Entailment Procedure in a Fragment of Separation Logic

Arxiv

0+阅读 · 2022年10月2日

Zero-Shot Retrieval with Search Agents and Hybrid Environments

Arxiv

0+阅读 · 2022年9月30日

Generative Model Watermarking Based on Human Visual System

Arxiv

0+阅读 · 2022年9月30日

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

Arxiv

10+阅读 · 2020年12月31日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

13+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

知识 (knowledge)

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】在低维和高维空间中分析、建模和转换潜在表征

从无人机到数据：揭示边缘计算作为新作战域

可解释人工智能的基础

大规模视觉模型中的基于提示的适应：综述

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

相关论文

InferES : A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples

InferES : A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples

Arxiv

0+阅读 · 2022年10月6日

Human-AI Shared Control via Policy Dissection

Arxiv

0+阅读 · 2022年10月5日

Exploiting Sentiment and Common Sense for Zero-shot Stance Detection

Arxiv

0+阅读 · 2022年10月5日

Data-Efficient Structured Pruning via Submodular Optimization

Arxiv

0+阅读 · 2022年10月4日

PlaneDepth: Plane-Based Self-Supervised Monocular Depth Estimation

Arxiv

0+阅读 · 2022年10月4日

An Efficient Cyclic Entailment Procedure in a Fragment of Separation Logic

Arxiv

0+阅读 · 2022年10月2日

Zero-Shot Retrieval with Search Agents and Hybrid Environments

Arxiv

0+阅读 · 2022年9月30日

Generative Model Watermarking Based on Human Visual System

Arxiv

0+阅读 · 2022年9月30日

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

Arxiv

10+阅读 · 2020年12月31日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

13+阅读 · 2018年1月11日

相关基金

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

模式参数误差对黑潮路径变异预报的影响

国家自然科学基金

0+阅读 · 2013年12月31日

Alzheimer病脑内BDNF/TrkB信号时空特异性调控LIMK1蛋白并保护突触功能的机制及应用

国家自然科学基金

0+阅读 · 2013年12月31日

基于光纤微泡结构的LSPR线宽压缩方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

间歇式公交专用道的通行能力形成机理及其影响研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于P2Y1受体介导的海马区胶质细胞神经炎性反应调控探讨电针对MCAO大鼠认知功能改善的实验研究

国家自然科学基金

0+阅读 · 2013年12月31日

飞机GLARE层板结构空气耦合超声兰姆波成像检测方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

髓系抑制性细胞（MDSC）参与鼻咽癌免疫耐受的作用和调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

靶向GRPR的铁蛋白笼状嵌合体多功能分子探针的构建及其PET/MRI/NIRF三模式显像的探索研究

国家自然科学基金

0+阅读 · 2011年12月31日

RGM与neogenin信号调控应激性精神障碍-PTSD杏仁核、海马神经细胞凋亡的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员