This paper introduces a simple yet effective query expansion approach, denoted as query2doc, to improve both sparse and dense retrieval systems. The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs), and then expands the query with generated pseudo-documents. LLMs are trained on web-scale text corpora and are adept at knowledge memorization. The pseudo-documents from LLMs often contain highly relevant information that can aid in query disambiguation and guide the retrievers. Experimental results demonstrate that query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets, such as MS-MARCO and TREC DL, without any model fine-tuning. Furthermore, our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
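The expansion step for sparse retrieval can be sketched as a simple string operation. The sketch below assumes the pseudo-document has already been generated upstream by few-shot prompting an LLM; repeating the original query `n` times before concatenation (to keep its terms from being drowned out by the longer pseudo-document) follows the paper's BM25 setup, though the exact repeat count here is illustrative.

```python
def expand_query(query: str, pseudo_doc: str, n_repeats: int = 5) -> str:
    """Build a query2doc-style expanded query for a sparse retriever.

    The original query is repeated n_repeats times so that its terms retain
    weight relative to the (typically much longer) LLM-generated
    pseudo-document, then the pseudo-document is appended.
    """
    return " ".join([query] * n_repeats + [pseudo_doc])


# Example: the pseudo-document here is a stand-in for LLM output.
query = "who wrote hamlet"
pseudo_doc = "Hamlet is a tragedy written by William Shakespeare around 1600."
expanded = expand_query(query, pseudo_doc, n_repeats=2)
# expanded now begins with the query repeated twice, followed by the pseudo-document
```

The expanded string is then issued to the retriever (e.g. BM25) in place of the original query; dense retrievers in the paper instead concatenate query and pseudo-document once with a separator.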