超越词汇: 文本搜索引擎的语义检索框架 (Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine) - 专知论文

会员服务 ·

0

Engineering · Performer · MoDELS · 向量化 · FAST ·

2020 年 8 月 10 日

Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine

翻译：超越词汇: 文本搜索引擎的语义检索框架

Kuan Fang,Long Zhao,Zhan Shen,RuiXing Wang,RiKang Zhour,LiWen Fan

from arxiv, 9 pages

Search engine has become a fundamental component in various web and mobile applications. Retrieving relevant documents from the massive datasets is challenging for a search engine system, especially when faced with verbose or tail queries. In this paper, we explore a vector space search framework for document retrieval. Specifically, we trained a deep semantic matching model so that each query and document can be encoded as a low dimensional embedding. Our model was trained based on BERT architecture. We deployed a fast k-nearest-neighbor index service for online serving. Both offline and online metrics demonstrate that our method improved retrieval performance and search quality considerably, particularly for tail

翻译：搜索引擎已成为各种网络和移动应用程序的基本组成部分。从大型数据集中获取相关文件对搜索引擎系统来说是一项挑战性的工作,特别是当遇到verbose或尾部查询时。在本文件中,我们探索了一个用于文件检索的矢量空间搜索框架。具体地说,我们训练了一个深层次的语义匹配模型,以便每个查询和文件都能够以低维嵌入方式编码。我们的模型是建立在 BERT 结构基础上的培训。我们为在线服务安装了一个快速的 k- nearest- nearbearbor 索引服务。离线和在线测量都表明,我们的方法大大改进了检索性能和搜索质量,特别是对尾部而言。

16

相关内容

Engineering

《工程》是中国工程院（CAE）于2015年推出的国际开放存取期刊。其目的是提供一个高水平的平台，传播和分享工程研发的前沿进展、当前主要研究成果和关键成果；报告工程科学的进展，讨论工程发展的热点、兴趣领域、挑战和前景，在工程中考虑人与环境的福祉和伦理道德，鼓励具有深远经济和社会意义的工程突破和创新，使之达到国际先进水平，成为新的生产力，从而改变世界，造福人类，创造新的未来。期刊链接：https://www.sciencedirect.com/journal/engineering

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

腾讯信息流内容理解技术实践，A User-Centered Concept Mining System for Query and Document Understanding at Tencent

腾讯信息流内容理解技术实践，A User-Centered Concept Mining System for Query and Document Understanding at Tencent

专知会员服务

41+阅读 · 2019年12月15日

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

专知会员服务

31+阅读 · 2019年11月25日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

每周一起读 × 招募 | ACL 2019：基于知识增强的语言表示模型

每周一起读 × 招募 | ACL 2019：基于知识增强的语言表示模型

PaperWeekly

8+阅读 · 2019年6月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

清华大学NLP组整理的机器翻译论文阅读清单

清华大学NLP组整理的机器翻译论文阅读清单

AINLP

5+阅读 · 2018年12月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

LibRec 精选：推荐的可解释性[综述]

LibRec 精选：推荐的可解释性[综述]

LibRec智能推荐

10+阅读 · 2018年5月4日

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

专知

15+阅读 · 2018年5月1日

【论文推荐】最新六篇视频分类相关论文—层次标签推断、知识图谱、CNNs、DAiSEE、表观和关系网络、转移学习

【论文推荐】最新六篇视频分类相关论文—层次标签推断、知识图谱、CNNs、DAiSEE、表观和关系网络、转移学习

专知

13+阅读 · 2018年2月18日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

A Capsule Network-based Embedding Model for Search Personalization

Arxiv

4+阅读 · 2019年3月6日

Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering

Arxiv

7+阅读 · 2018年6月12日

Convolutional CRFs for Semantic Segmentation

Arxiv

8+阅读 · 2018年5月15日

Text-to-Clip Video Retrieval with Early Fusion and Re-Captioning

Arxiv

4+阅读 · 2018年4月13日

Attention-based Ensemble for Deep Metric Learning

Arxiv

17+阅读 · 2018年4月2日

Large-Scale Image Retrieval with Attentive Deep Local Features

Arxiv

3+阅读 · 2018年2月3日

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Arxiv

3+阅读 · 2018年1月19日

Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline)

Arxiv

7+阅读 · 2018年1月9日

Content based video retrieval

Arxiv

3+阅读 · 2012年11月20日

VIP会员

文章信息

相关主题

相关VIP内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

腾讯信息流内容理解技术实践，A User-Centered Concept Mining System for Query and Document Understanding at Tencent

腾讯信息流内容理解技术实践，A User-Centered Concept Mining System for Query and Document Understanding at Tencent

专知会员服务

41+阅读 · 2019年12月15日

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

专知会员服务

31+阅读 · 2019年11月25日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

最新《扩散模型原理》新书，470页pdf

无人机作战：演进、创新与未来战场

AI 智能体简史

多模态空间推理在大模型时代：综述与基准测试

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

每周一起读 × 招募 | ACL 2019：基于知识增强的语言表示模型

每周一起读 × 招募 | ACL 2019：基于知识增强的语言表示模型

PaperWeekly

8+阅读 · 2019年6月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

清华大学NLP组整理的机器翻译论文阅读清单

清华大学NLP组整理的机器翻译论文阅读清单

AINLP

5+阅读 · 2018年12月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

LibRec 精选：推荐的可解释性[综述]

LibRec 精选：推荐的可解释性[综述]

LibRec智能推荐

10+阅读 · 2018年5月4日

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

专知

15+阅读 · 2018年5月1日

【论文推荐】最新六篇视频分类相关论文—层次标签推断、知识图谱、CNNs、DAiSEE、表观和关系网络、转移学习

【论文推荐】最新六篇视频分类相关论文—层次标签推断、知识图谱、CNNs、DAiSEE、表观和关系网络、转移学习

专知

13+阅读 · 2018年2月18日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

A Capsule Network-based Embedding Model for Search Personalization

Arxiv

4+阅读 · 2019年3月6日

Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering

Arxiv

7+阅读 · 2018年6月12日

Convolutional CRFs for Semantic Segmentation

Arxiv

8+阅读 · 2018年5月15日

Text-to-Clip Video Retrieval with Early Fusion and Re-Captioning

Arxiv

4+阅读 · 2018年4月13日

Attention-based Ensemble for Deep Metric Learning

Arxiv

17+阅读 · 2018年4月2日

Large-Scale Image Retrieval with Attentive Deep Local Features

Arxiv

3+阅读 · 2018年2月3日

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Arxiv

3+阅读 · 2018年1月19日

Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline)

Arxiv

7+阅读 · 2018年1月9日

Content based video retrieval

Arxiv

3+阅读 · 2012年11月20日

微信扫码咨询专知VIP会员