OpenMatch: 用于信息检索的开放源码软件包 (OpenMatch: An Open-Source Package for Information Retrieval) - 专知论文

会员服务 ·

0

秩 · INFORMS · IR · 信息检索 · Extensibility ·

2021 年 1 月 30 日

OpenMatch: An Open-Source Package for Information Retrieval

翻译：OpenMatch: 用于信息检索的开放源码软件包

Zhenghao Liu,Kaitao Zhang,Chenyan Xiong,Zhiyuan Liu

from arxiv, 4 pages

Information Retrieval (IR) is an important task and can be used in many applications. Neural IR (Neu-IR) models overcome the vocabulary mismatch problem of sparse retrievers and thrive on the ranking pipeline with semantic matching. Recent progress in IR mainly focuses on Neu-IR models, including efficient dense retrieval, advanced neural architectures and robustly training for few-shot IR that lacks training data. In order to integrate these advantages for researchers and engineers to utilize and develop, OpenMatch provides various functional neural modules based on PyTorch to maintain sufficient extensibility, making it easy to build customized and higher-capacity IR systems. Besides, OpenMatch consists of complicated optimization tricks, various sparse/dense retrieval methods, and advanced few-shot training methods, liberating users from surplus labor in baseline reimplementation and neural model finetuning. With OpenMatch, we achieve reasonable performance on various ranking datasets, rank first of the automatic group in TREC COVID (Round 2) and rank top on the MS MARCO Document Ranking leaderboard. The library, experimental methodologies and results of OpenMatch are all publicly available at https://github.com/thunlp/OpenMatch.

翻译：信息检索(IR)是一项重要任务,可以在许多应用中使用。神经IR(Neu-IR)模型克服了稀有检索器的词汇错配问题,并且以语义匹配的方式在排位管道上蓬勃发展。IR最近的进展主要集中在Neu-IR模型上,包括高效的密集检索、先进的神经结构以及对缺乏培训数据的微粒IR进行强力培训。为了整合研究人员和工程师利用和发展的这些优势,OpenMatch提供了以PyTorch为基础的各种功能性神经模块,以保持足够的可扩展性,使其易于建立定制的和更高能力的IR系统。此外,OpenMatch包括复杂的优化技巧、各种稀有/重现的检索方法和先进的几发式培训方法,使用户在基线再实施和神经模型微调中摆脱多余的劳动力。OpenMatch利用OpenMatch(REC CO CO CO CO CO (Round 2) 自动组的排名第一,并在MS MARCO 文档排名领导板上排名最高。OnMatch/OnMissionMs/OnMs/Orms/Onsm) 都可公开查阅。

0

相关内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

用于大型遥感影像检索的深度学习，Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

用于大型遥感影像检索的深度学习，Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

专知会员服务

39+阅读 · 2020年4月6日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

【Github】All4NLP：自然语言处理相关资源整理

【Github】All4NLP：自然语言处理相关资源整理

AINLP

23+阅读 · 2019年8月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

计算机 | EMNLP 2019等国际会议信息6条

计算机 | EMNLP 2019等国际会议信息6条

Call4Papers

18+阅读 · 2019年4月26日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

TCN v2 + 3Dconv 运动信息

TCN v2 + 3Dconv 运动信息

CreateAMind

4+阅读 · 2019年1月8日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【推荐】(TensorFlow)SSD实时手部检测与追踪（附代码）

【推荐】(TensorFlow)SSD实时手部检测与追踪（附代码）

机器学习研究会

11+阅读 · 2017年12月5日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Autoregressive Entity Retrieval

Arxiv

0+阅读 · 2021年3月24日

Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval

Arxiv

10+阅读 · 2021年2月22日

A Graph-based Relevance Matching Model for Ad-hoc Retrieval

Arxiv

11+阅读 · 2021年1月28日

Deep Image Retrieval: A Survey

Arxiv

16+阅读 · 2021年1月27日

SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble

Arxiv

3+阅读 · 2019年10月17日

A Sketch-Based System for Semantic Parsing

A Sketch-Based System for Semantic Parsing

Arxiv

4+阅读 · 2019年9月12日

Learning a Matching Model with Co-teaching for Multi-turn Response Selection in Retrieval-based Dialogue Systems

Arxiv

6+阅读 · 2019年6月11日

Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network

Arxiv

15+阅读 · 2019年5月28日

Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate

Arxiv

7+阅读 · 2018年4月24日

Training a Ranking Function for Open-Domain Question Answering

Arxiv

5+阅读 · 2018年4月12日

VIP会员

文章信息

相关主题

相关VIP内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

用于大型遥感影像检索的深度学习，Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

用于大型遥感影像检索的深度学习，Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

专知会员服务

39+阅读 · 2020年4月6日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

GPT-5如何对齐？从硬性拒绝到安全完成：走向以输出为中心的安全训练

【伯克利博士论文】超越人类监督的视觉智能

【ICCV2025】SO(3) 上连续非保守动力系统的预测

2025年中国数据要素行业发展研究报告

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

【Github】All4NLP：自然语言处理相关资源整理

【Github】All4NLP：自然语言处理相关资源整理

AINLP

23+阅读 · 2019年8月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

计算机 | EMNLP 2019等国际会议信息6条

计算机 | EMNLP 2019等国际会议信息6条

Call4Papers

18+阅读 · 2019年4月26日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

TCN v2 + 3Dconv 运动信息

TCN v2 + 3Dconv 运动信息

CreateAMind

4+阅读 · 2019年1月8日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【推荐】(TensorFlow)SSD实时手部检测与追踪（附代码）

【推荐】(TensorFlow)SSD实时手部检测与追踪（附代码）

机器学习研究会

11+阅读 · 2017年12月5日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

Autoregressive Entity Retrieval

Arxiv

0+阅读 · 2021年3月24日

Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval

Arxiv

10+阅读 · 2021年2月22日

A Graph-based Relevance Matching Model for Ad-hoc Retrieval

Arxiv

11+阅读 · 2021年1月28日

Deep Image Retrieval: A Survey

Arxiv

16+阅读 · 2021年1月27日

SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble

Arxiv

3+阅读 · 2019年10月17日

A Sketch-Based System for Semantic Parsing

A Sketch-Based System for Semantic Parsing

Arxiv

4+阅读 · 2019年9月12日

Learning a Matching Model with Co-teaching for Multi-turn Response Selection in Retrieval-based Dialogue Systems

Arxiv

6+阅读 · 2019年6月11日

Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network

Arxiv

15+阅读 · 2019年5月28日

Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate

Arxiv

7+阅读 · 2018年4月24日

Training a Ranking Function for Open-Domain Question Answering

Arxiv

5+阅读 · 2018年4月12日

微信扫码咨询专知VIP会员