纵向搜索具体领域培训:生物医学文献案例研究 (Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature) - 专知论文

会员服务 ·

0

CASE · 有向 · INFORMS · Performer · Better ·

2021 年 9 月 16 日

Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

翻译：纵向搜索具体领域培训:生物医学文献案例研究

Yu Wang,Jinchao Li,Tristan Naumann,Chenyan Xiong,Hao Cheng,Robert Tinn,Cliff Wong,Naoto Usuyama,Richard Rogahn,Zhihong Shen,Yang Qin,Eric Horvitz,Paul N. Bennett,Jianfeng Gao,Hoifung Poon

from arxiv, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2021 Applied Data Science Track

Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and many other vertical domains is challenging due to the scarcity of direct supervision from click logs. Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck. We propose a general approach for vertical search based on domain-specific pretraining and present a case study for the biomedical domain. Despite being substantially simpler and not using any relevance labels for training or development, our method performs comparably or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition. Using distributed computing in modern cloud infrastructure, our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search, a new search experience for biomedical literature: https://aka.ms/biomedsearch.

翻译：在许多高价值领域,信息超载是一个普遍的挑战。一个突出的例子是COVID-19生物医学文献的爆炸,在几个月的时间里上升到数十万篇论文。一般而言,生物医学文献每分钟增加两份论文,每年共100万份新论文。在生物医学领域和许多其他纵向领域,由于缺少点击日志的直接监督,搜索是一项挑战。自我监督的学习已成为克服注释瓶颈的一个有希望的方向。我们提出了一个基于特定领域培训前的纵向搜索的一般方法,并提出了生物医学领域的案例研究。尽管我们的方法比官方的TREC-COVID评估(COVID相关生物医学搜索竞赛)中的最佳系统要简单得多,甚至更好。利用现代云层基础设施的分布式计算,我们的系统可以达到数千万篇关于普布麦德的文章,并被作为微软生物医学搜索,这是生物医学文献的新搜索经验:https://akas/biomeds。

0

相关内容

CASE

图学习怎么用？斯坦福《图深度学习》论坛，Jure Leskovec等众大牛讲述图学习以及在金融、医学及自然语言处理等领域应用

专知会员服务

49+阅读 · 2021年9月20日

复杂的序列数据分析：现有算法的系统文献综述，Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms

复杂的序列数据分析：现有算法的系统文献综述，Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms

专知会员服务

27+阅读 · 2020年7月24日

最新《人脸识别对抗攻击》综述 | Threat of Adversarial Attacks on Face Recognition: A Comprehensive Survey

最新《人脸识别对抗攻击》综述 | Threat of Adversarial Attacks on Face Recognition: A Comprehensive Survey

专知会员服务

26+阅读 · 2020年7月24日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【南洋理工】区块链综述，25页pdf，Blockchain for Future Smart Grid: A Comprehensive Survey

【南洋理工】区块链综述，25页pdf，Blockchain for Future Smart Grid: A Comprehensive Survey

专知会员服务

38+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【VLDB2019 tutorial】Combating Fake News: A Data Management and Mining Perspective，不列颠哥伦比亚大|Laks V.S. Lakshmanan，Michael Simpson，Sara Thirumuruganathan，156页PDF

【VLDB2019 tutorial】Combating Fake News: A Data Management and Mining Perspective，不列颠哥伦比亚大|Laks V.S. Lakshmanan，Michael Simpson，Sara Thirumuruganathan，156页PDF

专知会员服务

13+阅读 · 2019年8月27日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：连通知识图谱与推荐系统

LibRec 精选：连通知识图谱与推荐系统

LibRec智能推荐

3+阅读 · 2018年8月9日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

Contrastive Reinforcement Learning of Symbolic Reasoning Domains

Arxiv

1+阅读 · 2021年11月8日

Defining Gaze Patterns for Process Model Literacy -- Exploring Visual Routines in Process Models with Diverse Mappings

Defining Gaze Patterns for Process Model Literacy -- Exploring Visual Routines in Process Models with Diverse Mappings

Arxiv

0+阅读 · 2021年11月4日

Pre-trained Language Models in Biomedical Domain: A Systematic Survey

Arxiv

10+阅读 · 2021年10月12日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

23+阅读 · 2021年8月12日

Efficient Data-specific Model Search for Collaborative Filtering

Efficient Data-specific Model Search for Collaborative Filtering

Arxiv

4+阅读 · 2021年6月14日

Fine-grained Angular Contrastive Learning with Coarse Labels

Arxiv

9+阅读 · 2020年12月7日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Arxiv

7+阅读 · 2019年2月3日

A Benchmark Study on Sentiment Analysis for Software Engineering Research

Arxiv

3+阅读 · 2018年3月17日

VIP会员

文章信息

相关主题

相关VIP内容

图学习怎么用？斯坦福《图深度学习》论坛，Jure Leskovec等众大牛讲述图学习以及在金融、医学及自然语言处理等领域应用

专知会员服务

49+阅读 · 2021年9月20日

复杂的序列数据分析：现有算法的系统文献综述，Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms

复杂的序列数据分析：现有算法的系统文献综述，Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms

专知会员服务

27+阅读 · 2020年7月24日

最新《人脸识别对抗攻击》综述 | Threat of Adversarial Attacks on Face Recognition: A Comprehensive Survey

最新《人脸识别对抗攻击》综述 | Threat of Adversarial Attacks on Face Recognition: A Comprehensive Survey

专知会员服务

26+阅读 · 2020年7月24日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【南洋理工】区块链综述，25页pdf，Blockchain for Future Smart Grid: A Comprehensive Survey

【南洋理工】区块链综述，25页pdf，Blockchain for Future Smart Grid: A Comprehensive Survey

专知会员服务

38+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【VLDB2019 tutorial】Combating Fake News: A Data Management and Mining Perspective，不列颠哥伦比亚大|Laks V.S. Lakshmanan，Michael Simpson，Sara Thirumuruganathan，156页PDF

【VLDB2019 tutorial】Combating Fake News: A Data Management and Mining Perspective，不列颠哥伦比亚大|Laks V.S. Lakshmanan，Michael Simpson，Sara Thirumuruganathan，156页PDF

专知会员服务

13+阅读 · 2019年8月27日

热门VIP内容

开通专知VIP会员享更多权益服务

《陆军战斗操练中的关键事件诊断》

《自适应训练辅助概念及其在空战管理员加速训练中的应用导论》最新126页

军事通信市场七大趋势概述

《抗干扰无人机蜂群行为的遗传算法方法》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：连通知识图谱与推荐系统

LibRec 精选：连通知识图谱与推荐系统

LibRec智能推荐

3+阅读 · 2018年8月9日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

相关论文

Contrastive Reinforcement Learning of Symbolic Reasoning Domains

Arxiv

1+阅读 · 2021年11月8日

Defining Gaze Patterns for Process Model Literacy -- Exploring Visual Routines in Process Models with Diverse Mappings

Defining Gaze Patterns for Process Model Literacy -- Exploring Visual Routines in Process Models with Diverse Mappings

Arxiv

0+阅读 · 2021年11月4日

Pre-trained Language Models in Biomedical Domain: A Systematic Survey

Arxiv

10+阅读 · 2021年10月12日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

23+阅读 · 2021年8月12日

Efficient Data-specific Model Search for Collaborative Filtering

Efficient Data-specific Model Search for Collaborative Filtering

Arxiv

4+阅读 · 2021年6月14日

Fine-grained Angular Contrastive Learning with Coarse Labels

Arxiv

9+阅读 · 2020年12月7日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Arxiv

7+阅读 · 2019年2月3日

A Benchmark Study on Sentiment Analysis for Software Engineering Research

Arxiv

3+阅读 · 2018年3月17日

微信扫码咨询专知VIP会员