配有精细科学文件类似性教科书指导的多变量模型 (Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity) - 专知论文

会员服务 ·

0

相似度 · Guidance · 讲稿 · MoDELS · FAST ·

2022 年 5 月 4 日

Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity

翻译：配有精细科学文件类似性教科书指导的多变量模型

Sheshera Mysore,Arman Cohan,Tom Hope

from arxiv, NAACL 2022 camera-ready

We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the full-text of papers that cite multiple papers together (co-citations). Such co-citations not only reflect close paper relatedness, but also provide textual descriptions of how the co-cited papers are related. This novel form of textual supervision is used for learning to match aspects across papers. We develop multi-vector representations where vectors correspond to sentence-level aspects of documents, and present two methods for aspect matching: (1) A fast method that only matches single aspects, and (2) a method that makes sparse multiple matches with an Optimal Transport mechanism that computes an Earth Mover's Distance between aspects. Our approach improves performance on document similarity tasks in four datasets. Further, our fast single-match method achieves competitive results, paving the way for applying fine-grained similarity to large scientific corpora. Code, data, and models available at: https://github.com/allenai/aspire

翻译：我们提出一个新的科学文件相似性模型,其依据是文本的细细比方面。为了培训我们的模型,我们利用一种自然产生的监督来源:(1) 一种只与单一方面匹配的快速方法,(2) 一种与最优运输机制匹配的方法,即计算地球移动者之间的距离。我们的方法改进了四个数据集中类似文件任务的业绩。此外,我们的快速单相配方法取得了竞争性结果,为将微小相似性应用于大型科学公司铺平了道路。代码、数据和模型见:https://github.com/allenai/aspire。

0

相关内容

相似度

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

专知会员服务

10+阅读 · 2022年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Ago2磷酸化在DNA损伤修复中的功能分析

国家自然科学基金

0+阅读 · 2014年12月31日

复合材料板瞬态热-力耦合弯曲行为分析的高阶多尺度方法

国家自然科学基金

0+阅读 · 2014年12月31日

基于高维特征和稀疏子空间聚类的图像分割方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向人脸检测的大规模异构并行Adaboost机器学习算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

无穷维随机动力系统的SRB测度

国家自然科学基金

0+阅读 · 2013年12月31日

纳米磁性过渡金属碳化物的可控化学合成、结构及磁性研究

国家自然科学基金

0+阅读 · 2013年12月31日

两类两个分支的Camassa-Holm系统的弱解问题

国家自然科学基金

0+阅读 · 2012年12月31日

深海真菌抗植物病原真菌菌株的筛选及其活性物质研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型多层功能化碳纳米管薄膜的制备方法及其机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

句子语义的视觉表示研究

国家自然科学基金

4+阅读 · 2009年12月31日

Evaluation of Embedding Models for Automatic Extraction and Classification of Acknowledged Entities in Scientific Documents

Arxiv

0+阅读 · 2022年6月22日

SynWMD: Syntax-aware Word Mover's Distance for Sentence Similarity Evaluation

Arxiv

0+阅读 · 2022年6月20日

A Unified Understanding of Deep NLP Models for Text Classification

Arxiv

0+阅读 · 2022年6月19日

Learning Multiscale Transformer Models for Sequence Generation

Arxiv

0+阅读 · 2022年6月19日

Using Transfer Learning for Code-Related Tasks

Arxiv

0+阅读 · 2022年6月17日

Learning in the Frequency Domain

Learning in the Frequency Domain

Arxiv

11+阅读 · 2020年3月12日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

A Structured Self-attentive Sentence Embedding

Arxiv

24+阅读 · 2017年3月9日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

专知会员服务

10+阅读 · 2022年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【普林斯顿博士论文】在线学习：优化、控制与学习理论

不确定环境下无人机三维路径规划研究 | 221页

【NeurIPS2025】《LeapFactual：基于条件流匹配的可靠视觉反事实解释》

大语言模型将如何改变军事指挥结构

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

相关论文

Evaluation of Embedding Models for Automatic Extraction and Classification of Acknowledged Entities in Scientific Documents

Arxiv

0+阅读 · 2022年6月22日

SynWMD: Syntax-aware Word Mover's Distance for Sentence Similarity Evaluation

Arxiv

0+阅读 · 2022年6月20日

A Unified Understanding of Deep NLP Models for Text Classification

Arxiv

0+阅读 · 2022年6月19日

Learning Multiscale Transformer Models for Sequence Generation

Arxiv

0+阅读 · 2022年6月19日

Using Transfer Learning for Code-Related Tasks

Arxiv

0+阅读 · 2022年6月17日

Learning in the Frequency Domain

Learning in the Frequency Domain

Arxiv

11+阅读 · 2020年3月12日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

A Structured Self-attentive Sentence Embedding

Arxiv

24+阅读 · 2017年3月9日

相关基金

Ago2磷酸化在DNA损伤修复中的功能分析

国家自然科学基金

0+阅读 · 2014年12月31日

复合材料板瞬态热-力耦合弯曲行为分析的高阶多尺度方法

国家自然科学基金

0+阅读 · 2014年12月31日

基于高维特征和稀疏子空间聚类的图像分割方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向人脸检测的大规模异构并行Adaboost机器学习算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

无穷维随机动力系统的SRB测度

国家自然科学基金

0+阅读 · 2013年12月31日

纳米磁性过渡金属碳化物的可控化学合成、结构及磁性研究

国家自然科学基金

0+阅读 · 2013年12月31日

两类两个分支的Camassa-Holm系统的弱解问题

国家自然科学基金

0+阅读 · 2012年12月31日

深海真菌抗植物病原真菌菌株的筛选及其活性物质研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型多层功能化碳纳米管薄膜的制备方法及其机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

句子语义的视觉表示研究

国家自然科学基金

4+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员