论文研究文件的相似性 (Aspect-based Document Similarity for Research Papers) - 专知论文

会员服务 ·

0

相似度 · Performer · 相似度度量 · 成对型 · INFORMS ·

2020 年 10 月 13 日

Aspect-based Document Similarity for Research Papers

翻译：论文研究文件的相似性

Malte Ostendorff,Terry Ruas,Till Blume,Bela Gipp,Georg Rehm

from arxiv, Accepted for publication at COLING 2020

Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity for research papers. Paper citations indicate the aspect-based similarity, i.e., the section title in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. Our results show SciBERT as the best performing system. A qualitative examination validates our quantitative results. Our findings motivate future research of aspect-based document similarity and the development of a recommender system based on the evaluated techniques. We make our datasets, code, and trained models publicly available.

翻译：传统文件相似性措施为类似和不同文件提供了粗略的区别。一般情况下,它们不考虑两个文件的相似性。这限制了依赖文件相似性的建议系统等应用程序的颗粒性。在本文中,我们通过执行一个对称文件分类任务,将方面信息与方面信息加以扩展;我们评估研究论文的基于侧面的文件相似性。文件引用表明基于方面相似性,即引用的章节标题作为一对引证和引用的纸张的标签。我们应用了一系列变异性模型,如RoBERTA、ELECTRA、XLNet和BERT的变异性,并将其与LSTM基线进行比较。我们实验了两个新建的172,073个研究数据集,分别来自ACL Anthlogy和CORD-19系统。我们的结果显示SciBERT是最佳的系统。质量检查证实了我们的定量结果。我们的调查结果鼓励今后对基于方文件的相似性进行研究,并开发以评价技术为基础的建议系统。我们进行了数据设置、经过培训的代码和公开。

0

相关内容

相似度

【SIGIR 2020】基于协同注意力机制的知识增强推荐模型

【SIGIR 2020】基于协同注意力机制的知识增强推荐模型

专知会员服务

91+阅读 · 2020年7月23日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

专知会员服务

31+阅读 · 2020年5月20日

深度学习搜索，Exploring Deep Learning for Search

深度学习搜索，Exploring Deep Learning for Search

专知会员服务

61+阅读 · 2020年5月9日

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

专知会员服务

33+阅读 · 2020年4月24日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

CCF推荐 | 国际会议信息6条

CCF推荐 | 国际会议信息6条

Call4Papers

9+阅读 · 2019年8月13日

计算机 | 中低难度国际会议信息8条

计算机 | 中低难度国际会议信息8条

Call4Papers

9+阅读 · 2019年6月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Jointly Improving Summarization and Sentiment Classification

Jointly Improving Summarization and Sentiment Classification

黑龙江大学自然语言处理实验室

3+阅读 · 2018年6月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

笔记 | Sentiment Analysis

笔记 | Sentiment Analysis

黑龙江大学自然语言处理实验室

10+阅读 · 2018年5月6日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine

Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine

Arxiv

16+阅读 · 2020年8月10日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

Keyphrase Generation for Scientific Articles using GANs

Keyphrase Generation for Scientific Articles using GANs

Arxiv

8+阅读 · 2019年9月24日

Enriching BERT with Knowledge Graph Embeddings for Document Classification

Arxiv

6+阅读 · 2019年9月18日

Comprehensive Analysis of Aspect Term Extraction Methods using Various Text Embeddings

Comprehensive Analysis of Aspect Term Extraction Methods using Various Text Embeddings

Arxiv

5+阅读 · 2019年9月11日

Collaborative Similarity Embedding for Recommender Systems

Arxiv

8+阅读 · 2019年2月19日

Attend More Times for Image Captioning

Attend More Times for Image Captioning

Arxiv

6+阅读 · 2018年12月8日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories

Arxiv

3+阅读 · 2018年4月23日

VIP会员

文章信息

相关主题

相似度度量

相关VIP内容

【SIGIR 2020】基于协同注意力机制的知识增强推荐模型

【SIGIR 2020】基于协同注意力机制的知识增强推荐模型

专知会员服务

91+阅读 · 2020年7月23日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

专知会员服务

31+阅读 · 2020年5月20日

深度学习搜索，Exploring Deep Learning for Search

深度学习搜索，Exploring Deep Learning for Search

专知会员服务

61+阅读 · 2020年5月9日

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

专知会员服务

33+阅读 · 2020年4月24日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《美国海军陆战队软件定义网络应用案例：分布式防火墙自动化系统》148页

《多体环境下定位导航授时（PNT）系统研究》228页

软件定义无线电（SDR）：商业与军事领域的技术、应用及未来趋势

《攻势防空作战中无人追击者/规避者最优轨迹研究（含动态交战区建模）》95页

相关资讯

CCF推荐 | 国际会议信息6条

CCF推荐 | 国际会议信息6条

Call4Papers

9+阅读 · 2019年8月13日

计算机 | 中低难度国际会议信息8条

计算机 | 中低难度国际会议信息8条

Call4Papers

9+阅读 · 2019年6月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Jointly Improving Summarization and Sentiment Classification

Jointly Improving Summarization and Sentiment Classification

黑龙江大学自然语言处理实验室

3+阅读 · 2018年6月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

笔记 | Sentiment Analysis

笔记 | Sentiment Analysis

黑龙江大学自然语言处理实验室

10+阅读 · 2018年5月6日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

相关论文

Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine

Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine

Arxiv

16+阅读 · 2020年8月10日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

Keyphrase Generation for Scientific Articles using GANs

Keyphrase Generation for Scientific Articles using GANs

Arxiv

8+阅读 · 2019年9月24日

Enriching BERT with Knowledge Graph Embeddings for Document Classification

Arxiv

6+阅读 · 2019年9月18日

Comprehensive Analysis of Aspect Term Extraction Methods using Various Text Embeddings

Comprehensive Analysis of Aspect Term Extraction Methods using Various Text Embeddings

Arxiv

5+阅读 · 2019年9月11日

Collaborative Similarity Embedding for Recommender Systems

Arxiv

8+阅读 · 2019年2月19日

Attend More Times for Image Captioning

Attend More Times for Image Captioning

Arxiv

6+阅读 · 2018年12月8日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories

Arxiv

3+阅读 · 2018年4月23日

微信扫码咨询专知VIP会员