Representation learning is the first step in automating tasks such as research paper recommendation, classification, and retrieval. Given the accelerating rate of research publication and the recognised benefits of interdisciplinary research, systems that help researchers discover and understand relevant work from beyond their immediate field of knowledge are vital. This work explores different methods of research paper representation (document embedding) to identify those capable of preserving the interdisciplinary implications of research papers in their embeddings. In addition to evaluating state-of-the-art document embedding methods on an interdisciplinary citation prediction task, we propose a novel Graph Neural Network (GNN) architecture designed to preserve the key interdisciplinary implications of research articles in citation network node embeddings. Our proposed method outperforms other GNN-based methods in interdisciplinary citation prediction without compromising overall citation prediction performance.