Anomaly detection is one of the most active research areas in critical domains such as healthcare, fintech, and public security. However, little attention has been paid to scholarly data, i.e., anomaly detection in citation networks. Citations are considered one of the most crucial metrics for evaluating the impact of scientific research, yet they may be gamed in multiple ways. Anomaly detection in citation networks is therefore of significant importance for identifying the manipulation and inflation of citations. To address this open issue, we propose a novel deep graph learning model, GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks. GLAD incorporates text semantic mining into network representation learning by adding both node attributes and link attributes via graph neural networks. It exploits not only the relevance of citation contents but also hidden relationships between papers. Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of a citation from its citation text. The performance of GLAD is validated on a simulated anomalous citation dataset. Experimental results demonstrate the effectiveness of GLAD on the anomalous citation detection task.