LG4AV: Combining Language Models and Graph Neural Networks for Author Verification

The automatic verification of document authorship is important in various settings. Researchers, for example, are judged and compared by the number and impact of their publications, and public figures are confronted with their posts on social media platforms. It is therefore important that authorship information in frequently used web services and platforms is correct. The question of whether a given document was written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only a few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for online databases and knowledge graphs in the scholarly domain, where authorships of scientific publications have to be verified, often with only abstracts and titles available. To this end, we present our novel approach LG4AV, which combines language models and graph neural networks for authorship verification. By feeding the available texts directly into a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features, which are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By incorporating a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process. For example, scientific authors are more likely to write about topics that are addressed by their co-authors, and Twitter users tend to post about the same subjects as the people they follow. We experimentally evaluate our model and study to what extent the inclusion of co-authorships enhances verification decisions in bibliometric environments.
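
The intuition that co-author relations can inform a verification decision can be illustrated with a toy sketch. It is not the LG4AV architecture: the abstract does not specify layers, features, or training. The sketch assumes a single GCN-style neighbourhood-averaging step over a co-author graph and a plain dot product with a document vector. All names (`normalize_adjacency`, `propagate`, `verification_score`) and all numbers are hypothetical, and the document vector stands in for what a pre-trained language model would produce.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so every author keeps part of their own features.
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(adj_norm, features):
    """One propagation step: each author mixes in the features of co-authors."""
    n, d = len(features), len(features[0])
    return [[sum(adj_norm[i][k] * features[k][c] for k in range(n)) for c in range(d)]
            for i in range(n)]

def verification_score(author, text_vec, adj, features):
    """Sigmoid of the dot product between the propagated author vector and the text vector."""
    h = propagate(normalize_adjacency(adj), features)[author]
    logit = sum(x * y for x, y in zip(h, text_vec))
    return 1.0 / (1.0 + math.exp(-logit))

# Toy co-author graph (assumption): authors 0 and 1 are co-authors, author 2 is isolated.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
# Hypothetical 2-d topic features per author; author 0 has no informative features of their own.
features = [[0.0, 0.0],
            [1.0, 0.0],
            [0.0, 1.0]]
# Hypothetical embedding of the document to verify (placeholder for a language-model output).
text_vec = [2.0, -2.0]

score_coauthor = verification_score(0, text_vec, adj, features)
score_isolated = verification_score(2, text_vec, adj, features)
```

In this toy setting author 0 has an all-zero feature vector, so without the graph step the score would be exactly 0.5. The propagation step borrows topic information from co-author 1 and pushes the score above 0.5. The isolated author 2, whose topic does not match the document, scores below 0.5.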
