Citation content analysis seeks to understand citations through the language used when a citation is made. A key problem in the field is identifying linguistic structures that characterize distinct classes of citations, so that the intent and function of a citation can be understood. Previous work has modeled linguistic features first and then drawn conclusions about the language structures unique to each class of citation function from classification performance or inter-annotator agreement. In this study, we instead start from a large pre-classified citation corpus, 2 million citations from each class of the scite Smart Citation dataset (supporting, disputing, and mentioning citations), and analyze it with corpus-linguistic methods to reveal the unique and statistically significant language structures belonging to each citation type. By generating comparison tables for each citation type, we present a number of linguistic features that uniquely characterize each type. We find that, within citation collocates, the correlation between citation type and sentiment is very low. We also find that the subjectivity of citation collocates is very low across classes. These findings suggest that the sentiment of collocates does not predict citation function and that, given their low subjectivity, the opinion-expressing model of citations implicit in the previous citation sentiment analysis literature is inappropriate. Instead, we suggest that citations are better understood as claims-making devices, where the citation type is explained by how two claims are being compared. By presenting this approach, we hope to inspire similar corpus-linguistic studies that derive a more robust theory of citation from an empirical basis using citation corpora.
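As an illustration of how words that are statistically characteristic of one citation class might be surfaced, the following is a minimal sketch and not the procedure used in the study. It assumes the simplified two-term log-likelihood keyness statistic common in corpus linguistics, treats every whitespace-separated token in a citation statement as a candidate collocate, and uses 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom) as the significance cutoff. The function names `log_likelihood` and `keyness` and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Two-term log-likelihood (G2) for a word seen `a` times in a target
    corpus of `total_a` tokens and `b` times in a reference corpus of
    `total_b` tokens."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2.0 * g2


def keyness(target_sentences, reference_sentences, threshold=3.84):
    """Return (word, G2) pairs for words over-represented in the target
    class relative to the reference class, sorted by descending G2."""
    target = Counter(w for s in target_sentences for w in s.lower().split())
    reference = Counter(w for s in reference_sentences for w in s.lower().split())
    n_target, n_reference = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words whose relative frequency is higher in the target class.
        if a / n_target > b / n_reference:
            g2 = log_likelihood(a, b, n_target, n_reference)
            if g2 >= threshold:
                scored.append((word, g2))
    return sorted(scored, key=lambda pair: -pair[1])


# Toy stand-ins for two citation classes (not real scite data).
supporting = ["consistent with"] * 10
disputing = ["in contrast with"] * 10
print(keyness(supporting, disputing))
```

Running the same comparison for each class against the others would yield one ranked table per citation type; a real analysis would also need proper tokenization and a correction for multiple comparisons.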