In this article, we present a methodology that takes as input a collection of retracted articles, gathers the entities citing them, characterizes those entities along multiple dimensions (discipline, year of publication, sentiment, etc.), and applies quantitative and qualitative analyses to the collected values. The methodology consists of four phases: (1) identifying and retrieving the entities that have cited a retracted article and extracting their basic metadata, (2) extracting and labeling additional features based on the textual content of the citing entities, (3) building a descriptive statistical summary of the collected data, and (4) running a topic modeling analysis. The goal of the methodology is to generate data and visualizations that help in understanding possible behaviors related to retraction cases. We present the methodology step by step following its four phases, discuss its limitations and possible workarounds, and list planned future improvements.
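As a rough illustration of how the output of phases (1) and (2) might feed the descriptive summary of phase (3), the following minimal sketch counts citing entities per value of each dimension. The record type `CitingEntity`, its field names, the function `summarize`, and the sample records are all hypothetical and chosen only for illustration; they are not taken from the methodology's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CitingEntity:
    """One entity citing a retracted article (all fields are illustrative)."""
    doi: str          # phase 1: basic metadata
    year: int         # phase 1: year of publication
    discipline: str   # phase 2: label derived from the entity's content
    sentiment: str    # phase 2: e.g. "positive", "neutral", "negative"


def summarize(entities):
    """Phase 3 sketch: count citing entities per value of each dimension."""
    return {
        "by_year": Counter(e.year for e in entities),
        "by_discipline": Counter(e.discipline for e in entities),
        "by_sentiment": Counter(e.sentiment for e in entities),
    }


# Hypothetical records, invented purely for illustration.
sample = [
    CitingEntity("10.0000/example.1", 2001, "medicine", "neutral"),
    CitingEntity("10.0000/example.2", 2001, "psychology", "negative"),
    CitingEntity("10.0000/example.3", 2005, "medicine", "neutral"),
]

summary = summarize(sample)
for dimension, counts in summary.items():
    print(dimension, dict(counts))
```

Counts of this kind are one plausible starting point for the visualizations the methodology aims to produce; the topic modeling of phase (4) would require the full text or abstracts of the citing entities and is not sketched here.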