Wikipedia is one of the most visited websites in the world and a frequent subject of scientific research. However, the analytical possibilities of Wikipedia information have not yet been explored at a scale that considers both a large volume of pages and a large set of attributes simultaneously. The main objective of this work is to offer a methodological framework and an open knowledge graph for the large-scale informetric study of Wikipedia. Features of Wikipedia pages are compared with those of scientific publications to highlight the (dis)similarities between the two types of documents. Based on this comparison, the different analytical possibilities that Wikipedia and its various data sources offer are explored, ultimately yielding a set of metrics for studying Wikipedia along different analytical dimensions. In parallel, a complete dedicated dataset of the English Wikipedia was built (and shared) following a relational model. Finally, a descriptive case study is carried out on the English Wikipedia dataset to illustrate the analytical potential of the knowledge graph and its metrics.