The Arabic Citation Index (ARCI) was launched in 2020. This study gives an overview of the scientific literature available in this new database. Using the metadata available in scientific publications, I analyse ARCI to characterize the scientific literature published in Arabic. First, I describe the data and the methods used in the analyses. As of October 2020, ARCI indexed 65,208 records covering the period 2015-2019. Second, I explore how the literature is distributed across research domains, countries, languages and open access status. Close to 99% of the indexed documents are articles. The results reveal that publications are concentrated in the Arts & Humanities and Social Sciences. Most journals indexed in ARCI are currently published in Egypt, Algeria, Iraq, Jordan and Saudi Arabia. Around 7% of publications in ARCI are in languages other than Arabic. Third, I use an unsupervised machine learning model, Latent Dirichlet Allocation (LDA), and the text mining algorithms of VOSviewer to uncover the main topics in ARCI. These methods are particularly useful for understanding the topical structure of ARCI. Finally, after discussing the results, I suggest a few research opportunities.
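To make the topic-modelling step more concrete, here is a minimal sketch of how LDA can assign words to topics. It is not the implementation used in the study, which the abstract does not specify: the sketch assumes a collapsed Gibbs sampler, and the function names (`lda_gibbs`, `top_terms`), the hyperparameters `alpha` and `beta`, and the tiny tokenised corpus are all hypothetical placeholders rather than ARCI data. A real analysis of Arabic titles or abstracts would also need language-specific tokenisation and stop-word removal, which are omitted here.

```python
import random


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA on tokenised documents with collapsed Gibbs sampling.

    Returns the vocabulary, the document-topic counts and the
    topic-word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                           # all tokens in topic k
    assignments = []                                       # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_terms(vocab, topic_word, n_terms=3):
    """List the n_terms most frequent words of each topic."""
    result = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda i: -row[i])
        result.append([vocab[i] for i in order[:n_terms]])
    return result


# Invented toy corpus, already tokenised (placeholder data only).
docs = [
    ["law", "court", "jurisprudence", "law"],
    ["court", "law", "contract"],
    ["poetry", "novel", "literature", "poetry"],
    ["novel", "literature", "criticism"],
]

vocab, doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k, terms in enumerate(top_terms(vocab, topic_word)):
    print(f"topic {k}: {terms}")
```

The sampler only tracks counts, so the topic labels are arbitrary and the exact word groupings depend on the seed, the number of sweeps and the hyperparameters. Choosing the number of topics is a separate modelling decision that this sketch does not address.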