ORCID is a scientific infrastructure created to solve the problem of author name ambiguity. Over the years, ORCID has also become a useful source for studying the academic activities reported by researchers. Our objective in this research was to use ORCID to analyze one of these activities: the publication of datasets. We illustrate how identifying the datasets shared in researchers' ORCID profiles enables the study of the characteristics of the researchers who produced them. To explore the relevance of ORCID for studying data sharing practices, we obtained all ORCID profiles reporting at least one dataset in their "works" list, together with information about the individual researchers who produced the datasets. The retrieved data was organized and analyzed in a SQL database hosted at CWTS. Our results indicate that DataCite is by far the most important source of information about datasets recorded in ORCID. There is also a substantial overlap between DataCite records and those of other repositories (Figshare, Dryad, and Zenodo). The analysis of the distribution of researchers producing datasets shows that the six countries with the most data producers also account for a relatively higher share of researchers with datasets than of researchers in ORCID as a whole. By discipline, researchers in the Natural Sciences and in Medicine and Life Sciences report the largest number of datasets. Finally, we observed that researchers who started their PhD around 2015 published their first dataset earlier than researchers who started their PhD before then. The work concludes with some reflections on the possibilities of ORCID as a relevant source for research on data sharing practices.