ChatGPT (GPT) has become one of the most talked-about innovations of recent years, with over 100 million users worldwide. However, little is known about the sources of information GPT draws on. We therefore studied its sources of information within the field of environmental science. We asked GPT to identify the ten most significant subdisciplines of environmental science and then to compose a scientific review article on each subdiscipline, including 25 references. We analyzed these references with respect to the number of citations, the publication date, and the journal in which the work was published. Our findings indicate that GPT tends to cite highly cited publications in environmental science, with a median citation count of 1184.5. It also favors older publications, with a median publication year of 2010, and predominantly refers to well-respected journals in the field, with Nature being the journal it cites most often. Interestingly, our findings suggest that GPT relies exclusively on citation counts from Google Scholar for the works it cites, rather than on citation information from other scientific databases such as Web of Science or Scopus. In conclusion, our study suggests that Google Scholar citation counts are a significant predictor of whether a study is mentioned in GPT-generated content. This finding reinforces the dominance of Google Scholar among scientific databases and perpetuates the Matthew Effect in science, where the rich get richer in terms of citations. With many scholars already using GPT for literature reviews, we can anticipate further disparities and a widening gap between lesser-cited and highly cited publications.
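The reference analysis described in the abstract (median citation count, median publication year, and most frequently cited journal) could be sketched roughly as follows. This is a minimal illustration and not the authors' actual pipeline: the `references` records are placeholder values invented for the example, the journal names are hypothetical, and `summarize` is a helper name assumed here.

```python
from collections import Counter
from statistics import median

# Placeholder records, NOT data from the study. Each tuple is
# (citation count, publication year, journal name).
references = [
    (1500, 2008, "Journal A"),
    (900, 2012, "Journal B"),
    (2000, 2005, "Journal A"),
    (300, 2015, "Journal C"),
]


def summarize(refs):
    """Return the median citation count, median year, and most cited journal."""
    citations = [count for count, _, _ in refs]
    years = [year for _, year, _ in refs]
    journals = Counter(journal for _, _, journal in refs)
    top_journal, top_count = journals.most_common(1)[0]
    return {
        "median_citations": median(citations),
        "median_year": median(years),
        "top_journal": top_journal,
        "top_journal_count": top_count,
    }


summary = summarize(references)
print(summary)
```

With an even number of references, the median is the mean of the two middle values, which is how a non-integer median such as 1184.5 can arise from integer citation counts.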