Artificial Intelligence in Psychology Research

From arXiv. 28 pages, 2 visualizations (1 table and 1 figure). The preregistered OSF database is available at https://osf.io/dzp8t/?view_only=45fff3953884443d81b628cdd5d50f7a

Large Language Models have grown vastly in capability. One potential application of such AI systems is to support data collection in the social sciences, where perfect experimental control is currently unfeasible and the collection of large, representative datasets is generally expensive. In this paper, we re-replicate 14 studies from the Many Labs 2 replication project (Klein et al., 2018) with OpenAI's text-davinci-003 model, colloquially known as GPT3.5. For the 10 studies that we could analyse, we collected a total of 10,136 responses, each of which was obtained by running GPT3.5 with the corresponding study's survey provided as text input. We find that our GPT3.5-based sample replicates 30% of the original results as well as 30% of the Many Labs 2 results, although the two sets of replicated findings do not fully coincide (we replicate some original findings that Many Labs 2 did not, and vice versa). We also find that, unlike the corresponding human subjects, GPT3.5 answered some survey questions with extreme homogeneity (zero variation across different runs' responses), raising concerns that a hypothetical AI-led future may in certain ways be subject to a diminished diversity of thought. Overall, while our results suggest that Large Language Model psychology studies are feasible, their findings should not be assumed to generalise straightforwardly to the human case. Nevertheless, AI-based data collection may eventually become a viable and economically relevant method in the empirical social sciences, making an understanding of its capabilities and applications central.
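As an illustration of what "zero variation in different runs' responses" means in practice, the following minimal Python sketch summarises how concentrated the answers to a single survey question are. It is not the authors' analysis code: the function name `response_homogeneity` and the example answer lists are hypothetical and chosen only for illustration.

```python
from collections import Counter


def response_homogeneity(responses):
    """Summarise how concentrated the answers to one survey question are.

    Returns (modal_answer, modal_share, n_distinct), where modal_share is
    the fraction of responses equal to the most common answer.
    """
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    modal_answer, modal_count = counts.most_common(1)[0]
    return modal_answer, modal_count / len(responses), len(counts)


# Hypothetical illustration data, NOT taken from the paper.
model_runs = ["B"] * 8  # every run gives the same answer
human_sample = ["A", "B", "B", "C", "A", "B", "D", "B"]

print(response_homogeneity(model_runs))    # modal share 1.0, one distinct answer
print(response_homogeneity(human_sample))  # modal share 0.5, four distinct answers
```

A modal share of 1.0 with a single distinct answer corresponds to the extreme homogeneity described above; a human sample would typically show a lower modal share and several distinct answers.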
