Large Language Models are increasingly used in conversational agents that simulate human conversations and generate data for social research. While concerns about these models' biases have been raised and discussed in the literature, much about the data they generate remains unknown. In this study we examine the statistical representation of social values across four countries (UK, Argentina, USA, and China) for six LLMs, three with open weights and three with closed weights. By comparing machine-generated outputs with actual human survey data, we assess whether the algorithmic biases in LLMs outweigh the biases inherent in real-world sampling, including demographic and response biases. Our findings suggest that, despite the logistical and financial constraints of human surveys, even a small, skewed sample of real respondents may provide more reliable insights than synthetic data produced by LLMs. These results highlight the limitations of using AI-generated text for social research and emphasize the continued importance of empirical human data collection.