We propose a method for using a large language model, such as GPT-3, to simulate the responses of different humans in a given context. We test our method by attempting to reproduce well-established economic, psycholinguistic, and social experiments. The method requires a prompt template for each experiment. Simulations are run by varying the (hypothetical) subject details, such as name, and analyzing the text generated by the language model. To validate our methodology, we use GPT-3 to simulate the Ultimatum Game, garden path sentences, risk aversion, and the Milgram Shock experiments. To address concerns about exposure to these studies in the training data, we also evaluate simulations on novel variants of these studies. We show that it is possible to simulate the responses of different people and that these responses are consistent with prior human studies from the literature. Across all studies, the distributions generated by larger language models align better with prior experimental results, suggesting that future language models may support even more faithful simulations of human responses. Our use of a language model for simulation is contrasted with anthropomorphic views of a language model as having its own behavior.
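The simulation loop described above (fill a per-experiment prompt template with varied hypothetical subject details, query the model, parse the generated text) can be illustrated with a minimal sketch. This is not the authors' released code; it assumes the legacy `openai.Completion` endpoint available for GPT-3, and the template wording, names, offers, and parsing rule are illustrative placeholders.

```python
# Minimal sketch of the prompt-template simulation loop, assuming the legacy
# openai.Completion endpoint (openai<1.0). Template text, names, and the
# accept/reject parsing rule are hypothetical placeholders.
import openai

TEMPLATE = (
    "{name} is the responder in the Ultimatum Game.\n"
    "The proposer offers {name} ${offer} out of $10.\n"
    "{name} decides to"
)

def simulate(name: str, offer: int) -> str:
    """Fill the template for one hypothetical subject and parse the completion."""
    prompt = TEMPLATE.format(name=name, offer=offer)
    resp = openai.Completion.create(
        model="text-davinci-002",  # a GPT-3 model; any completion model works
        prompt=prompt,
        max_tokens=5,
        temperature=1.0,
    )
    text = resp["choices"][0]["text"].lower()
    return "accept" if "accept" in text else "reject"

# Vary the hypothetical subject's name and the offer, then aggregate the
# decisions into a distribution that can be compared with prior human results.
names = ["Alice", "Bob", "Carlos", "Mei"]
results = {n: [simulate(n, offer) for offer in range(1, 6)] for n in names}
print(results)
```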