An Analysis of Large Language Models for Simulating User Responses in Surveys

Using Large Language Models (LLMs) to simulate user opinions has received growing attention. Yet LLMs, especially those trained with reinforcement learning from human feedback (RLHF), are known to exhibit biases toward dominant viewpoints, raising concerns about their ability to represent users from diverse demographic and cultural backgrounds. In this work, we examine the extent to which LLMs can simulate human responses to cross-domain survey questions under direct prompting and chain-of-thought prompting. We further propose CLAIMSIM, a claim diversification method that elicits viewpoints from an LLM's parametric knowledge and supplies them as contextual input. Experiments on the survey question answering task indicate that, although CLAIMSIM produces more diverse responses, both approaches struggle to simulate users accurately. Further analysis reveals two key limitations: (1) LLMs tend to hold the same viewpoints regardless of demographic features and generate single-perspective claims; and (2) when presented with conflicting claims, LLMs struggle to reason over nuanced differences among demographic features, which limits their ability to adapt responses to specific user profiles.
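
The abstract contrasts three ways of asking an LLM to answer a survey question as a given user: direct prompting, chain-of-thought prompting, and a CLAIMSIM-style setup that first elicits viewpoints from the model and then feeds them back as context. The Python sketch below illustrates that setup under stated assumptions. `llm` is any prompt-to-text function, and the prompt wording, the helper names (`build_prompt`, `parse_option`, `claim_augmented`, `answer_entropy`), the stub model, and the example question are all hypothetical. In particular, the entropy-based diversity score is only an illustrative proxy, not the paper's CLAIMSIM implementation or its evaluation metric.

```python
import re
from collections import Counter
from math import log2
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical interface: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]

ANSWER_ONLY = "Answer with the option number only."
STEP_BY_STEP = ("Reason step by step about how this person would answer, "
                "then give the option number on the last line.")


def build_prompt(profile: Mapping[str, str], question: str,
                 options: Sequence[str], instruction: str) -> str:
    """Render a demographic persona, a survey question, and numbered options."""
    persona = "; ".join(f"{k}: {v}" for k, v in profile.items())
    opts = "\n".join(f"{i}. {o}" for i, o in enumerate(options, start=1))
    return (f"You are a survey respondent ({persona}).\n"
            f"Question: {question}\nOptions:\n{opts}\n{instruction}")


def parse_option(completion: str, n_options: int) -> Optional[int]:
    """Return the last in-range option number mentioned in the completion, if any."""
    valid = [int(m) for m in re.findall(r"\d+", completion) if 1 <= int(m) <= n_options]
    return valid[-1] if valid else None


def direct(llm: LLM, profile: Mapping[str, str], question: str,
           options: Sequence[str]) -> Optional[int]:
    """Direct prompting: ask for the option number immediately."""
    prompt = build_prompt(profile, question, options, ANSWER_ONLY)
    return parse_option(llm(prompt), len(options))


def chain_of_thought(llm: LLM, profile: Mapping[str, str], question: str,
                     options: Sequence[str]) -> Optional[int]:
    """Chain-of-thought prompting: ask for reasoning first, then the option number."""
    prompt = build_prompt(profile, question, options, STEP_BY_STEP)
    return parse_option(llm(prompt), len(options))


def claim_augmented(llm: LLM, profile: Mapping[str, str], question: str,
                    options: Sequence[str], n_claims: int = 3) -> Optional[int]:
    """Claim-augmented prompting in the spirit of CLAIMSIM (illustrative only)."""
    # Step 1: ask the model itself for several distinct viewpoints (claims) on the question.
    raw = llm(f"List {n_claims} distinct viewpoints people might hold on this question, "
              f"one per line.\nQuestion: {question}")
    claims = [line.strip() for line in raw.splitlines() if line.strip()][:n_claims]
    # Step 2: prepend the elicited claims as context, then answer in persona.
    context = "Viewpoints to consider:\n" + "\n".join(f"- {c}" for c in claims) + "\n\n"
    prompt = context + build_prompt(profile, question, options, ANSWER_ONLY)
    return parse_option(llm(prompt), len(options))


def answer_entropy(answers: Sequence[Optional[int]]) -> float:
    """Shannon entropy (bits) of answers across personas; 0.0 means all personas agree."""
    n = len(answers)
    return sum(c / n * log2(n / c) for c in Counter(answers).values())


# Usage with a hypothetical stub model that ignores the persona entirely.
def fixed_view_llm(prompt: str) -> str:
    if prompt.startswith("List"):
        return "Claim A\nClaim B\nClaim C"
    return "2"


profiles = [{"age": "25", "country": "Japan"}, {"age": "70", "country": "Brazil"}]
question = "How important is religion in your life?"
options = ["Very important", "Somewhat important", "Not important"]
answers = [claim_augmented(fixed_view_llm, p, question, options) for p in profiles]
print(answers, answer_entropy(answers))  # [2, 2] 0.0
```

Because the stub ignores the persona, every profile receives the same answer and the entropy is 0.0, which mirrors the fixed-viewpoint behavior described in limitation (1). A real model would be plugged in through the same `llm` callable; keeping the three prompting strategies behind one interface makes it easy to compare them on identical personas and questions.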

翻译：利用大型语言模型（LLMs）模拟用户观点已受到日益增长的关注。然而，LLMs，特别是通过人类反馈强化学习（RLHF）训练的模型，已知会表现出对主流观点的偏向，这引发了对其能否代表来自不同人口统计和文化背景用户的担忧。在本研究中，我们通过直接提示和思维链提示，考察了LLMs在多大程度上能够模拟人类对跨领域调查问题的响应。我们进一步提出了一种观点多样化方法CLAIMSIM，该方法从LLM的参数知识中提取观点作为上下文输入。在调查问题回答任务上的实验表明，尽管CLAIMSIM能产生更多样化的响应，但两种方法都难以准确模拟用户。进一步的分析揭示了两个关键局限性：（1）LLMs倾向于在不同人口统计特征间保持固定观点，并生成单一视角的论断；（2）当面对相互冲突的论断时，LLMs难以对人口统计特征间的细微差异进行推理，从而限制了其针对特定用户画像调整响应的能力。