Large Language Models (LLMs) are increasingly used to simulate population responses, a method known as ``silicon sampling''. However, their responses to socially sensitive questions frequently exhibit social desirability bias (SDB), diverging from real human data toward socially acceptable answers. Existing studies of SDB in LLM-based sampling remain limited. In this work, we investigate whether minimal, psychologically grounded changes to prompt wording can mitigate this bias and improve alignment between silicon and human samples. We conducted a study using data from the American National Election Studies (ANES) on three LLMs from two model families: the open-source Llama-3.1 series and GPT-4.1-mini. We first replicate a baseline silicon sampling study, confirming persistent SDB. We then test four prompt-based mitigation methods: \emph{reformulated} (neutral, third-person phrasing), \emph{reverse-coded} (semantic inversion), and two meta-instructions, \emph{priming} and \emph{preamble}, which encourage analytical reasoning and sincerity, respectively. Alignment with ANES is evaluated using Jensen-Shannon divergence with bootstrap confidence intervals. Our results show that reformulated prompts improve alignment most effectively, reducing the concentration of responses on socially acceptable answers and yielding distributions closer to ANES. Reverse coding produced mixed results across eligible items, while priming and preamble encouraged response uniformity and showed no systematic benefit for bias mitigation. Our findings validate the efficacy of prompt-based framing controls in mitigating inherent SDB in LLMs, providing a practical path toward more representative silicon samples.
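For concreteness, here is a minimal sketch of the alignment metric described above; it is not code from the paper. It computes the Jensen-Shannon divergence between an LLM answer distribution and the corresponding human (e.g., ANES) distribution, with a percentile bootstrap confidence interval. The function names, the log base, the number of resamples, and the resampling scheme (resampling individual responses with replacement) are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def jsd(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2) between two count vectors."""
    p = np.asarray(p_counts, dtype=float) / np.sum(p_counts)
    q = np.asarray(q_counts, dtype=float) / np.sum(q_counts)
    # scipy returns the JS *distance* (sqrt of the divergence), so square it.
    return jensenshannon(p, q, base=2) ** 2

def bootstrap_jsd_ci(llm_answers, human_answers, n_categories,
                     n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the JSD between two samples of
    categorical answers, coded as integers in 0..n_categories-1."""
    rng = np.random.default_rng(seed)
    llm = np.asarray(llm_answers)
    hum = np.asarray(human_answers)
    stats = []
    for _ in range(n_boot):
        # Resample individual responses with replacement, then recompute JSD.
        lb = rng.choice(llm, size=llm.size, replace=True)
        hb = rng.choice(hum, size=hum.size, replace=True)
        p = np.bincount(lb, minlength=n_categories)
        q = np.bincount(hb, minlength=n_categories)
        stats.append(jsd(p, q))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

Under this reading, a lower point estimate of JSD for a mitigation method, with a bootstrap interval that excludes the baseline estimate, would indicate improved alignment with the human distribution.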