The advanced reasoning capabilities of Large Reasoning Models enable them to thoroughly understand and apply safety policies through deliberate thought processes, improving model safety. Beyond safety, these models must also reflect the diverse human values found across cultures. This paper presents the Cultural Norm-based Cultural Alignment (CNCA) framework, which enables models to leverage their powerful reasoning ability to align with cultural norms. Specifically, we propose three methods for automatically mining cultural norms from limited survey data and explore how to use these norms effectively to improve cultural alignment. We examine two alignment paradigms: an in-context alignment method, where cultural norms are explicitly integrated into the user context, and a fine-tuning-based method, which internalizes norms through enhanced Chain-of-Thought training data. Comprehensive experiments demonstrate the effectiveness of both approaches and show that models with stronger reasoning capabilities benefit more from cultural norm mining and utilization. Our findings highlight the potential of reasoning models to better reflect diverse human values through culturally informed alignment strategies.
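The in-context alignment paradigm described above can be sketched as a simple prompt-construction step: mined cultural norms are prepended to the user context so the model can reason over them before answering. This is a minimal illustrative sketch, not the paper's actual implementation; the function name, prompt template, and example norms are all assumptions introduced here for clarity.

```python
def build_culturally_aligned_prompt(norms, question, culture):
    """Explicitly integrate mined cultural norms into the user context
    (hypothetical template; the paper's actual prompt may differ)."""
    norm_block = "\n".join(f"- {n}" for n in norms)
    return (
        f"You are answering on behalf of respondents from {culture}.\n"
        f"Cultural norms mined from survey data:\n{norm_block}\n\n"
        f"Question: {question}\n"
        "Reason step by step about how these norms apply, then answer."
    )

# Hypothetical example norms, for illustration only.
norms = [
    "Extended-family opinions carry significant weight in major decisions.",
    "Direct public disagreement with elders is generally avoided.",
]
prompt = build_culturally_aligned_prompt(
    norms, "Should adult children consult parents before relocating?", "Culture X"
)
print(prompt)
```

The fine-tuning-based paradigm would instead bake such norm-grounded reasoning into Chain-of-Thought training examples, so no norms need to be supplied at inference time.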