Large Language Models (LLMs) are increasingly used in conversational systems such as digital personal assistants, shaping how people interact with technology through language. While their responses often sound fluent and natural, they can also carry subtle tone biases, such as sounding overly polite, cheerful, or cautious even when neutrality is expected. These tendencies can influence how users perceive trust, empathy, and fairness in dialogue. In this study, we examine tone bias as a hidden behavioral trait of large language models. The novelty of this research lies in integrating controllable, LLM-based dialogue synthesis with tone classification models, enabling robust and ethical emotion recognition in personal assistant interactions. We created two synthetic dialogue datasets: one generated from neutral prompts and another explicitly guided to produce positive or negative tones. Surprisingly, even the neutral set showed a consistent tonal skew, suggesting that the bias may stem from the model's underlying conversational style. Using weak supervision with a pretrained DistilBERT model, we labeled tones and trained several classifiers to detect these patterns. Ensemble models achieved macro-F1 scores of up to 0.92, showing that tone bias is systematic, measurable, and relevant to the design of fair and trustworthy conversational AI.
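To make the weak-supervision step concrete, the sketch below illustrates how a pretrained DistilBERT model could assign weak tone labels to synthetic dialogue turns, which are then used to train a simple downstream classifier. This is a minimal illustration only: the specific checkpoint (`distilbert-base-uncased-finetuned-sst-2-english`), the toy dialogue turns, and the TF-IDF + logistic regression classifier are assumptions for demonstration, not the exact pipeline or data used in the study.

```python
# Minimal sketch of weak-supervision tone labeling followed by classifier training.
# Assumptions: the DistilBERT checkpoint and the toy dialogue turns below are
# illustrative placeholders, not the paper's actual models or datasets.
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1) Weak labeler: a pretrained DistilBERT sentiment model provides noisy tone labels.
weak_labeler = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

dialogue_turns = [
    "I'd be absolutely delighted to help you with that!",
    "Your appointment is scheduled for 3 pm on Tuesday.",
    "Unfortunately, that option is no longer available.",
]

# Map the labeler's POSITIVE/NEGATIVE outputs to lowercase tone labels.
weak_labels = [result["label"].lower() for result in weak_labeler(dialogue_turns)]

# 2) Downstream classifier: a simple TF-IDF + logistic regression model trained
#    on the weakly labeled turns (a stand-in for one ensemble member).
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(dialogue_turns, weak_labels)

print(list(zip(dialogue_turns, weak_labels)))
print(clf.predict(["Happy to help, have a wonderful day!"]))
```

In this setup, the weak labels would be compared across the neutral-prompt and tone-guided datasets to quantify tonal skew, and stronger classifiers or ensembles would replace the toy model shown here.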