Maintaining mutual understanding is essential to avoiding breakdowns in human-human conversation, and repair plays a vital role in it, particularly Other-Initiated Repair (OIR), in which one speaker signals trouble and prompts the other to resolve it. However, Conversational Agents (CAs) still fail to recognize user repair initiation, leading to breakdowns or disengagement. This work proposes a multimodal model that automatically detects repair initiation in Dutch dialogues by integrating linguistic and prosodic features grounded in Conversation Analysis. The results show that prosodic cues complement linguistic features and significantly improve the results of pretrained text and audio embeddings, offering insights into how different features interact. Future directions include incorporating visual cues and exploring multilingual and cross-context corpora to assess the model's robustness and generalizability.