In this study, we analyze conversations between humans and avatars linguistically, organizationally, and structurally, focusing on what is necessary for creating face-to-face multimodal interfaces for machines. We video-recorded thirty-four human-avatar interactions, performed complete linguistic microanalysis on video excerpts, and marked all occurrences of multimodal actions and events. We applied statistical inference to the data, which allowed us to determine not only how often multimodal actions occur but also how multimodal events are distributed between the speaker (emitter) and the listener (recipient). We also observed the distribution of multimodal occurrences for each modality. The data show evidence that double-loop feedback is established during a face-to-face conversation. This led us to propose that knowledge from Conversation Analysis (CA), cognitive science, and Theory of Mind (ToM), among others, should be incorporated into the body of knowledge used to describe human-machine multimodal interactions. Face-to-face interfaces require a control layer in addition to the multimodal fusion layer. This layer has to organize the flow of conversation, integrate the social context into the interaction, and plan 'what' and 'how' to progress in the interaction. This higher level is best understood if we incorporate insights from CA and ToM into the interface system.