This research establishes a better understanding of the syntax choices in speech interactions and of how speech, gesture, and combined gesture-and-speech interactions are produced by users in unconstrained object manipulation environments in augmented reality. The work presents a multimodal elicitation study conducted with 24 participants. The canonical referents for translation, rotation, and scale were used along with several abstract referents (create, destroy, and select). In this study, time windows for multimodal gesture-and-speech interactions are derived from the start and stop times of gestures and speech, as well as the stroke times of gestures. While gestures commonly precede speech by 81 ms, we find that the stroke of the gesture commonly falls within 10 ms of the start of speech, indicating that the information content of a gesture and its co-occurring speech are well aligned. Lastly, the trends across the most common proposals for each modality are examined, showing that disagreement between proposals is often caused by variation in hand posture or syntax. This allows us to present aliasing recommendations that increase the percentage of users' natural interactions captured by future multimodal interactive systems.