Existing simulations designed for cultural and interpersonal skill training rely on pre-defined responses selected through a menu interface. Using a multiple-choice interface and restricting trainees' responses may limit their ability to apply the lessons in real-life situations. These systems also use a simplistic evaluation model in which trainees' selected options are marked as either correct or incorrect. Such a model may not capture enough information to drive an adaptive feedback mechanism that improves trainees' cultural awareness. This paper describes the design of a dialogue-based simulation for cultural awareness training. The simulation is built around a disaster management scenario involving a joint coalition between the US and Chinese armies. Trainees can engage in realistic dialogue with a Chinese agent. Their responses, at different points in the dialogue, are evaluated by different multi-label classification models. Trained on our dataset, the models score the trainees' responses for awareness of Chinese culture. Trainees also receive feedback on the cultural appropriateness of their responses. The results show the following: (i) a feature-based evaluation model improves the design, modeling and computation of dialogue-based training simulation systems; (ii) output from current automatic speech recognition (ASR) systems gave end results comparable to those obtained from manual transcription; (iii) a multi-label classification model trained as a cultural expert gave results comparable to the scores assigned by human annotators.