We present a new corpus for the Situated and Interactive Multimodal Conversations, SIMMC 2.0, aimed at building a successful multimodal assistant agent. Specifically, the dataset features 11K task-oriented dialogs (117K utterances) between a user and a virtual assistant in the shopping domain (fashion and furniture), grounded in situated and photo-realistic VR scenes. The dialogs are collected using a two-phase pipeline: a novel multimodal dialog simulator we propose first generates simulated dialog flows, and the generated utterances are then manually paraphrased. In this paper, we provide an in-depth analysis of the collected dataset and describe in detail the four main benchmark tasks we propose for SIMMC 2.0. A preliminary analysis with a baseline model highlights the new challenges that the SIMMC 2.0 dataset brings, suggesting new directions for future research. Our dataset and code will be made publicly available.
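To make the two-phase collection pipeline concrete, the minimal Python sketch below illustrates the general idea: a simulator first emits templated, scene-grounded dialog flows, and human annotators then paraphrase each templated utterance. All names here (Turn, Dialog, simulate_dialog, collect_paraphrases, the intent labels, and the scene identifier) are hypothetical illustrations, not the paper's actual simulator or annotation tooling.

```python
# Illustrative-only sketch of a two-phase dialog collection pipeline.
# Phase 1: a simulator generates templated dialog flows grounded in a scene.
# Phase 2: humans paraphrase the templated utterances into natural language.
import random
from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str          # "user" or "assistant"
    template: str         # templated utterance produced by the simulator
    paraphrase: str = ""  # filled in during phase 2 by human annotators

@dataclass
class Dialog:
    scene_id: str                      # reference to a photo-realistic VR scene
    turns: list = field(default_factory=list)

def simulate_dialog(scene_id: str, n_turns: int = 6) -> Dialog:
    """Phase 1 (assumed): generate a grounded dialog flow with templated utterances."""
    dialog = Dialog(scene_id=scene_id)
    user_intents = ["ask_price", "request_compare", "add_to_cart"]
    for i in range(n_turns):
        if i % 2 == 0:
            intent = random.choice(user_intents)
            dialog.turns.append(Turn("user", f"<{intent}> about <object_{i}> in {scene_id}"))
        else:
            dialog.turns.append(Turn("assistant", f"<respond> with attributes of <object_{i-1}>"))
    return dialog

def collect_paraphrases(dialog: Dialog) -> Dialog:
    """Phase 2 (assumed): human annotators rewrite each templated utterance."""
    for turn in dialog.turns:
        # In the real pipeline this step is manual; here we only mark the slot.
        turn.paraphrase = f"[human paraphrase of: {turn.template}]"
    return dialog

if __name__ == "__main__":
    dialog = collect_paraphrases(simulate_dialog("fashion_store_scene_042"))
    for turn in dialog.turns:
        print(turn.speaker, "->", turn.paraphrase)
```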