The task of query rewrite aims to convert an in-context query into its fully-specified version, in which ellipsis and coreference are resolved according to the conversation history. Although much progress has been made, less effort has been devoted to real-scenario conversations that draw information from more than one modality. In this paper, we propose the task of multimodal conversational query rewrite (McQR), which performs query rewrite under a multimodal visual conversation setting. We collect a large-scale, manually annotated dataset named McQueen, which contains 15k visual conversations and over 80k queries, each associated with a fully-specified rewrite. In addition, for entities appearing in the rewrites, we provide the corresponding image bounding-box annotations. We then use the McQueen dataset to benchmark a state-of-the-art method for effectively tackling the McQR task, which is based on a multimodal pre-trained model with a pointer generator. Extensive experiments demonstrate the effectiveness of our model on this task\footnote{The dataset and code of this paper are both available at \url{https://github.com/yfyuan01/MQR}.}.
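For reference, a pointer generator mixes a vocabulary-generation distribution with a copy distribution over the source tokens. A minimal sketch of the standard formulation follows (the exact variant used by our model may differ; $p_{\mathrm{gen}}$, $P_{\mathrm{vocab}}$, and the attention weights $a_i$ are the usual components of that formulation, not symbols defined in this paper):

\begin{equation}
P(w) = p_{\mathrm{gen}}\, P_{\mathrm{vocab}}(w) + (1 - p_{\mathrm{gen}}) \sum_{i:\, w_i = w} a_i
\end{equation}

Here $p_{\mathrm{gen}} \in [0,1]$ is a learned soft switch between generating $w$ from the decoder vocabulary and copying it from the input (e.g., the history context), and $a_i$ is the attention weight on the $i$-th source token. Copying is what allows a rewrite to reproduce entity mentions from the dialogue history verbatim.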