Recent years have witnessed a growing volume of dialogue on the web, especially on social media. This has inspired the development of dialogue-based retrieval, and retrieving videos from dialogue is of increasing interest for recommendation systems. Unlike other video retrieval tasks, dialogue-to-video retrieval uses structured queries, in the form of user-generated dialogue, as the search descriptor. We present a novel dialogue-to-video retrieval system that incorporates structured conversational information. Experiments on the AVSD dataset show that our approach with plain-text queries improves over the previous counterpart model by 15.8% on R@1. Furthermore, using dialogue as the query improves retrieval performance by 4.2%, 6.2% and 8.6% on R@1, R@5 and R@10, respectively, and outperforms the state-of-the-art model by 0.7%, 3.6% and 6.0% on the same metrics.
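For readers unfamiliar with the R@1, R@5 and R@10 figures quoted above, the following is a minimal sketch of how Recall@K is commonly computed for text-to-video retrieval. It is not taken from the paper: the function name `recall_at_k`, the similarity-matrix input, and the optimistic handling of ties are all assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def recall_at_k(similarity, targets, ks=(1, 5, 10)):
    """Fraction of queries whose correct video is ranked in the top K.

    similarity: array of shape (n_queries, n_videos); higher means more similar.
    targets:    array of shape (n_queries,); index of the correct video per query.
    Both the name and the input layout are hypothetical, chosen for this sketch.
    """
    similarity = np.asarray(similarity, dtype=float)
    targets = np.asarray(targets)

    # Score assigned to the correct video for each query.
    target_scores = similarity[np.arange(len(targets)), targets]

    # Rank of the correct video: how many videos score strictly higher (0 = best).
    # Assumption: ties are resolved in favour of the correct video.
    ranks = (similarity > target_scores[:, None]).sum(axis=1)

    return {k: float((ranks < k).mean()) for k in ks}


# Toy example: 3 dialogue queries, 3 candidate videos, query i matches video i.
toy_similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
toy_result = recall_at_k(toy_similarity, targets=[0, 1, 2], ks=(1, 2))
print(toy_result)
```

In the toy example the second query ranks its correct video second, so R@1 is 2/3 while R@2 is 1.0.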