Our goal is to explore how the abilities brought in by a dialogue manager can be included in end-to-end visually grounded conversational agents. We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making component that decides whether to ask a follow-up question to identify a target referent in an image, or to stop the conversation and make a guess. Our analyses show that adding a decision-making component produces dialogues that are less repetitive and that include fewer unnecessary questions, thus potentially leading to more efficient and less unnatural interactions.
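As a rough illustration of the ask-or-guess loop described above, the Python sketch below alternates between a questioner, an answerer, and a guesser, and after every turn consults a decider that chooses between asking another question and stopping to guess. Everything here is an assumption for illustration: `should_stop`, `run_dialogue`, `DialogueState`, and the toy components are hypothetical names. A simple confidence threshold combined with a turn budget stands in for the decision-making component, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

QA = Tuple[str, str]  # one (question, answer) turn


@dataclass
class DialogueState:
    history: List[QA]   # turns exchanged so far
    probs: List[float]  # guesser's current distribution over candidate objects


def should_stop(state: DialogueState, threshold: float = 0.8, max_turns: int = 8) -> bool:
    # Hypothetical decider: a confidence threshold plus a turn budget stands in
    # for a learned decision-making module.
    return max(state.probs) >= threshold or len(state.history) >= max_turns


def run_dialogue(
    ask: Callable[[List[QA]], str],
    answer: Callable[[str], str],
    guess: Callable[[List[QA]], List[float]],
    threshold: float = 0.8,
    max_turns: int = 8,
) -> Tuple[int, List[QA]]:
    history: List[QA] = []
    state = DialogueState(history, guess(history))
    while not should_stop(state, threshold, max_turns):
        question = ask(history)                          # questioner proposes a follow-up
        history.append((question, answer(question)))    # answerer replies
        state = DialogueState(history, guess(history))   # guesser updates its beliefs
    # Decider chose to stop: commit to the most probable candidate object.
    target = max(range(len(state.probs)), key=state.probs.__getitem__)
    return target, history


# Toy usage with made-up components (illustrative only).
TABLE = [[0.3, 0.3, 0.4], [0.2, 0.2, 0.6], [0.05, 0.05, 0.9]]


def toy_guess(history: List[QA]) -> List[float]:
    # Toy guesser: each "yes" answer sharpens belief in object 2.
    yes = sum(1 for _, a in history if a == "yes")
    return TABLE[min(yes, len(TABLE) - 1)]


target, turns = run_dialogue(
    ask=lambda h: f"question {len(h) + 1}",
    answer=lambda q: "yes",
    guess=toy_guess,
)
print(target, len(turns))  # 2 2
```

In this toy run the decider stops after two questions because the guesser is already confident, instead of spending the full turn budget; that mirrors, in miniature, the idea of avoiding unnecessary questions.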