An overarching goal of natural language processing is to enable machines to communicate seamlessly with humans. However, natural language can be ambiguous or unclear. In cases of uncertainty, humans engage in an interactive process known as repair: asking questions and seeking clarification until their uncertainty is resolved. We propose a framework for building a visually grounded question-asking model capable of producing polar (yes-no) clarification questions to resolve misunderstandings in dialogue. Our model uses an expected information gain objective to derive informative questions from an off-the-shelf image captioner without requiring any supervised question-answer data. We demonstrate our model's ability to pose questions that improve communicative success in a goal-oriented 20 questions game with synthetic and human answerers.
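To make the expected information gain objective concrete, the following is a minimal sketch of how a polar question could be scored against a belief over candidate images. It is not the authors' implementation: the abstract does not give the formula, so the sketch assumes the standard definition (prior entropy minus expected posterior entropy). The function names, the example questions, and the hand-written `yes_probs` values are all hypothetical; in the proposed framework the answer likelihoods would be derived from an image captioner rather than specified by hand.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def posterior(prior, yes_probs, answer_is_yes):
    """Bayesian update of the belief over images after a yes/no answer.

    yes_probs[i] is the assumed probability that the answer is "yes"
    if image i is the target (hypothetical inputs, hand-specified here).
    """
    likelihood = [y if answer_is_yes else 1.0 - y for y in yes_probs]
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    if total == 0:
        # The answer is impossible under the model; leave the belief unchanged.
        return list(prior)
    return [j / total for j in joint]


def expected_information_gain(prior, yes_probs):
    """Prior entropy minus the expected entropy of the posterior."""
    p_yes = sum(p * y for p, y in zip(prior, yes_probs))
    gain = entropy(prior)
    for answer_is_yes, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain -= p_answer * entropy(posterior(prior, yes_probs, answer_is_yes))
    return gain


def best_question(prior, candidates):
    """Return the candidate question with the highest expected information gain."""
    return max(candidates, key=lambda q: expected_information_gain(prior, candidates[q]))


# Uniform belief over four candidate images (illustrative values only).
prior = [0.25, 0.25, 0.25, 0.25]
candidates = {
    "Is there a dog?": [1.0, 1.0, 0.0, 0.0],     # splits the candidates in half
    "Is it snowing?": [1.0, 0.0, 0.0, 0.0],      # singles out one candidate
    "Is it a picture?": [0.5, 0.5, 0.5, 0.5],    # answer is independent of the image
}
chosen = best_question(prior, candidates)
```

Under these assumed likelihoods, the question that splits the candidates evenly is preferred, which mirrors the intuition behind a 20 questions strategy: a good polar question is one whose answer is expected to reduce uncertainty about the target the most.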