Decoding strategies play a crucial role in natural language generation systems. They are usually designed and evaluated in open-ended text-only tasks, and it is not clear how different strategies handle the numerous challenges that goal-oriented multimodal systems face (such as grounding and informativeness). To answer this question, we compare a wide variety of decoding strategies and hyper-parameter configurations in a Visual Dialogue referential game. Although none of them successfully balances lexical richness, task accuracy, and visual grounding, our in-depth analysis allows us to highlight the strengths and weaknesses of each decoding strategy. We believe our findings and suggestions may serve as a starting point for designing more effective decoding algorithms that handle the challenges of Visual Dialogue tasks.