Despite important progress, conversational systems often generate dialogues that sound unnatural to humans. We conjecture that the cause is a mismatch between training and testing conditions: agents are trained in a controlled "lab" setting but tested in the "wild". During training, they learn to generate an utterance given the human dialogue history. During testing, however, they must interact with each other and hence deal with noisy data. We propose to close this gap by training the model on mixed batches that contain both human and machine-generated dialogues. We assess the validity of the proposed method on GuessWhat?!, a visual referential game.
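The mixed-batch idea can be illustrated with a minimal sketch. The abstract does not state the mixing ratio, how the machine-generated dialogues are produced, or how batches are assembled, so everything below is an assumption made for illustration: the function name `make_mixed_batch`, the `generated_fraction` parameter, sampling with replacement, and the `"human"` / `"generated"` source labels are all hypothetical.

```python
import random


def make_mixed_batch(human_dialogues, generated_dialogues, batch_size,
                     generated_fraction, rng=None):
    """Sample one training batch that mixes the two dialogue sources.

    generated_fraction is a hypothetical knob (not taken from the paper):
    the share of the batch drawn from machine-generated dialogues.
    """
    if not 0.0 <= generated_fraction <= 1.0:
        raise ValueError("generated_fraction must be in [0, 1]")
    rng = rng or random.Random()

    # Split the batch between the two sources.
    n_generated = round(batch_size * generated_fraction)
    n_human = batch_size - n_generated

    # Sample with replacement from each pool and tag every item with its source.
    batch = [(d, "human") for d in rng.choices(human_dialogues, k=n_human)]
    batch += [(d, "generated")
              for d in rng.choices(generated_dialogues, k=n_generated)]

    # Shuffle so the two sources are interleaved within the batch.
    rng.shuffle(batch)
    return batch
```

For example, `make_mixed_batch(human, generated, batch_size=8, generated_fraction=0.25)` would return six human dialogues and two machine-generated ones in random order. The source tag is kept only so that a training loop could log or weight the two kinds of sample separately if desired.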