The ubiquitous nature of chatbots and their interaction with users generates an enormous amount of data. Can we use this data to improve chatbots? A self-feeding chatbot improves itself by asking for natural language feedback when a user is dissatisfied with its response, and it uses this feedback as an additional training sample. However, user feedback in most cases contains extraneous sequences that hinder its usefulness as a training sample. In this work, we propose a generative adversarial model that converts noisy feedback into a plausible natural response in a conversation. The generator's goal is to convert the feedback into a response that answers the user's previous utterance and fools the discriminator, which distinguishes feedback from natural responses. We show that augmenting the original training data with these modified feedback responses improves the original chatbot's accuracy in ranking correct responses on the PersonaChat dataset from 69.94% to 75.96%, a large improvement given that the original model is already trained on 131k samples.
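The adversarial setup described above can be sketched as a standard GAN-style minimax objective (the notation here is illustrative, not necessarily the paper's exact loss): let $f$ denote a noisy feedback utterance, $r$ a natural response, $G$ the feedback-to-response generator, and $D$ the discriminator that separates natural responses from converted feedback:

```latex
\min_G \max_D \;
\mathbb{E}_{r \sim p_{\mathrm{resp}}}\big[\log D(r)\big]
+ \mathbb{E}_{f \sim p_{\mathrm{fb}}}\big[\log\big(1 - D(G(f))\big)\big]
```

In this framing, $G$ is pushed to produce outputs that $D$ cannot distinguish from natural responses; the additional requirement that $G(f)$ answer the user's previous utterance would be handled by a separate task-specific term, which is omitted here.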