Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples. Growing evidence shows that even with very large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data -- examples of what the model should not do. In this work, we propose a novel procedure to train with such data called the CRINGE loss (ContRastive Iterative Negative GEneration). We show the effectiveness of this approach across three different experiments on the tasks of safe generation, contradiction avoidance, and open-domain dialogue. Our models outperform multiple strong baselines while being conceptually simple and easy to train and implement.
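The abstract names the CRINGE loss but does not spell out its exact form. As a rough illustration of training with both positive and negative examples, the sketch below combines standard cross-entropy on positive sequences with a token-level contrastive penalty on negative sequences, where each negative token is contrasted against a token sampled from the model's own top-k predictions. All function names, the top-k sampling scheme, and the loss weighting are assumptions made for illustration, not the paper's exact formulation.

```python
# Illustrative sketch only: one plausible contrastive objective over positive
# and negative sequences. Names and hyperparameters are hypothetical.

import torch
import torch.nn.functional as F


def positive_loss(logits, targets):
    """Standard next-token cross-entropy on positive (gold) sequences."""
    return F.cross_entropy(logits, targets)


def negative_loss(logits, neg_targets, k=5):
    """Contrastive penalty on negative sequences: for each negative token,
    sample a contrasting token from the model's own top-k predictions and
    push the negative token's logit below the sampled token's logit."""
    topk_logits, topk_ids = logits.topk(k, dim=-1)              # [T, k]
    # Exclude top-k candidates that coincide with the negative token itself.
    mask = topk_ids.eq(neg_targets.unsqueeze(-1))
    masked_logits = topk_logits.masked_fill(mask, float("-inf"))
    # Sample one contrasting token per position from the remaining candidates.
    probs = masked_logits.softmax(dim=-1)
    sampled = torch.multinomial(probs, num_samples=1)           # [T, 1]
    pos_logit = topk_logits.gather(-1, sampled).squeeze(-1)     # [T]
    neg_logit = logits.gather(-1, neg_targets.unsqueeze(-1)).squeeze(-1)
    # Binary contrast: prefer the sampled token (class 0) over the negative token.
    pair = torch.stack([pos_logit, neg_logit], dim=-1)          # [T, 2]
    labels = torch.zeros(pair.size(0), dtype=torch.long)
    return F.cross_entropy(pair, labels)


if __name__ == "__main__":
    torch.manual_seed(0)
    T, V = 8, 100                     # sequence length, vocabulary size
    logits = torch.randn(T, V)        # model logits for one sequence
    pos_targets = torch.randint(V, (T,))
    neg_targets = torch.randint(V, (T,))
    total = positive_loss(logits, pos_targets) + 0.5 * negative_loss(logits, neg_targets)
    print(f"combined loss: {total.item():.4f}")
```

The "iterative" part of the name suggests that model generations are themselves labeled as positive or negative and fed back into training over multiple rounds; that outer loop is omitted here.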