Existing studies have investigated the tendency of autoregressive language models to generate text that exhibits undesired biases and toxicity. Various debiasing approaches have been proposed, which are primarily categorized into data-based and decoding-based methods. In our study, we investigate an ensemble of these two debiasing paradigms, proposing to use a toxic corpus as an additional resource to reduce toxicity. Our results show that a toxic corpus can indeed help to substantially reduce the toxicity of the language generation process, complementing existing debiasing methods.