通过在语言模式嵌入模型中识别线性有毒子空间,简单文本解毒 (Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings) - 专知论文

会员服务 ·

0

子空间 · 语言模型化 · 可辨认的 · MoDELS · SimPLe ·

2021 年 12 月 15 日

Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings

翻译：通过在语言模式嵌入模型中识别线性有毒子空间,简单文本解毒

Andrew Wang,Mohit Sudhakar,Yangfeng Ji

Large pre-trained language models are often trained on large volumes of internet data, some of which may contain toxic or abusive language. Consequently, language models encode toxic information, which makes the real-world usage of these language models limited. Current methods aim to prevent toxic features from appearing generated text. We hypothesize the existence of a low-dimensional toxic subspace in the latent space of pre-trained language models, the existence of which suggests that toxic features follow some underlying pattern and are thus removable. To construct this toxic subspace, we propose a method to generalize toxic directions in the latent space. We also provide a methodology for constructing parallel datasets using a context based word masking system. Through our experiments, we show that when the toxic subspace is removed from a set of sentence representations, almost no toxic representations remain in the result. We demonstrate empirically that the subspace found using our method generalizes to multiple toxicity corpora, indicating the existence of a low-dimensional toxic subspace.

翻译：受过训练的大型语言模型往往在大量互联网数据方面接受培训,其中一些数据可能含有有毒或滥用语言。因此,语言模型编码有毒信息,从而限制这些语言模型的实际使用。目前的方法旨在防止产生毒性特征。我们假设在经过训练的语言模型的潜伏空间中存在着低维有毒亚空间,这表明有毒特征遵循了某种基本模式,因此是可以拆除的。为了构建这一有毒子空间,我们建议了一种方法,在潜藏空间中概括毒性方向。我们还提供了一种方法,用一种基于上下文的字面遮罩系统构建平行数据集。我们通过实验表明,当有毒亚空间从一组句子表达中去除时,结果中几乎没有任何毒性表现。我们从经验上表明,我们用方法发现的子空间一般地概括了多种毒性子空间,表明存在一种低维毒性子空间。

0

相关内容

子空间

基于预训练语言模型的文本生成

基于预训练语言模型的文本生成

专知会员服务

29+阅读 · 2022年1月28日

【EMNLP2021】span-based反情感偏见的方面词表示学习框架

专知会员服务

16+阅读 · 2021年9月25日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

慕尼黑大学LMU博士论文：自然语言文本神经网络信息提取，240页pdf

慕尼黑大学LMU博士论文：自然语言文本神经网络信息提取，240页pdf

专知会员服务

74+阅读 · 2020年1月13日

【EMNLP 2019 最佳论文】信息瓶颈专门化单词嵌入（用于解析）（Specializing Word Embeddings（for Parsing）by Information Bottleneck）

【EMNLP 2019 最佳论文】信息瓶颈专门化单词嵌入（用于解析）（Specializing Word Embeddings（for Parsing）by Information Bottleneck）

专知会员服务

24+阅读 · 2019年11月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

18+阅读 · 2019年2月18日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions

Arxiv

0+阅读 · 2022年2月18日

Sparse Markov Models for High-dimensional Inference

Arxiv

0+阅读 · 2022年2月16日

Tracing Text Provenance via Context-Aware Lexical Substitution

Arxiv

5+阅读 · 2021年12月15日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Learning Conceptual-Contextual Embeddings for Medical Text

Arxiv

14+阅读 · 2020年3月12日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

17+阅读 · 2019年10月30日

Learning Conceptual-Contexual Embeddings for Medical Text

Arxiv

27+阅读 · 2019年8月16日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Multiple Combined Constraints for Image Stitching

Multiple Combined Constraints for Image Stitching

Arxiv

3+阅读 · 2018年9月18日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

基于预训练语言模型的文本生成

基于预训练语言模型的文本生成

专知会员服务

29+阅读 · 2022年1月28日

【EMNLP2021】span-based反情感偏见的方面词表示学习框架

专知会员服务

16+阅读 · 2021年9月25日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

慕尼黑大学LMU博士论文：自然语言文本神经网络信息提取，240页pdf

慕尼黑大学LMU博士论文：自然语言文本神经网络信息提取，240页pdf

专知会员服务

74+阅读 · 2020年1月13日

【EMNLP 2019 最佳论文】信息瓶颈专门化单词嵌入（用于解析）（Specializing Word Embeddings（for Parsing）by Information Bottleneck）

【EMNLP 2019 最佳论文】信息瓶颈专门化单词嵌入（用于解析）（Specializing Word Embeddings（for Parsing）by Information Bottleneck）

专知会员服务

24+阅读 · 2019年11月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】扩展可扩展会话推荐的边界

别想太多：高效 R1 风格大型推理模型综述

【ACMMM2025】EvoVLMA: 进化式视觉-语言模型自适应

智能体网络：用AI智能体编织下一代网络

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

18+阅读 · 2019年2月18日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions

Arxiv

0+阅读 · 2022年2月18日

Sparse Markov Models for High-dimensional Inference

Arxiv

0+阅读 · 2022年2月16日

Tracing Text Provenance via Context-Aware Lexical Substitution

Arxiv

5+阅读 · 2021年12月15日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Learning Conceptual-Contextual Embeddings for Medical Text

Arxiv

14+阅读 · 2020年3月12日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

17+阅读 · 2019年10月30日

Learning Conceptual-Contexual Embeddings for Medical Text

Arxiv

27+阅读 · 2019年8月16日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Multiple Combined Constraints for Image Stitching

Multiple Combined Constraints for Image Stitching

Arxiv

3+阅读 · 2018年9月18日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

微信扫码咨询专知VIP会员