Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable to textual adversarial attacks. We aim to address this problem from an information-theoretic perspective, and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) a Robust Feature regularizer, which increases the mutual information between local robust features and global features. We provide a principled way to theoretically analyze and improve the robustness of representation learning for language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy on several adversarial datasets for Natural Language Inference (NLI) and Question Answering (QA) tasks. Our code is available at https://github.com/AI-secure/InfoBERT.
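To make the shape of the training objective concrete, here is a minimal sketch in notation assumed only for illustration (the symbols $X$, $T$, $Y$, $T_i$, $Z$, $\alpha$, $\beta$ are not defined in the abstract): with input $X$, its learned representation $T$, label $Y$, local features $T_i$, and a global feature $Z$, the two regularizers combine with the task objective roughly as
\[
\max_{\theta}\; \underbrace{I(Y; T)}_{\text{task term}} \;-\; \beta\, I(X; T) \;+\; \alpha \sum_{i} I(T_i; Z),
\]
where the $-\beta\, I(X;T)$ term plays the role of the Information Bottleneck regularizer that suppresses noisy mutual information between the input and the representation, and the $+\alpha \sum_i I(T_i;Z)$ term plays the role of the Robust Feature regularizer that increases the mutual information between local robust features and the global feature.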