标签平滑什么时候有用? (When Does Label Smoothing Help?) - 专知论文

会员服务 ·

0

平滑 · 泛化理论 · 标注 · Networking · 硬目标 ·

2019 年 6 月 6 日

When Does Label Smoothing Help?

翻译：标签平滑什么时候有用?

Rafael Müller,Simon Kornblith,Geoffrey Hinton

from arxiv, Under review

The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident and label smoothing has been used in many state-of-the-art models, including image classification, language translation and speech recognition. Despite its widespread use, label smoothing is still poorly understood. Here we show empirically that in addition to improving generalization, label smoothing improves model calibration which can significantly improve beam-search. However, we also observe that if a teacher network is trained with label smoothing, knowledge distillation into a student network is much less effective. To explain these observations, we visualize how label smoothing changes the representations learned by the penultimate layer of the network. We show that label smoothing encourages the representations of training examples from the same class to group in tight clusters. This results in loss of information in the logits about resemblances between instances of different classes, which is necessary for distillation, but does not hurt generalization or calibration of the model's predictions.

翻译：多级神经网络的普及和学习速度通常可以通过使用硬目标的加权平均值和标签统一分布的软目标来大大改进。平滑标签使网络不会过于自信和标签平滑在许多最先进的模型中使用, 包括图像分类、语言翻译和语音识别。尽管标签平滑被广泛使用, 但是仍然很难理解。我们在这里的经验显示, 除了改进一般化之外, 标签平滑会改进模型校准, 从而大大改进光束搜索。但是, 我们还观察到, 如果教师网络受过标签平滑的培训, 知识蒸馏到学生网络的效果要低得多。为了解释这些观察, 我们直观地看到, 标签平滑会如何改变网络侧面层所学到的演示。我们显示, 标签平滑会鼓励将同一班级的培训实例展示到紧凑的组中。这导致不同班级之间对相似点的对数的对数丢失, 这对于蒸馏是必要的, 但不会妨碍模型预测的概括化或校准。

1

相关内容

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Thomas G. Dietterich】机器“理解”意味着什么?（What does it mean for a machine to “understand”?）

专知会员服务

9+阅读 · 2020年1月3日

【UMD开放书】机器学习课程书册，19章227页pdf，带你学习ML

【UMD开放书】机器学习课程书册，19章227页pdf，带你学习ML

专知会员服务

102+阅读 · 2019年12月9日

【电子书】机器学习实战（Machine Learning in Action），附PDF

【电子书】机器学习实战（Machine Learning in Action），附PDF

专知会员服务

130+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

计算机视觉的不同任务

计算机视觉的不同任务

专知

5+阅读 · 2018年8月27日

已删除

生物探索

3+阅读 · 2018年2月10日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【学习】(Python)SVM数据分类

【学习】(Python)SVM数据分类

机器学习研究会

6+阅读 · 2017年10月15日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Unifying Graph Convolutional Neural Networks and Label Propagation

Arxiv

31+阅读 · 2020年2月17日

What Does BERT Look At? An Analysis of BERT's Attention

Arxiv

4+阅读 · 2019年6月11日

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

Arxiv

21+阅读 · 2019年3月27日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

7+阅读 · 2019年2月8日

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

Arxiv

7+阅读 · 2018年5月26日

Token-level and sequence-level loss smoothing for RNN language models

Arxiv

7+阅读 · 2018年5月14日

Scene Graph Parsing as Dependency Parsing

Arxiv

6+阅读 · 2018年3月25日

Analyzing Uncertainty in Neural Machine Translation

Arxiv

6+阅读 · 2018年2月28日

Stable Distribution Alignment Using the Dual of the Adversarial Distance

Arxiv

3+阅读 · 2018年1月30日

What Does a TextCNN Learn?

Arxiv

8+阅读 · 2018年1月19日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Thomas G. Dietterich】机器“理解”意味着什么?（What does it mean for a machine to “understand”?）

专知会员服务

9+阅读 · 2020年1月3日

【UMD开放书】机器学习课程书册，19章227页pdf，带你学习ML

【UMD开放书】机器学习课程书册，19章227页pdf，带你学习ML

专知会员服务

102+阅读 · 2019年12月9日

【电子书】机器学习实战（Machine Learning in Action），附PDF

【电子书】机器学习实战（Machine Learning in Action），附PDF

专知会员服务

130+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

计算机视觉的不同任务

计算机视觉的不同任务

专知

5+阅读 · 2018年8月27日

已删除

生物探索

3+阅读 · 2018年2月10日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【学习】(Python)SVM数据分类

【学习】(Python)SVM数据分类

机器学习研究会

6+阅读 · 2017年10月15日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Unifying Graph Convolutional Neural Networks and Label Propagation

Arxiv

31+阅读 · 2020年2月17日

What Does BERT Look At? An Analysis of BERT's Attention

Arxiv

4+阅读 · 2019年6月11日

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

Arxiv

21+阅读 · 2019年3月27日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

7+阅读 · 2019年2月8日

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

Arxiv

7+阅读 · 2018年5月26日

Token-level and sequence-level loss smoothing for RNN language models

Arxiv

7+阅读 · 2018年5月14日

Scene Graph Parsing as Dependency Parsing

Arxiv

6+阅读 · 2018年3月25日

Analyzing Uncertainty in Neural Machine Translation

Arxiv

6+阅读 · 2018年2月28日

Stable Distribution Alignment Using the Dual of the Adversarial Distance

Arxiv

3+阅读 · 2018年1月30日

What Does a TextCNN Learn?

Arxiv

8+阅读 · 2018年1月19日

微信扫码咨询专知VIP会员