Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels. This simple technique has empirically been shown to improve the accuracy of many state-of-the-art models across different settings and applications, but the reasons behind this empirical success remain poorly understood. In this paper we take a substantial step toward explaining the theoretical foundations of Mixup by clarifying its regularization effects. We show that Mixup can be interpreted as a standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data. We gain two core insights from this new interpretation. First, the data transformation suggests that, at test time, a model trained with Mixup should also be applied to transformed data, a one-line change in code that we show empirically to improve both the accuracy and calibration of the predictions. Second, we show how the random perturbation in the new interpretation of Mixup induces multiple known regularization schemes, including label smoothing and reduction of the Lipschitz constant of the estimator. These schemes interact synergistically with each other, resulting in a self-calibrated and effective regularization effect that prevents overfitting and overconfident predictions. We corroborate our theoretical analysis with experiments that support our conclusions.
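For concreteness, the following is a minimal NumPy sketch of the Mixup augmentation described above: inputs and one-hot labels are paired at random and mixed with a coefficient drawn from a Beta(alpha, alpha) distribution, as in the original Mixup formulation. The function name `mixup_batch` and the parameter `alpha` are illustrative choices, not names taken from this paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Create one Mixup batch: convex combinations of inputs and one-hot labels.

    x: array of shape (batch, ...) with training inputs
    y: array of shape (batch, num_classes) with one-hot labels
    alpha: Beta-distribution parameter controlling interpolation strength
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)              # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))            # random pairing of examples
    x_mix = lam * x + (1.0 - lam) * x[perm]   # convex combination of inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]   # convex combination of labels
    return x_mix, y_mix

# Usage: mix a toy batch of 4 two-dimensional points with 3 classes.
x = np.random.randn(4, 2)
y = np.eye(3)[[0, 1, 2, 0]]
x_mix, y_mix = mixup_batch(x, y, alpha=0.2)
```

The test-time data transformation referred to in the abstract is defined in the body of the paper and is not reproduced here.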