Modern machine learning models are complex and frequently encode surprising amounts of information about individual inputs. In extreme cases, complex models appear to memorize entire input examples, including seemingly irrelevant information (social security numbers from text, for example). In this paper, we aim to understand whether this sort of memorization is necessary for accurate learning. We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples. This remains true even when the examples are high-dimensional and have entropy much higher than the sample size, and even when most of that information is ultimately irrelevant to the task at hand. Further, our results do not depend on the training algorithm or the class of models used for learning. Our problems are simple and fairly natural variants of the next-symbol prediction and the cluster labeling tasks. These tasks can be seen as abstractions of image- and text-related prediction problems. To establish our results, we reduce from a family of one-way communication problems for which we prove new information complexity lower bounds.