Balancing Constraints and Submodularity in Data Subset Selection - Zhuanzhi Papers

April 26, 2021

Balancing Constraints and Submodularity in Data Subset Selection

Srikumar Ramalingam, Daniel Glasner, Kaushal Patel, Raviteja Vemulapalli, Sadeep Jayasumana, Sanjiv Kumar

Deep learning has yielded extraordinary results in vision and natural language processing, but this achievement comes at a cost. Most deep learning models require enormous resources during training, both in terms of computation and in human labeling effort. In this paper, we show that one can achieve similar accuracy to traditional deep-learning models, while using less training data. Much of the previous work in this area relies on using uncertainty or some form of diversity to select subsets of a larger training set. Submodularity, a discrete analogue of convexity, has been exploited to model diversity in various settings including data subset selection. In contrast to prior methods, we propose a novel diversity driven objective function, and balancing constraints on class labels and decision boundaries using matroids. This allows us to use efficient greedy algorithms with approximation guarantees for subset selection. We outperform baselines on standard image classification datasets such as CIFAR-10, CIFAR-100, and ImageNet. In addition, we also show that the proposed balancing constraints can play a key role in boosting the performance in long-tailed datasets such as CIFAR-100-LT.
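
To make the idea of greedy selection under balancing constraints concrete, here is a minimal sketch in Python. It is not the paper's method: the objective below is the classic facility-location function, used only as a stand-in for a diversity-driven submodular objective, and the class-balance constraint is modeled as a simple per-class cap (a partition matroid). All names (`facility_location`, `greedy_balanced_subset`, `per_class_cap`, `budget`) and the toy data are assumptions made for illustration.

```python
from collections import Counter


def facility_location(selected, sim):
    """Stand-in submodular score: each point is credited with its best
    similarity to any selected item (0.0 for the empty set)."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_balanced_subset(sim, labels, per_class_cap, budget):
    """Greedy selection with at most `per_class_cap` items per label
    (a partition-matroid constraint) and at most `budget` items overall."""
    selected = []
    counts = Counter()
    current = 0.0
    while len(selected) < budget:
        best_gain, best_item = float("-inf"), None
        for j in range(len(labels)):
            # Skip items already chosen or whose class quota is full.
            if j in selected or counts[labels[j]] >= per_class_cap:
                continue
            gain = facility_location(selected + [j], sim) - current
            if gain > best_gain:
                best_gain, best_item = gain, j
        if best_item is None:  # no feasible item remains
            break
        selected.append(best_item)
        counts[labels[best_item]] += 1
        current += best_gain
    return selected


# Toy, imbalanced data (hypothetical): four points of class 0, two of class 1.
points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1]
labels = [0, 0, 0, 0, 1, 1]
sim = [[1.0 / (1.0 + abs(a - b)) for b in points] for a in points]

selected = greedy_balanced_subset(sim, labels, per_class_cap=2, budget=4)
print(selected, [labels[i] for i in selected])
```

With a cap of two items per class and a budget of four, the selection ends up with two items from each class even though class 0 is twice as large, which is the kind of balancing effect the abstract describes for long-tailed data. The sketch recomputes the objective from scratch for every candidate; a practical implementation would cache marginal gains.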