用于数据中心半监测半监测学习的无人监督的数据选择 (Unsupervised Data Selection for Data-Centric Semi-Supervised Learning) - 专知论文

会员服务 ·

0

数据选择 · SSL · 学成 · INFORMS · Performer ·

2021 年 12 月 23 日

Unsupervised Data Selection for Data-Centric Semi-Supervised Learning

翻译：用于数据中心半监测半监测学习的无人监督的数据选择

Xudong Wang,Long Lian,Stella X. Yu

from arxiv, 17 pages

We study unsupervised data selection for semi-supervised learning (SSL), where a large-scale unlabeled dataset is available and a small subset of data is budgeted for label acquisition. Existing SSL methods focus on learning a model that effectively integrates information from given small labeled data and large unlabeled data, whereas we focus on selecting the right data to annotate for SSL without requiring any label or task information. Intuitively, instances to be labeled shall collectively have maximum diversity and coverage for downstream tasks, and individually have maximum information propagation utility for SSL. We formalize these concepts in a three-step data-centric SSL method that improves FixMatch in stability and accuracy by 8% on CIFAR-10 (0.08% labeled) and 14% on ImageNet-1K (0.2% labeled). It is also a universal framework that works with various SSL methods, delivering consistent performance gains. Our work demonstrates that small computation spent on carefully selecting data for annotation brings big annotation efficiency and model performance gain without changing the learning pipeline. Our completely unsupervised data selection can be easily extended to other weakly supervised learning settings.

翻译：我们研究的是用于半监督学习的不受监督的数据选择(SSL),那里有大型的无标签数据集,有少量数据用于获取标签。现有的SSL方法侧重于学习一种能够有效地将特定小标签数据和大无标签数据的信息整合起来的模型,而我们则侧重于在不需要任何标签或任务信息的情况下为SSL选择正确的数据进行批注。自觉地,贴标签的事例应该为下游任务提供最大的多样性和覆盖范围,并个别地为SSL提供最大的信息传播工具。我们将这些概念正式化为三步以数据为中心的SSL方法,在CIFAR-10(0.08%贴上标签)和图像Net-1K(0.2%贴上标签)上使固定匹配的稳定性和准确性提高8%。它也是一个通用框架,与各种SSL方法合作,实现一致的绩效增益。我们的工作表明,在仔细选择说明数据时花费的小计算可以带来较大的说明效率和模型性能,而不改变学习管道。我们完全未受监督的数据选择可以很容易扩展到其他薄弱的学习环境。

0

相关内容

数据选择

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

适应于大数据特性的智能存储技术研究

国家自然科学基金

1+阅读 · 2015年12月31日

氧化应激相关蛋白p66Shc和GDF1在砷介导的心脏毒性中的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

用晶格Boltzmann方法研究青光眼流体力学

国家自然科学基金

0+阅读 · 2014年12月31日

基于知识迁移的跨领域人体动作识别

国家自然科学基金

5+阅读 · 2013年12月31日

众包环境下的成对约束传递问题研究

国家自然科学基金

0+阅读 · 2013年12月31日

云计算环境中租户数据的计算安全保障机制研究

国家自然科学基金

1+阅读 · 2012年12月31日

结构物入水冲击动力学问题的DSPH计算方法和试验研究

国家自然科学基金

0+阅读 · 2012年12月31日

扩展的模糊逻辑与基于蕴涵算子的Rough逻辑

国家自然科学基金

0+阅读 · 2011年12月31日

基于肺结节多正交位CT图像Curvelet纹理构建 Gradient Boosting 集成预测模型

国家自然科学基金

0+阅读 · 2011年12月31日

重载齿轮箱复杂工况多源激励下复合故障耦合机理及诊断方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

Active Few-Shot Learning with FASL

Arxiv

0+阅读 · 2022年4月20日

Utilizing unsupervised learning to improve sward content prediction and herbage mass estimation

Arxiv

0+阅读 · 2022年4月20日

Deep Federated Learning for Autonomous Driving

Arxiv

0+阅读 · 2022年4月19日

Supervised Contrastive Learning for Recommendation

Arxiv

0+阅读 · 2022年4月19日

Joint Multi-view Unsupervised Feature Selection and Graph Learning

Arxiv

0+阅读 · 2022年4月18日

Bayesian Deep Learning for Graphs

Arxiv

23+阅读 · 2022年2月24日

Data-Free Knowledge Distillation for Heterogeneous Federated Learning

Arxiv

12+阅读 · 2021年6月9日

Learning to Propagate for Graph Meta-Learning

Arxiv

14+阅读 · 2019年9月11日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

VIP会员

文章信息

相关主题

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

通信行业：智能低空通感网络白皮书

3D形状生成：综述

6000字《伊朗-以色列战争解析：欺骗与信息战如何塑造公众认知》最新报告（附原文）

【博士论文】优化智能体工作流以提升信息获取效率

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

相关论文

Active Few-Shot Learning with FASL

Arxiv

0+阅读 · 2022年4月20日

Utilizing unsupervised learning to improve sward content prediction and herbage mass estimation

Arxiv

0+阅读 · 2022年4月20日

Deep Federated Learning for Autonomous Driving

Arxiv

0+阅读 · 2022年4月19日

Supervised Contrastive Learning for Recommendation

Arxiv

0+阅读 · 2022年4月19日

Joint Multi-view Unsupervised Feature Selection and Graph Learning

Arxiv

0+阅读 · 2022年4月18日

Bayesian Deep Learning for Graphs

Arxiv

23+阅读 · 2022年2月24日

Data-Free Knowledge Distillation for Heterogeneous Federated Learning

Arxiv

12+阅读 · 2021年6月9日

Learning to Propagate for Graph Meta-Learning

Arxiv

14+阅读 · 2019年9月11日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

相关基金

适应于大数据特性的智能存储技术研究

国家自然科学基金

1+阅读 · 2015年12月31日

氧化应激相关蛋白p66Shc和GDF1在砷介导的心脏毒性中的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

用晶格Boltzmann方法研究青光眼流体力学

国家自然科学基金

0+阅读 · 2014年12月31日

基于知识迁移的跨领域人体动作识别

国家自然科学基金

5+阅读 · 2013年12月31日

众包环境下的成对约束传递问题研究

国家自然科学基金

0+阅读 · 2013年12月31日

云计算环境中租户数据的计算安全保障机制研究

国家自然科学基金

1+阅读 · 2012年12月31日

结构物入水冲击动力学问题的DSPH计算方法和试验研究

国家自然科学基金

0+阅读 · 2012年12月31日

扩展的模糊逻辑与基于蕴涵算子的Rough逻辑

国家自然科学基金

0+阅读 · 2011年12月31日

基于肺结节多正交位CT图像Curvelet纹理构建 Gradient Boosting 集成预测模型

国家自然科学基金

0+阅读 · 2011年12月31日

重载齿轮箱复杂工况多源激励下复合故障耦合机理及诊断方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员