Can Pretext-Based Self-Supervised Learning Be Boosted by Downstream Data? A Theoretical Analysis - Zhuanzhi (专知) Papers

Topics: Boosting · Sample complexity · Unlabeled data · Conditional independence

October 25, 2021

Can Pretext-Based Self-Supervised Learning Be Boosted by Downstream Data? A Theoretical Analysis

Jiaye Teng, Weiran Huang, Haowei He

Pretext-based self-supervised learning learns a semantic representation via a handcrafted pretext task over unlabeled data and then uses the learned representation for downstream tasks, which effectively reduces the sample complexity of downstream tasks under the Conditional Independence (CI) condition. However, the downstream sample complexity gets much worse if the CI condition does not hold. One interesting question is whether we can make the CI condition hold by using downstream data to refine the unlabeled data, thereby boosting self-supervised learning. At first glance, one might think that seeing downstream data in advance would always boost downstream performance. However, we show that this intuition is not always correct and point out that in some cases it hurts the final performance instead. In particular, we prove both model-free and model-dependent lower bounds on the number of downstream samples used for data refinement. Moreover, we conduct several experiments on both synthetic and real-world datasets to verify our theoretical results.
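
For readers unfamiliar with the CI condition, the following is a minimal sketch in notation assumed here rather than taken from the abstract: $x$ is the input, $z$ the pretext target, and $y$ the downstream label. It states the condition and one standard consequence that suggests why CI helps; it is not the paper's formal result.

```latex
% Assumed notation: x = input, z = pretext target, y = downstream label.
% Conditional independence of x and z given y:
\[
  p(x, z \mid y) = p(x \mid y)\, p(z \mid y)
  \quad\Longleftrightarrow\quad
  x \perp z \mid y .
\]
% Consequence, by the tower property and then CI:
\[
  \mathbb{E}[z \mid x]
  = \mathbb{E}\big[\, \mathbb{E}[z \mid x, y] \,\big|\, x \big]
  = \mathbb{E}\big[\, \mathbb{E}[z \mid y] \,\big|\, x \big].
\]
```

Under this assumption the optimal pretext predictor depends on $x$ only through the posterior distribution of $y$ given $x$, which gives some intuition for why a representation learned on the pretext task can carry label-relevant information.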
