A critical bottleneck in supervised machine learning is the need for large amounts of labeled data, which is expensive and time-consuming to obtain. However, it has been shown that a small amount of labeled data, while insufficient to re-train a model, can be effectively used to generate human-interpretable labeling functions (LFs). These LFs, in turn, have been used to generate a large amount of additional noisy labeled data, in a paradigm now commonly referred to as data programming. However, previous approaches to automatically generating LFs make no attempt to further use the given labeled data for model training, thus giving up opportunities for improved performance. Moreover, since the LFs are generated from a relatively small labeled dataset, they are prone to noise, and naively aggregating them can lead to very poor performance in practice. In this work, we propose an LF-based reweighting framework, \ouralgo{}, to address these two critical limitations. Our algorithm learns a joint model on the (same) labeled dataset used for LF induction along with any unlabeled data in a semi-supervised manner and, more critically, reweights each LF according to its goodness, influencing its contribution to the semi-supervised loss via a robust bi-level optimization algorithm. We show that our algorithm significantly outperforms prior approaches on several text classification datasets.
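To make the bi-level reweighting idea concrete, the following is a minimal sketch in JAX, not the authors' implementation: it assumes a linear scorer, binary labels in {-1, +1}, pre-computed LF votes in {-1, 0, +1} (0 = abstain), a hinge-style agreement loss on unlabeled data, and a standard one-step inner update to approximate the inner optimization. All function and variable names are illustrative.

```python
# Sketch of bi-level LF reweighting (illustrative; assumptions noted above).
import jax
import jax.numpy as jnp

def model_logits(theta, X):
    # Simple linear scorer standing in for the real model.
    return X @ theta

def weighted_lf_loss(theta, w, X_u, lf_votes):
    # Semi-supervised loss on unlabeled data: each LF's pseudo-label loss
    # is scaled by its (softmax-normalized) weight; abstentions contribute 0.
    logits = model_logits(theta, X_u)                                   # (n_u,)
    mask = jnp.abs(lf_votes)                                            # 0 where LF abstains
    per_lf = mask * jnp.maximum(0.0, 1.0 - lf_votes * logits[:, None])  # (n_u, n_lfs)
    return jnp.mean(per_lf @ jax.nn.softmax(w))

def clean_loss(theta, X_l, y_l):
    # Supervised logistic loss on the small labeled set.
    return jnp.mean(jnp.log1p(jnp.exp(-y_l * model_logits(theta, X_l))))

def outer_objective(w, theta, X_u, lf_votes, X_l, y_l, inner_lr=0.1):
    # Inner level: one gradient step of the model on the LF-weighted loss.
    g = jax.grad(weighted_lf_loss)(theta, w, X_u, lf_votes)
    theta_new = theta - inner_lr * g
    # Outer level: evaluate the updated model on the clean labeled data.
    return clean_loss(theta_new, X_l, y_l)

# Toy data and one outer update of the LF weights.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
X_u = jax.random.normal(k1, (100, 5))
X_l = jax.random.normal(k2, (20, 5))
y_l = jnp.sign(X_l[:, 0])
lf_votes = jnp.sign(jax.random.normal(k3, (100, 4)))
theta, w = jnp.zeros(5), jnp.zeros(4)

w = w - 0.5 * jax.grad(outer_objective)(w, theta, X_u, lf_votes, X_l, y_l)
```

A single outer gradient step down-weights LFs whose pseudo-labels hurt performance on the clean labeled set; in a full training loop this weight update would alternate with updates to the model parameters.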