这个词比标签更可怕:利用数据编程,学习没有点点画的标签。 (The Word is Mightier than the Label: Learning without Pointillistic Labels using Data Programming) - 专知论文

会员服务 ·

0

标注 · 学成 · MoDELS · 概率图模型 · 去噪 ·

2021 年 8 月 26 日

The Word is Mightier than the Label: Learning without Pointillistic Labels using Data Programming

翻译：这个词比标签更可怕:利用数据编程,学习没有点点画的标签。

Chufan Gao,Mononito Goswami

from arxiv, Both authors have equal contribution

Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples. Hand-labelling vast amounts of data may be tedious, expensive, and error-prone. Recently, some studies have explored the use of diverse sources of weak supervision to produce competitive end model classifiers. In this paper, we survey recent work on weak supervision, and in particular, we investigate the Data Programming (DP) framework. Taking a set of potentially noisy heuristics as input, DP assigns denoised probabilistic labels to each data point in a dataset using a probabilistic graphical model of heuristics. We analyze the math fundamentals behind DP and demonstrate the power of it by applying it on two real-world text classification tasks. Furthermore, we compare DP with pointillistic active and semi-supervised learning techniques traditionally applied in data-sparse settings.

翻译：最先进的监督机器学习(ML)模型依靠大量的按点贴标签的培训实例。大量手工标签的数据可能是枯燥、昂贵和易出错的。最近,一些研究探索了使用各种薄弱的监督来源来产生具有竞争力的最终模式分类者。我们在本文件中调查了最近关于薄弱监督的工作,特别是调查了数据规划框架。用一套可能吵闹的杂乱的杂乱理论作为输入,DP使用一种概率性图象模型对数据集中的每个数据点指定了非无名的概率标签。我们分析了DP的数学基础,并通过将其应用于两种真实世界文本分类任务来展示其力量。此外,我们把DP与传统上在数据采集环境中使用的点性活性和半超强性学习技术进行比较。

0

相关内容

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

数据科学导论，54页ppt，Introduction to Data Science

数据科学导论，54页ppt，Introduction to Data Science

专知会员服务

42+阅读 · 2020年7月27日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【Facebook AI】自监督学习在计算机视觉应用最新概述，108页ppt Self-supervised learning

【Facebook AI】自监督学习在计算机视觉应用最新概述，108页ppt Self-supervised learning

专知会员服务

164+阅读 · 2020年4月19日

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

专知会员服务

147+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

新书下载 | 面向机器学习的数学（Mathematics for Machine Learning）

新书下载 | 面向机器学习的数学（Mathematics for Machine Learning）

AINLP

7+阅读 · 2020年2月5日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Python机器学习教程资料/代码

Python机器学习教程资料/代码

机器学习研究会

8+阅读 · 2018年2月22日

已删除

将门创投

9+阅读 · 2017年10月17日

CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding

Arxiv

3+阅读 · 2021年7月1日

Deep Learning for Hindi Text Classification: A Comparison

Arxiv

4+阅读 · 2020年1月19日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Multi-class Classification without Multi-class Labels

Multi-class Classification without Multi-class Labels

Arxiv

4+阅读 · 2019年1月2日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

21+阅读 · 2018年12月25日

Learning From Positive and Unlabeled Data: A Survey

Learning From Positive and Unlabeled Data: A Survey

Arxiv

5+阅读 · 2018年11月12日

Learning to Separate Domains in Generalized Zero-Shot and Open Set Learning: a probabilistic perspective

Arxiv

4+阅读 · 2018年10月17日

Mining on Manifolds: Metric Learning without Labels

Arxiv

6+阅读 · 2018年3月29日

Active Learning from Positive and Unlabeled Data

Arxiv

3+阅读 · 2016年2月24日

VIP会员

文章信息

相关主题

概率图模型

相关VIP内容

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

数据科学导论，54页ppt，Introduction to Data Science

数据科学导论，54页ppt，Introduction to Data Science

专知会员服务

42+阅读 · 2020年7月27日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【Facebook AI】自监督学习在计算机视觉应用最新概述，108页ppt Self-supervised learning

【Facebook AI】自监督学习在计算机视觉应用最新概述，108页ppt Self-supervised learning

专知会员服务

164+阅读 · 2020年4月19日

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

专知会员服务

147+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《美国海军陆战队软件定义网络应用案例：分布式防火墙自动化系统》148页

《多体环境下定位导航授时（PNT）系统研究》228页

软件定义无线电（SDR）：商业与军事领域的技术、应用及未来趋势

《攻势防空作战中无人追击者/规避者最优轨迹研究（含动态交战区建模）》95页

相关资讯

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

新书下载 | 面向机器学习的数学（Mathematics for Machine Learning）

新书下载 | 面向机器学习的数学（Mathematics for Machine Learning）

AINLP

7+阅读 · 2020年2月5日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Python机器学习教程资料/代码

Python机器学习教程资料/代码

机器学习研究会

8+阅读 · 2018年2月22日

已删除

将门创投

9+阅读 · 2017年10月17日

相关论文

CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding

Arxiv

3+阅读 · 2021年7月1日

Deep Learning for Hindi Text Classification: A Comparison

Arxiv

4+阅读 · 2020年1月19日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Multi-class Classification without Multi-class Labels

Multi-class Classification without Multi-class Labels

Arxiv

4+阅读 · 2019年1月2日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

21+阅读 · 2018年12月25日

Learning From Positive and Unlabeled Data: A Survey

Learning From Positive and Unlabeled Data: A Survey

Arxiv

5+阅读 · 2018年11月12日

Learning to Separate Domains in Generalized Zero-Shot and Open Set Learning: a probabilistic perspective

Arxiv

4+阅读 · 2018年10月17日

Mining on Manifolds: Metric Learning without Labels

Arxiv

6+阅读 · 2018年3月29日

Active Learning from Positive and Unlabeled Data

Arxiv

3+阅读 · 2016年2月24日

微信扫码咨询专知VIP会员