分类核心数据集 -- -- 简化和加强 (Coresets for Classification -- Simplified and Strengthened) - 专知论文

会员服务 ·

0

子采样 · 对率损失 · 重要性采样 · 损失 · 损失函数（机器学习） ·

2021 年 6 月 18 日

Coresets for Classification -- Simplified and Strengthened

翻译：分类核心数据集 -- -- 简化和加强

Tung Mai,Anup B. Rao,Cameron Musco

We give relative error coresets for training linear classifiers with a broad class of loss functions, including the logistic loss and hinge loss. Our construction achieves $(1\pm \epsilon)$ relative error with $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$, introduced in by Munteanu et al. 2018. Our result is based on subsampling data points with probabilities proportional to their $\ell_1$ $Lewis$ $weights$. It significantly improves on existing theoretical bounds and performs well in practice, outperforming uniform subsampling along with other importance sampling methods. Our sampling distribution does not depend on the labels, so can be used for active learning. It also does not depend on the specific loss function, so a single coreset can be used in multiple training scenarios.

翻译：我们为培训具有广泛损失功能的线性分类员,包括后勤损失和断链损失,提供了相对错误核心。我们的建筑以$\tilde O(d\cddt\mu_y(X)\2/\epsilon ⁇ 2美元)为单位实现$( 1\pm \ epsilon) 相对错误。 $\ mu_y( X)\\\\\ epsilon2美元是数据矩阵 $X\ y( X) 的自然复杂性测量值 $y( y)\ { in\ mathbb{ {r\n\ \ times d} $ 和标签矢量 $y $ \ {1\ 1\ {1\ ⁇ n$, 由 Munteanu etanu et al. 2018 引入。我们的建筑的建筑工程以 $( 2018) 。我们的结果基于与它们的概率成份子抽样数据点成的子取样点为单位, 和 $\ $\ $_ $1 lewis $1 $1 lewxx lexxx 重量成单位。它在实际操作中可以使用一个单一核心部分。它可以在多个学习中不取决于特定损失函数。在多个训练中使用。在多个中使用。

0

相关内容

子采样

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【ACL2020-Facebook AI】跨语言表示学习，Unsupervised Cross-lingual Representation Learning at Scale

【ACL2020-Facebook AI】跨语言表示学习，Unsupervised Cross-lingual Representation Learning at Scale

专知会员服务

27+阅读 · 2020年4月5日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

58+阅读 · 2020年1月25日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Graph Representation Learning for Road Type Classification

Arxiv

0+阅读 · 2021年8月19日

Mixture-based Feature Space Learning for Few-shot Image Classification

Arxiv

0+阅读 · 2021年8月17日

Risk Bounds for Quantile Trend Filtering

Arxiv

0+阅读 · 2021年8月17日

Investigating a Baseline Of Self Supervised Learning Towards Reducing Labeling Costs For Image Classification

Arxiv

0+阅读 · 2021年8月17日

Exploring Self-Supervised Representation Ensembles for COVID-19 Cough Classification

Arxiv

6+阅读 · 2021年5月17日

Dissecting Supervised Constrastive Learning

Arxiv

11+阅读 · 2021年2月17日

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Arxiv

9+阅读 · 2021年2月8日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

VIP会员

文章信息

相关主题

重要性采样

损失函数（机器学习）

相关VIP内容

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【ACL2020-Facebook AI】跨语言表示学习，Unsupervised Cross-lingual Representation Learning at Scale

【ACL2020-Facebook AI】跨语言表示学习，Unsupervised Cross-lingual Representation Learning at Scale

专知会员服务

27+阅读 · 2020年4月5日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

58+阅读 · 2020年1月25日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

从社会学实验到行为仿真：理解基于Agent的观点动力学建模思维

中英文版《GPT-5 System Card速览》报告

ACL 2025 | 大模型结构化知识提示的泛化能力研究

【普林斯顿博士论文】大型模型的高效推理

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Graph Representation Learning for Road Type Classification

Arxiv

0+阅读 · 2021年8月19日

Mixture-based Feature Space Learning for Few-shot Image Classification

Arxiv

0+阅读 · 2021年8月17日

Risk Bounds for Quantile Trend Filtering

Arxiv

0+阅读 · 2021年8月17日

Investigating a Baseline Of Self Supervised Learning Towards Reducing Labeling Costs For Image Classification

Arxiv

0+阅读 · 2021年8月17日

Exploring Self-Supervised Representation Ensembles for COVID-19 Cough Classification

Arxiv

6+阅读 · 2021年5月17日

Dissecting Supervised Constrastive Learning

Arxiv

11+阅读 · 2021年2月17日

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Arxiv

9+阅读 · 2021年2月8日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

微信扫码咨询专知VIP会员