Outliers · Exchangeable · Subsampling · Unsupervised · Estimation/Estimators

August 12, 2022

A sub-sampling algorithm preventing outliers

L. Deldossi, E. Pesce, C. Tommasi

From arXiv; 17 pages, 1 figure

Massive data are now available in many fields, and for several reasons it can be convenient to analyze only a subset of the data. The D-optimality criterion can help to select a subsample of observations optimally. However, it is well known that D-optimal support points lie on the boundary of the design space; if they coincide with extreme response values, they can severely influence the estimated linear model (leverage points with high influence). To overcome this problem, we first propose an unsupervised exchange procedure that selects a nearly D-optimal subset of observations without high leverage values. We then provide a supervised version of this exchange procedure that avoids not only high leverage points but also outliers in the responses that are not associated with high leverage points. This is possible because, unlike in other design situations, the response values may be available when subsampling from big datasets. Finally, both the unsupervised and the supervised selection procedures are generalized to I-optimality, with the goal of obtaining accurate predictions.
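The idea of an exchange procedure that seeks a nearly D-optimal subsample while excluding high leverage points can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the quantile-based leverage cutoff (`leverage_quantile`), the random starting subsample, and the single-swap acceptance rule are all assumptions chosen for illustration, and only the unsupervised case (no response values) is covered.

```python
import numpy as np

def d_criterion(X):
    """Log-determinant of the information matrix X^T X (larger is better)."""
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

def hat_diagonal(X):
    """Leverage values: the diagonal of X (X^T X)^{-1} X^T, via a thin QR."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def exchange_subsample(X, n, leverage_quantile=0.99, max_swaps=200, seed=0):
    """Hypothetical exchange sketch: improve log det(X_s^T X_s) by swaps,
    drawing only from rows whose full-data leverage is below a cutoff."""
    rng = np.random.default_rng(seed)
    h = hat_diagonal(X)
    # Assumption: "high leverage" means above a chosen quantile of h.
    admissible = np.flatnonzero(h <= np.quantile(h, leverage_quantile))
    chosen = rng.choice(admissible, size=n, replace=False)
    for _ in range(max_swaps):
        M_inv = np.linalg.inv(X[chosen].T @ X[chosen])
        pool = np.setdiff1d(admissible, chosen)
        # Prediction variance x^T M^{-1} x for selected and candidate rows.
        var_in = np.einsum("ij,jk,ik->i", X[chosen], M_inv, X[chosen])
        var_out = np.einsum("ij,jk,ik->i", X[pool], M_inv, X[pool])
        trial = chosen.copy()
        # Swap the least informative selected row for the most informative candidate.
        trial[np.argmin(var_in)] = pool[np.argmax(var_out)]
        if d_criterion(X[trial]) <= d_criterion(X[chosen]):
            break  # no improvement, so stop
        chosen = trial
    return np.sort(chosen)

# Synthetic example: intercept plus two Gaussian covariates.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(2000), rng.standard_normal((2000, 2))])
idx = exchange_subsample(X, n=50)
```

Because a swap is accepted only when it strictly increases the log-determinant, the selected subsample is never worse under the D-criterion than the random starting subsample, and every selected row respects the leverage cutoff. A supervised variant in the spirit of the abstract would additionally screen candidates by their response values, which this sketch does not attempt.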
