Distributional Results for Model-Based Intrinsic Dimension Estimators - Zhuanzhi (专知) Papers


Estimation/Estimators · Dimensionality Reduction · Datasets · Statistics · Nearest Neighbors

April 28, 2021

Distributional Results for Model-Based Intrinsic Dimension Estimators


Francesco Denti, Diego Doimo, Alessandro Laio, Antonietta Mira

Modern datasets are characterized by a large number of features that may conceal complex dependency structures. To deal with this type of data, dimensionality reduction techniques are essential. Numerous dimensionality reduction methods rely on the concept of intrinsic dimension, a measure of the complexity of the dataset. In this article, we first review the TWO-NN model, a likelihood-based intrinsic dimension estimator recently introduced in the literature. The TWO-NN estimator is based on the statistical properties of the ratio of the distances between a point and its first two nearest neighbors, assuming that the points are a realization from a homogeneous Poisson point process. We extend the TWO-NN theoretical framework by providing novel distributional results for consecutive and generic ratios of distances. These distributional results are then employed to derive intrinsic dimension estimators, called Cride and Gride. These novel estimators are more robust to noisy measurements than TWO-NN and allow the study of the evolution of the intrinsic dimension as a function of the scale used to analyze the dataset. We discuss the properties of the different estimators with the help of simulation scenarios.
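The TWO-NN idea summarized in the abstract can be illustrated with a short sketch. Under the homogeneous Poisson assumption, the ratio mu_i = r2_i / r1_i of the distances from point i to its second and first nearest neighbors follows a Pareto distribution with scale 1 and shape d, so the maximum likelihood estimate of the intrinsic dimension is n / sum_i log(mu_i). The code below is a minimal illustration under that assumption, not the authors' implementation; the function name `two_nn_mle`, the brute-force neighbor search, and the toy data (a 2D uniform sample linearly embedded in 5 dimensions) are all hypothetical choices made for this example. It does not cover Cride or Gride.

```python
import math
import random


def two_nn_mle(points):
    """Maximum likelihood estimate of intrinsic dimension from the ratios
    mu_i = r2_i / r1_i (second over first nearest-neighbor distance).

    Assumes at least three distinct points. Brute-force O(n^2) search,
    chosen for clarity rather than speed.
    """
    log_mu_sum = 0.0
    used = 0
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # smallest and second-smallest distances so far
        for j, q in enumerate(points):
            if i == j:
                continue
            dist = math.dist(p, q)
            if dist < r1:
                r1, r2 = dist, r1
            elif dist < r2:
                r2 = dist
        if r1 > 0.0:  # skip exact duplicates, where the ratio is undefined
            log_mu_sum += math.log(r2 / r1)
            used += 1
    # Pareto(1, d) likelihood is maximized at d = n / sum(log mu_i)
    return used / log_mu_sum


# Toy data (an assumption for illustration): uniform points on a 2D square,
# mapped linearly into 5D, so the intrinsic dimension is 2.
random.seed(0)
plane = [(random.random(), random.random()) for _ in range(600)]
embedded = [(x, y, x + y, x - y, 2.0 * x) for x, y in plane]

d_hat = two_nn_mle(embedded)
print(round(d_hat, 2))
```

On this toy sample the estimate should land close to 2 even though each point has 5 coordinates. Because only the two nearest neighbors are used, the estimate reflects the smallest scale in the data, which is one reason it can be sensitive to measurement noise.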

