T-SNE可视化高多层次集群数据T-SNE理论基础 (Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data) - 专知论文

会员服务 ·

0

簇 · 随机近邻嵌入 · 可辨认的 · 早停 · Guidance ·

2021 年 5 月 18 日

Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data

翻译：T-SNE可视化高多层次集群数据T-SNE理论基础

T. Tony Cai,Rong Ma

This study investigates the theoretical foundations of t-distributed stochastic neighbor embedding (t-SNE), a popular nonlinear dimension reduction and data visualization method. A novel theoretical framework for the analysis of t-SNE based on the gradient descent approach is presented. For the early exaggeration stage of t-SNE, we show its asymptotic equivalence to a power iteration based on the underlying graph Laplacian, characterize its limiting behavior, and uncover its deep connection to Laplacian spectral clustering, and fundamental principles including early stopping as implicit regularization. The results explain the intrinsic mechanism and the empirical benefits of such a computational strategy. For the embedding stage of t-SNE, we characterize the kinematics of the low-dimensional map throughout the iterations, and identify an amplification phase, featuring the intercluster repulsion and the expansive behavior of the low-dimensional map. The general theory explains the fast convergence rate and the exceptional empirical performance of t-SNE for visualizing clustered data, brings forth the interpretations of the t-SNE output, and provides theoretical guidance for selecting tuning parameters in various applications.

翻译：本研究调查了T-分布式随机邻居嵌入(t-SNE)的理论基础,这是一种流行的非线性尺寸减少和数据可视化方法。介绍了基于梯度下移法分析t-SNE的新理论框架。对于t-SNE早期的夸大阶段,我们展示了它与基于基本图Laplaceian的动力迭代的无症状等同性,其限制行为的特点,并揭示了它与Laplaceian光谱集群的深层联系,以及基本原则,包括早期停止作为隐含的正规化。结果解释了这种计算战略的内在机制和实证效益。对于t-SNE的嵌入阶段,我们给整个迭代中低维图的动态定了特征,并确定了一个振动阶段,其特点是集群间反作用和低维图的外延行为。一般理论解释了T-SNE的快速趋同率和特异的经验性性性表现,用以对集群数据进行可视化,提出了对T-SNE输出应用的各种参数的诠释。

0

相关内容

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【2020新书】Kafka实战：Kafka in Action，209页pdf

【2020新书】Kafka实战：Kafka in Action，209页pdf

专知会员服务

69+阅读 · 2020年3月9日

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知会员服务

310+阅读 · 2020年2月26日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

【学习】(Python)SVM数据分类

【学习】(Python)SVM数据分类

机器学习研究会

6+阅读 · 2017年10月15日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation

Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation

Arxiv

0+阅读 · 2021年7月8日

ENNS: Variable Selection, Regression, Classification and Deep Neural Network for High-Dimensional Data

Arxiv

0+阅读 · 2021年7月7日

Stability and convergence analysis of a domain decomposition FE/FD method for the Maxwell's equations in time domain

Stability and convergence analysis of a domain decomposition FE/FD method for the Maxwell's equations in time domain

Arxiv

0+阅读 · 2021年7月7日

A Dimension-Insensitive Algorithm for Stochastic Zeroth-Order Optimization

Arxiv

0+阅读 · 2021年7月6日

Hierarchical clustered multiclass discriminant analysis via cross-validation

Arxiv

0+阅读 · 2021年7月5日

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Arxiv

9+阅读 · 2021年2月8日

Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

Arxiv

6+阅读 · 2020年6月15日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Big Data: Understanding Big Data

Arxiv

6+阅读 · 2016年1月15日

Twitter Sentiment Analysis

Arxiv

5+阅读 · 2015年9月14日

VIP会员

文章信息

相关主题

随机近邻嵌入

相关VIP内容

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【2020新书】Kafka实战：Kafka in Action，209页pdf

【2020新书】Kafka实战：Kafka in Action，209页pdf

专知会员服务

69+阅读 · 2020年3月9日

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知会员服务

310+阅读 · 2020年2月26日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

Deep Research（深度研究）：系统性综述

《革新战术战场空间能力：反无人机系统》报告

【普林斯顿博士论文】用于语音的生成式通用模型

螺旋式开发作为战略资产：美军启示

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

【学习】(Python)SVM数据分类

【学习】(Python)SVM数据分类

机器学习研究会

6+阅读 · 2017年10月15日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation

Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation

Arxiv

0+阅读 · 2021年7月8日

ENNS: Variable Selection, Regression, Classification and Deep Neural Network for High-Dimensional Data

Arxiv

0+阅读 · 2021年7月7日

Stability and convergence analysis of a domain decomposition FE/FD method for the Maxwell's equations in time domain

Stability and convergence analysis of a domain decomposition FE/FD method for the Maxwell's equations in time domain

Arxiv

0+阅读 · 2021年7月7日

A Dimension-Insensitive Algorithm for Stochastic Zeroth-Order Optimization

Arxiv

0+阅读 · 2021年7月6日

Hierarchical clustered multiclass discriminant analysis via cross-validation

Arxiv

0+阅读 · 2021年7月5日

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Arxiv

9+阅读 · 2021年2月8日

Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

Arxiv

6+阅读 · 2020年6月15日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Big Data: Understanding Big Data

Arxiv

6+阅读 · 2016年1月15日

Twitter Sentiment Analysis

Arxiv

5+阅读 · 2015年9月14日

微信扫码咨询专知VIP会员