Multidimensional Scaling for Big Data - 专知 (Zhuanzhi) Papers


July 30, 2021

Multidimensional Scaling for Big Data


Pedro Delicado,Cristian Pachon-Garcia

We present a set of algorithms for Multidimensional Scaling (MDS) to be used with large datasets. MDS is a statistical tool for dimensionality reduction that takes as input a distance matrix of dimensions $n \times n$. When $n$ is large, classical algorithms run into computational problems and the MDS configuration cannot be obtained. In this paper we address these problems by means of three algorithms: Divide and Conquer MDS, Fast MDS, and MDS based on Gower interpolation (the first and the last being original proposals). All three methods rest on the same idea: partition the dataset into pieces small enough for classical MDS to handle. To check the performance of the algorithms and to compare them, we carry out a simulation study. The study indicates that Fast MDS and MDS based on Gower interpolation are appropriate when $n$ is large. Although Divide and Conquer MDS is not as fast as the other two algorithms, it is the method that best captures the variance of the original data.
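The partitioning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a shared random set of anchor points, and the Procrustes alignment step are assumptions made here to show one plausible way of stitching together classical MDS solutions computed on small pieces.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix of the rows of X (small blocks only)."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def classical_mds(D, k):
    """Classical (Torgerson) MDS: top-k eigenvectors of the doubly centered
    squared-distance matrix, scaled by the square roots of the eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)                     # ascending eigenvalues
    top = np.argsort(w)[::-1][:k]
    return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))

def procrustes_align(Z, src, dst):
    """Apply to Z the rigid motion (orthogonal map plus shift) that best
    maps the points `src` onto `dst` in the least-squares sense."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    return (Z - mu_s) @ (U @ Vt) + mu_d

def divide_conquer_mds(X, k, block_size=100, n_anchors=10, seed=0):
    """Hypothetical sketch: run classical MDS block by block, using a shared
    set of anchor points to align every block with the first one."""
    n = X.shape[0]
    if n <= n_anchors:                           # too small to partition
        return classical_mds(pairwise_dist(X), k)
    perm = np.random.default_rng(seed).permutation(n)
    anchors, rest = perm[:n_anchors], perm[n_anchors:]
    Y = np.empty((n, k))
    ref = None
    for start in range(0, len(rest), block_size):
        idx = np.concatenate([anchors, rest[start:start + block_size]])
        Z = classical_mds(pairwise_dist(X[idx]), k)
        if ref is None:
            ref = Z[:n_anchors]                  # first block fixes the frame
        else:
            Z = procrustes_align(Z, Z[:n_anchors], ref)
        Y[idx] = Z
    return Y

rng = np.random.default_rng(1)
X = rng.normal(size=(250, 2))
Y = divide_conquer_mds(X, k=2, block_size=60, n_anchors=8)
```

Each block only ever needs a distance matrix of size at most `(n_anchors + block_size)` squared, so the full $n \times n$ matrix is never formed. The alignment is needed because an MDS configuration is only determined up to rotation, reflection, and translation.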


