群集分析的形状复杂性 (Shape complexity in cluster analysis) - 专知论文

会员服务 ·

0

Analysis · 簇 · 聚类分析 · 缩放 · AIM ·

2022 年 9 月 5 日

Shape complexity in cluster analysis

翻译：群集分析的形状复杂性

Eduardo J. Aguilar,Valmir C. Barbosa

from arxiv, Minor improvements and fixes in this version

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimension. Like division by the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data. Here we explore the use of multidimensional shapes of data, aiming to obtain scaling factors for use prior to clustering by some method, like k-means, that makes explicit use of distances between samples. We borrow from the field of cosmology and related areas the recently introduced notion of shape complexity, which in the variant we use is a relatively simple, data-dependent nonlinear function that we show can be used to help with the determination of appropriate scaling factors. Focusing on what might be called "midrange" distances, we formulate a constrained nonlinear programming problem and use it to produce candidate scaling-factor sets that can be sifted on the basis of further considerations of the data, say via expert knowledge. We give results on some iconic data sets, highlighting the strengths and potential weaknesses of the new approach. These results are generally positive across all the data sets used.

翻译：在分组分析中,一个共同的第一步是扩大数据规模,以更好地将数据分成组群。尽管多年来为此采用了许多不同的技术,但也许可以公平地说,这一预处理阶段的工作是按每个维度的标准偏差来划分数据。与标准偏差一样,绝大多数的缩放技术可以说都源于某种统计数据。我们在这里探索如何使用多维数据形状,目的是在采用某种方法,例如K手段,明确使用样品之间的距离,来获得在集群前使用的缩放因子。我们从宇宙学和相关领域借用了最近引入的形状复杂性概念,在变量中,我们使用的是一个相对简单、数据依赖的非线性功能,我们所显示的这种功能可以用来帮助确定适当的缩放因子。我们在这里探讨的是所谓的“Midrange”距离,我们形成了一个有限的非线性编程问题,并用它来产生候选人的缩放因子集,可以根据对数据的进一步考虑进行筛选,我们通过专家知识来说明,这些是所有使用的数据结构的正面结果。

0

相关内容

Analysis

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

概率和平均框架下一系列Sobolev空间中的函数逼近与恢复

国家自然科学基金

1+阅读 · 2015年12月31日

肿瘤抗原HCA587与STAT3的相互作用及其促进肿瘤转移的分子机制研究

国家自然科学基金

1+阅读 · 2014年12月31日

大鼠浅筋膜中腺苷受体与针刺镇痛相关性研究

国家自然科学基金

0+阅读 · 2014年12月31日

CLU,CR1，PICALM基因多态性及相关因素与内蒙古蒙、汉族阿尔茨海默病人群的病例-对照研究

国家自然科学基金

0+阅读 · 2012年12月31日

Kirchhoff型拟线性Schrodinger方程及其耦合系统的非光滑变分方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

ANCA诱导的ROS在调控中性粒细胞凋亡∕NETosis转换中的作用机制

国家自然科学基金

0+阅读 · 2012年12月31日

复杂体系GC-MS高通量分析方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

McKay箭图，有限复杂度自入射代数及相关课题

国家自然科学基金

0+阅读 · 2009年12月31日

基于图域几何PDE与特征不变量的离散曲面处理

国家自然科学基金

0+阅读 · 2009年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

Adapting to Mixing Time in Stochastic Optimization with Markovian Data

Arxiv

0+阅读 · 2022年10月19日

A fully implicit method using nodal radial basis functions to solve the linear advection equation

Arxiv

0+阅读 · 2022年10月19日

Functional data clustering via information maximization

Arxiv

0+阅读 · 2022年10月19日

Constrained Factor Models for High-Dimensional Matrix-Variate Time Series

Arxiv

0+阅读 · 2022年10月19日

Statistical Inference for High-Dimensional Matrix-Variate Factor Model

Arxiv

0+阅读 · 2022年10月19日

Geostatistics in the presence of multivariate complexities: comparison of multi-Gaussian transforms

Arxiv

0+阅读 · 2022年10月19日

Generalization Properties of Decision Trees on Real-valued and Categorical Features

Arxiv

0+阅读 · 2022年10月18日

On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition

On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition

Arxiv

0+阅读 · 2022年10月18日

Generalizing in the Real World with Representation Learning

Arxiv

0+阅读 · 2022年10月18日

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

Arxiv

0+阅读 · 2022年10月18日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Adapting to Mixing Time in Stochastic Optimization with Markovian Data

Arxiv

0+阅读 · 2022年10月19日

A fully implicit method using nodal radial basis functions to solve the linear advection equation

Arxiv

0+阅读 · 2022年10月19日

Functional data clustering via information maximization

Arxiv

0+阅读 · 2022年10月19日

Constrained Factor Models for High-Dimensional Matrix-Variate Time Series

Arxiv

0+阅读 · 2022年10月19日

Statistical Inference for High-Dimensional Matrix-Variate Factor Model

Arxiv

0+阅读 · 2022年10月19日

Geostatistics in the presence of multivariate complexities: comparison of multi-Gaussian transforms

Arxiv

0+阅读 · 2022年10月19日

Generalization Properties of Decision Trees on Real-valued and Categorical Features

Arxiv

0+阅读 · 2022年10月18日

On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition

On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition

Arxiv

0+阅读 · 2022年10月18日

Generalizing in the Real World with Representation Learning

Arxiv

0+阅读 · 2022年10月18日

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

Arxiv

0+阅读 · 2022年10月18日

相关基金

概率和平均框架下一系列Sobolev空间中的函数逼近与恢复

国家自然科学基金

1+阅读 · 2015年12月31日

肿瘤抗原HCA587与STAT3的相互作用及其促进肿瘤转移的分子机制研究

国家自然科学基金

1+阅读 · 2014年12月31日

大鼠浅筋膜中腺苷受体与针刺镇痛相关性研究

国家自然科学基金

0+阅读 · 2014年12月31日

CLU,CR1，PICALM基因多态性及相关因素与内蒙古蒙、汉族阿尔茨海默病人群的病例-对照研究

国家自然科学基金

0+阅读 · 2012年12月31日

Kirchhoff型拟线性Schrodinger方程及其耦合系统的非光滑变分方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

ANCA诱导的ROS在调控中性粒细胞凋亡∕NETosis转换中的作用机制

国家自然科学基金

0+阅读 · 2012年12月31日

复杂体系GC-MS高通量分析方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

McKay箭图，有限复杂度自入射代数及相关课题

国家自然科学基金

0+阅读 · 2009年12月31日

基于图域几何PDE与特征不变量的离散曲面处理

国家自然科学基金

0+阅读 · 2009年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员