核实关于审定数据的群集分析结果:系统框架 (Validation of cluster analysis results on validation data: A systematic framework) - 专知论文

会员服务 ·

0

簇 · 聚类分析 · 数据集 · Extensibility · 查准率/准确率 ·

2022 年 1 月 10 日

Validation of cluster analysis results on validation data: A systematic framework

翻译：核实关于审定数据的群集分析结果:系统框架

Theresa Ullmann,Christian Hennig,Anne-Laure Boulesteix

from arxiv, 32 pages, 1 figure

Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To judge the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic structured framework for validating clustering results on validation data that includes most existing validation approaches. In particular, we review classical validation techniques such as internal and external validation, stability analysis, hypothesis testing, and visual validation, and show how they can be interpreted in terms of our framework. We precisely define and formalise different types of validation of clustering results on a validation dataset and explain how each type can be implemented in practice. Furthermore, we give examples of how clustering studies from the applied literature that used a validation dataset can be classified into the framework.

翻译：集群分析是指一系列广泛的分类发现数据分析技术,在许多应用领域很受欢迎。为判断集群结果的质量,文献中提出了不同的群集验证程序。虽然在传统验证技术方面做了大量工作,例如内部和外部验证,但较少注意使用验证数据集验证和复制集群结果。这种数据集可能是原始数据集的一部分,该数据集在分析开始之前是分开的,也可能是独立收集的数据集。我们提出了一个系统化的结构化框架,用以验证包括大多数现有验证方法在内的验证数据组合结果。我们特别审查了传统的验证技术,例如内部和外部验证、稳定性分析、假设测试和视觉验证,并表明如何用框架来解释这些技术。我们精确地界定和正式确定验证数据集上分类结果的不同类型,并解释如何在实践中执行每种类型。此外,我们举例说明如何将使用验证数据集的应用文献的集群研究分类为框架。

0

相关内容

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

可扩展的蛋白质组学大数据存储与分析模型研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于机器学习的蛋白质翻译后修饰位点预测的研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于缺失属性值区间型描述的不完备数据聚类方法及应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

冗余字典下的压缩感知理论及应用研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于压缩感知,矩阵填充和鲁棒的主成分分析的四元数信号处理方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

稀疏张量学习理论

国家自然科学基金

1+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

《软件学报》学术期刊

国家自然科学基金

6+阅读 · 2011年12月31日

磷酸化修饰介导的蛋白质相互作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

Theoretical analysis of edit distance algorithms: an applied perspective

Arxiv

0+阅读 · 2022年4月20日

A Survey of Video-based Action Quality Assessment

Arxiv

0+阅读 · 2022年4月20日

Investigating Data Variance in Evaluations of Automatic Machine Translation Metrics

Arxiv

0+阅读 · 2022年4月19日

Selection of proposal distributions for multiple importance sampling

Arxiv

0+阅读 · 2022年4月18日

On Safety Testing, Validation, and Characterization with Scenario-Sampling: A Case Study of Legged Robots

Arxiv

1+阅读 · 2022年4月16日

A general framework for identification of permissible variable subsets and development of structured variable selection methods

Arxiv

0+阅读 · 2022年4月14日

Trustworthy AI: From Principles to Practices

Arxiv

46+阅读 · 2021年10月4日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

17+阅读 · 2019年10月30日

From Knowledge Graph Embedding to Ontology Embedding: Region Based Representations of Relational Structures

Arxiv

10+阅读 · 2018年5月26日

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Arxiv

19+阅读 · 2018年5月25日

VIP会员

文章信息

相关主题

查准率/准确率

相关VIP内容

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】在低维和高维空间中分析、建模和转换潜在表征

从无人机到数据：揭示边缘计算作为新作战域

可解释人工智能的基础

大规模视觉模型中的基于提示的适应：综述

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Theoretical analysis of edit distance algorithms: an applied perspective

Arxiv

0+阅读 · 2022年4月20日

A Survey of Video-based Action Quality Assessment

Arxiv

0+阅读 · 2022年4月20日

Investigating Data Variance in Evaluations of Automatic Machine Translation Metrics

Arxiv

0+阅读 · 2022年4月19日

Selection of proposal distributions for multiple importance sampling

Arxiv

0+阅读 · 2022年4月18日

On Safety Testing, Validation, and Characterization with Scenario-Sampling: A Case Study of Legged Robots

Arxiv

1+阅读 · 2022年4月16日

A general framework for identification of permissible variable subsets and development of structured variable selection methods

Arxiv

0+阅读 · 2022年4月14日

Trustworthy AI: From Principles to Practices

Arxiv

46+阅读 · 2021年10月4日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

17+阅读 · 2019年10月30日

From Knowledge Graph Embedding to Ontology Embedding: Region Based Representations of Relational Structures

Arxiv

10+阅读 · 2018年5月26日

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Arxiv

19+阅读 · 2018年5月25日

相关基金

可扩展的蛋白质组学大数据存储与分析模型研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于机器学习的蛋白质翻译后修饰位点预测的研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于缺失属性值区间型描述的不完备数据聚类方法及应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

冗余字典下的压缩感知理论及应用研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于压缩感知,矩阵填充和鲁棒的主成分分析的四元数信号处理方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

稀疏张量学习理论

国家自然科学基金

1+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

《软件学报》学术期刊

国家自然科学基金

6+阅读 · 2011年12月31日

磷酸化修饰介导的蛋白质相互作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员