

December 13, 2021

How to Find a Good Explanation for Clustering?


Sayan Bandyapadhyay, Fedor Fomin, Petr Golovach, William Lochet, Nidhi Purohit, Kirill Simonov

$k$-means and $k$-median clustering are powerful unsupervised machine learning techniques. However, due to complicated dependences on all the features, it is challenging to interpret the resulting cluster assignments. Moshkovitz, Dasgupta, Rashtchian, and Frost [ICML 2020] proposed an elegant model of explainable $k$-means and $k$-median clustering. In this model, a decision tree with $k$ leaves provides a straightforward characterization of the data set into clusters. We study two natural algorithmic questions about explainable clustering. (1) For a given clustering, how to find the "best explanation" by using a decision tree with $k$ leaves? (2) For a given set of points, how to find a decision tree with $k$ leaves minimizing the $k$-means/median objective of the resulting explainable clustering? To address the first question, we introduce a new model of explainable clustering. Our model, inspired by the notion of outliers in robust statistics, is the following. We are seeking a small number of points (outliers) whose removal makes the existing clustering well-explainable. For addressing the second question, we initiate the study of the model of Moshkovitz et al. from the perspective of multivariate complexity. Our rigorous algorithmic analysis sheds some light on the influence of parameters like the input size, dimension of the data, the number of outliers, the number of clusters, and the approximation ratio, on the computational complexity of explainable clustering.
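As a rough illustration of the outlier model described in the abstract, the sketch below handles only the simplest case, $k = 2$, where a decision tree with two leaves is a single axis-aligned threshold cut. It tries every feature and every observed threshold by brute force and reports the smallest number of points whose removal makes the given clustering coincide with the cut. The function name `min_outliers_two_clusters` and the toy data are assumptions made for illustration; this is not the algorithm analyzed in the paper.

```python
from typing import Sequence, Tuple


def min_outliers_two_clusters(
    points: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[int, int, float]:
    """Brute-force the best single threshold cut for a given 2-clustering.

    labels must be 0 or 1. A cut (feature i, threshold t) sends a point x
    to the left leaf if x[i] <= t and to the right leaf otherwise.
    Returns (number_of_outliers, feature_index, threshold).
    Cuts that leave one leaf empty are allowed in this toy version.
    """
    n = len(points)
    dim = len(points[0])
    best = (n + 1, -1, 0.0)
    for i in range(dim):
        # Only thresholds at observed coordinate values can change the split.
        for t in sorted({p[i] for p in points}):
            # Count points that disagree with "left leaf = cluster 0".
            wrong = sum(
                1 for p, c in zip(points, labels) if (p[i] <= t) != (c == 0)
            )
            # The leaves may also be labelled the other way round.
            wrong = min(wrong, n - wrong)
            if wrong < best[0]:
                best = (wrong, i, t)
    return best


# Hypothetical toy data: the last point is labelled 1 but lies among cluster 0.
toy_points = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0),
              (6.0, 2.0), (7.0, 6.0), (8.0, 3.0), (2.5, 3.0)]
toy_labels = [0, 0, 0, 1, 1, 1, 1]
outliers, feature, threshold = min_outliers_two_clusters(toy_points, toy_labels)
print(outliers, feature, threshold)
```

On this toy input, no single cut reproduces the clustering exactly, but removing one point is enough. For general $k$ the search space is the set of all threshold trees with $k$ leaves, which is why the computational complexity of the problem is the subject of the paper.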


