Model-based Clustering with Missing Not At Random Data - 专知 (Zhuanzhi) Papers


February 15, 2023

Model-based Clustering with Missing Not At Random Data


Aude Sportisse, Matthieu Marbac, Christophe Biernacki, Claire Boyer, Gilles Celeux, Julie Josse, Fabien Laporte

Model-based unsupervised learning, like any learning task, stalls as soon as missing data occur. This is even more true when the missing data are informative, that is, missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we introduce a mixture model for different types of data (continuous, count, categorical and mixed) to jointly model the data distribution and the MNAR mechanism, remaining vigilant to the degrees of freedom of each. Eight different MNAR models, which depend on the class membership and/or on the values of the missing variables themselves, are proposed. For a particular type of MNAR model, in which the missingness depends on the class membership, we show that statistical inference can be carried out on the data matrix concatenated with the missing mask, considering a MAR mechanism instead; this specifically underlines the versatility of the studied MNAR models. Then, we establish sufficient conditions for identifiability of the parameters of both the data distribution and the mechanism. Regardless of the type of data and the mechanism, we propose to perform clustering using EM or stochastic EM algorithms specially developed for the purpose. Finally, we assess the numerical performance of the proposed methods on synthetic data and on the real medical registry TraumaBase.
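
The class-dependent case mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a two-component Gaussian mixture with diagonal covariances, treats each mask entry as a Bernoulli variable whose rate depends only on the class, and fits everything with a plain EM loop. The function names `simulate_class_dependent_mask` and `em_class_dependent_mask`, the class means, and the missingness rates are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_class_dependent_mask(n, rng):
    """Two diagonal Gaussian clusters; the missingness rate depends on the cluster."""
    z = rng.integers(0, 2, size=n)                 # latent class labels
    means = np.array([[0.0, 0.0], [3.0, 3.0]])     # hypothetical class means
    rho = np.array([[0.1, 0.1], [0.5, 0.5]])       # hypothetical P(missing | class)
    x = means[z] + rng.normal(size=(n, 2))
    mask = rng.random((n, 2)) < rho[z]             # True where the entry is missing
    x[mask] = np.nan
    return x, mask, z

def em_class_dependent_mask(x, mask, n_classes=2, n_iter=200, seed=0):
    """EM on the observed values together with the mask (mask = extra binary columns)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    obs = ~mask
    x0 = np.where(obs, x, 0.0)                     # placeholder 0 at missing entries
    resp = rng.dirichlet(np.ones(n_classes), size=n)  # random soft initialisation
    for _ in range(n_iter):
        # M-step: weighted estimates that use observed entries only
        pi = np.clip(resp.mean(axis=0), 1e-12, None)
        w = resp[:, :, None] * obs[:, None, :]     # shape (n, K, d)
        wsum = w.sum(axis=0) + 1e-9
        mu = (w * x0[:, None, :]).sum(axis=0) / wsum
        var = (w * (x0[:, None, :] - mu[None]) ** 2).sum(axis=0) / wsum + 1e-6
        rho = (resp[:, :, None] * mask[:, None, :]).sum(axis=0)
        rho = rho / (resp.sum(axis=0)[:, None] + 1e-9)
        rho = np.clip(rho, 1e-6, 1 - 1e-6)         # per-class missingness rates
        # E-step: Gaussian log-density of observed entries + Bernoulli log-density of the mask
        log_gauss = -0.5 * (np.log(2 * np.pi * var)[None]
                            + (x0[:, None, :] - mu[None]) ** 2 / var[None])
        log_gauss = (log_gauss * obs[:, None, :]).sum(axis=2)
        log_mask = (mask[:, None, :] * np.log(rho)[None]
                    + obs[:, None, :] * np.log1p(-rho)[None]).sum(axis=2)
        log_post = np.log(pi)[None] + log_gauss + log_mask
        log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1), pi, mu, rho

rng = np.random.default_rng(0)
x, mask, z = simulate_class_dependent_mask(600, rng)
labels, pi, mu, rho = em_class_dependent_mask(x, mask)
# Cluster labels are only defined up to permutation, so score both labelings.
accuracy = max((labels == z).mean(), (labels != z).mean())
print(f"clustering accuracy: {accuracy:.2f}")
```

In this sketch the missing values never enter the likelihood; only the observed entries and the mask do. That is the sense in which the mask can be handled as additional observed columns under a MAR-style likelihood. A mechanism that also depends on the missing values themselves would need a different treatment, which this example does not cover.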

