Finite mixture models do not reliably learn the number of components

Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. A common suggestion is to use a finite mixture model (FMM) with a prior on the number of components. Past work has shown the resulting FMM component-count posterior is consistent; that is, the posterior concentrates on the true, generating number of components. But consistency requires the assumption that the component likelihoods are perfectly specified, which is unrealistic in practice. In this paper, we add rigor to data-analysis folk wisdom by proving that under even the slightest model misspecification, the FMM component-count posterior diverges: the posterior probability of any particular finite number of components converges to 0 in the limit of infinite data. Contrary to intuition, posterior-density consistency is not sufficient to establish this result. We develop novel sufficient conditions that are more realistic and easily checkable than those common in the asymptotics literature. We illustrate practical consequences of our theory on simulated and real data.
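The two asymptotic statements in the abstract can be written compactly. The following is only a sketch in assumed notation, none of which appears in the abstract itself: $X_{1:N}$ denotes the first $N$ observations, $K$ the number of mixture components, $k_0$ the true generating number of components in the well-specified case, and $\Pi(\cdot \mid X_{1:N})$ the posterior. The mode of convergence is not given in the abstract and is left unstated here.

```latex
% Well-specified component likelihoods (consistency, from past work):
\[
  \Pi\bigl(K = k_0 \mid X_{1:N}\bigr) \;\longrightarrow\; 1
  \qquad \text{as } N \to \infty .
\]

% Misspecified component likelihoods (the divergence result claimed here):
\[
  \Pi\bigl(K = k \mid X_{1:N}\bigr) \;\longrightarrow\; 0
  \qquad \text{as } N \to \infty, \quad \text{for every finite } k .
\]
```

Read together, the second statement says that under misspecification the posterior mass moves toward ever larger component counts as more data arrive, rather than settling on any fixed value.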

