Finite mixtures of regressions with fixed covariates are a commonly used model-based clustering approach for regression data. However, they assume assignment independence, i.e. the allocation of data points to the clusters is independent of the distribution of the covariates. To account for the covariate distribution, finite mixtures of regressions with random covariates, also known as cluster-weighted models (CWMs), have been proposed in the univariate and multivariate literature. In this paper, the CWM is extended to matrix data, e.g. data in which a set of variables is observed simultaneously at different time points or locations. Specifically, the cluster-specific marginal distribution of the covariates and the cluster-specific conditional distribution of the responses given the covariates are both assumed to be matrix normal. Maximum likelihood parameter estimates are derived using an ECM algorithm. Parameter recovery, classification performance, and the ability of the BIC to detect the underlying groups are assessed on simulated data. Finally, two real data applications, concerning educational indicators and the Italian non-life insurance market, are presented.
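The model structure summarized above can be sketched as follows. The notation is an assumption made for illustration and is not given in this abstract: $\mathbf{Y}$ is a $p \times r$ response matrix, $\mathbf{X}$ a $q \times r$ covariate matrix, $G$ the number of clusters, $\pi_g$ the mixing weights, and $\boldsymbol{\theta}_{Y|g}$, $\boldsymbol{\theta}_{X|g}$ generic cluster-specific parameter sets. The second display is the standard matrix normal density with mean $\mathbf{M}$, row covariance $\boldsymbol{\Phi}$ and column covariance $\boldsymbol{\Psi}$; how the conditional mean of $\mathbf{Y}$ depends on $\mathbf{X}$ is left unspecified here.

```latex
% Assumed notation (illustrative only, not taken from the abstract):
% Y: p x r responses, X: q x r covariates, G clusters, weights \pi_g.
\begin{align}
  p(\mathbf{X}, \mathbf{Y}; \boldsymbol{\Theta})
    &= \sum_{g=1}^{G} \pi_g \,
       \phi_{p \times r}\!\left(\mathbf{Y} \mid \mathbf{X}; \boldsymbol{\theta}_{Y|g}\right)
       \phi_{q \times r}\!\left(\mathbf{X}; \boldsymbol{\theta}_{X|g}\right),
    \qquad \pi_g > 0, \quad \sum_{g=1}^{G} \pi_g = 1, \\
  % Standard matrix normal density for a q x r matrix X
  \phi_{q \times r}(\mathbf{X}; \mathbf{M}, \boldsymbol{\Phi}, \boldsymbol{\Psi})
    &= (2\pi)^{-\frac{qr}{2}}
       \lvert \boldsymbol{\Phi} \rvert^{-\frac{r}{2}}
       \lvert \boldsymbol{\Psi} \rvert^{-\frac{q}{2}}
       \exp\!\left\{ -\tfrac{1}{2}
         \operatorname{tr}\!\left[
           \boldsymbol{\Phi}^{-1} (\mathbf{X} - \mathbf{M})
           \boldsymbol{\Psi}^{-1} (\mathbf{X} - \mathbf{M})^{\top}
         \right] \right\}.
\end{align}
```

Under this form, the weight $\phi_{q \times r}(\mathbf{X}; \boldsymbol{\theta}_{X|g})$ is what lets the covariate distribution influence cluster allocation, in contrast to the assignment independence of mixtures of regressions with fixed covariates.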