Like many machine learning models, cluster-weighted models (CWMs) can lose both accuracy and speed on high-dimensional data, which has motivated earlier work on parsimonious techniques that reduce the effect of the "curse of dimensionality" on mixture models. In this work, we review the background of CWMs. We further show that parsimonious techniques alone are not sufficient for mixture models to perform well on large, high-dimensional data. We discuss a heuristic for detecting the hidden components that chooses the initial values of the location parameters using the default values in the "FlexCWM" R package. We introduce a dimensionality reduction technique, t-distributed stochastic neighbor embedding (t-SNE), to enhance parsimonious CWMs in high-dimensional space. CWMs were originally designed for regression; for classification, all multi-class variables are log-transformed with some added noise. The model parameters are estimated with the expectation-maximization (EM) algorithm. The effectiveness of the discussed technique is demonstrated on real data sets from different fields.
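The abstract states that the parameters are obtained via EM and that the initial location parameters matter, but it gives no formulas. As a rough illustration only, the following is a minimal sketch of EM for a linear Gaussian CWM, assuming a single scalar covariate x and a scalar response y. The function name `fit_cwm_em`, its arguments, and the starting values are hypothetical choices made for this sketch; it is not the "FlexCWM" API, and it leaves out the parsimonious constraints, the t-SNE step, and the classification transform.

```python
import math
import random


def normal_pdf(v, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_cwm_em(xs, ys, init_mu, n_iter=50, min_var=1e-6):
    """EM for a linear Gaussian CWM with scalar x and y (illustrative only).

    init_mu holds one starting location per component for the x-marginal.
    """
    n, g_count = len(xs), len(init_mu)
    # Starting values (an assumption of this sketch): equal weights, a common
    # variance for x, and a flat regression line at the overall mean of y.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x0 = max(sum((x - x_bar) ** 2 for x in xs) / n, min_var)
    var_y0 = max(sum((y - y_bar) ** 2 for y in ys) / n, min_var)
    comps = [
        {"pi": 1.0 / g_count, "mu": m, "var_x": var_x0,
         "b0": y_bar, "b1": 0.0, "var_y": var_y0}
        for m in init_mu
    ]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point,
        # proportional to pi_g * p(x | g) * p(y | x, g).
        resp = []
        for x, y in zip(xs, ys):
            w = [c["pi"] * normal_pdf(x, c["mu"], c["var_x"])
                 * normal_pdf(y, c["b0"] + c["b1"] * x, c["var_y"]) for c in comps]
            total = sum(w) or 1e-300  # guard against numerical underflow
            resp.append([wi / total for wi in w])
        # M-step: weighted updates of the mixing weight, the x-marginal,
        # and the within-component regression of y on x.
        for g, c in enumerate(comps):
            r = [row[g] for row in resp]
            n_g = max(sum(r), 1e-12)
            mx = sum(ri * x for ri, x in zip(r, xs)) / n_g
            my = sum(ri * y for ri, y in zip(r, ys)) / n_g
            sxx = sum(ri * (x - mx) ** 2 for ri, x in zip(r, xs)) / n_g
            sxy = sum(ri * (x - mx) * (y - my) for ri, x, y in zip(r, xs, ys)) / n_g
            c["pi"] = n_g / n
            c["mu"] = mx
            c["var_x"] = max(sxx, min_var)
            c["b1"] = sxy / c["var_x"]
            c["b0"] = my - c["b1"] * mx
            c["var_y"] = max(
                sum(ri * (y - c["b0"] - c["b1"] * x) ** 2
                    for ri, x, y in zip(r, xs, ys)) / n_g, min_var)
    return comps


# Synthetic two-component data, invented here purely for demonstration.
rng = random.Random(0)
demo_x, demo_y = [], []
for _ in range(200):
    x = rng.gauss(-3.0, 0.5)
    demo_x.append(x)
    demo_y.append(1.0 + 2.0 * x + rng.gauss(0.0, 0.3))
for _ in range(200):
    x = rng.gauss(3.0, 0.5)
    demo_x.append(x)
    demo_y.append(-1.0 - 1.0 * x + rng.gauss(0.0, 0.3))

fitted = fit_cwm_em(demo_x, demo_y, init_mu=[-2.5, 2.5])
for comp in fitted:
    print({k: round(v, 3) for k, v in comp.items()})
```

The sketch starts the components from user-supplied locations because EM is sensitive to initialization; starting locations far from the true component centers can make convergence slow or lead to a poor local optimum.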