Parameter estimation for mixture regression models using the expectation-maximization (EM) algorithm is highly sensitive to outliers. Here we propose a fast and efficient robust mixture regression algorithm, called the Component-wise Adaptive Trimming (CAT) method. We consider simultaneous outlier detection and robust parameter estimation to minimize the effect of outlier contamination. Robust mixture regression has many important applications, including human cancer genomics data, where the population often displays strong heterogeneity compounded by unwanted technological perturbations. Existing robust mixture regression methods suffer from outliers because they either conduct parameter estimation in the presence of outliers or rely on prior knowledge of the level of outlier contamination. CAT is implemented in the framework of classification expectation maximization, under which a natural definition of outliers can be derived. It applies a least trimmed squares (LTS) approach within each exclusive mixing component, so that the robustness problem is transformed from the mixture case to the simple linear regression case. The high breakdown point of the LTS approach allows us to avoid pre-specifying the trimming parameter. Compared with multiple existing algorithms, CAT is the most competitive at handling and adaptively trimming outliers as well as heavy-tailed noise, across different scenarios of simulated data and real genomic data. CAT has been implemented in an R package, 'RobMixReg', available on CRAN.
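To make the idea of trimming within each hard-assigned component concrete, the following is a minimal Python sketch. It is not the CAT algorithm itself and not the RobMixReg API. The function names `robust_fit` and `cem_robust_mixreg` are hypothetical, the half-sample concentration steps and the MAD-based cutoff of 2.5 are generic reweighted-LTS assumptions rather than the paper's trimming rule, and the classification step assumes equal mixing proportions and equal noise variances, so each point simply goes to the component with the smallest squared residual. Initial labels are supplied by the caller.

```python
import numpy as np


def robust_fit(X, y, n_csteps=20, cutoff=2.5):
    """Trimmed least squares for one component, followed by outlier flagging.

    Assumption: half-sample concentration steps plus a MAD-type cutoff.
    This is a generic reweighted-LTS heuristic, not CAT's own rule.
    """
    n, p = X.shape
    h = min(n, max(p + 1, n // 2 + 1))             # size of the retained subset
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # start from ordinary LS
    for _ in range(n_csteps):
        r2 = (y - X @ beta) ** 2
        keep = np.argsort(r2)[:h]                  # h smallest squared residuals
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    resid = np.abs(y - X @ beta)
    sigma = 1.4826 * np.median(resid) + 1e-12      # robust scale estimate
    inlier = resid <= cutoff * sigma               # data-driven trimming
    if inlier.sum() >= p:
        beta = np.linalg.lstsq(X[inlier], y[inlier], rcond=None)[0]
    return beta, inlier


def cem_robust_mixreg(X, y, labels, n_iter=30):
    """Hard-assignment (CEM-style) loop with a robust fit per component.

    Returns the coefficients (one row per component), the final labels,
    and a boolean outlier mask taken from the most recent fit.
    """
    labels = np.asarray(labels)
    k = int(labels.max()) + 1
    for _ in range(n_iter):
        betas = []
        inlier = np.zeros(len(y), dtype=bool)
        for j in range(k):
            idx = np.flatnonzero(labels == j)
            if len(idx) <= X.shape[1]:
                raise ValueError(f"component {j} has too few points")
            beta_j, in_j = robust_fit(X[idx], y[idx])
            betas.append(beta_j)
            inlier[idx[in_j]] = True
        betas = np.array(betas)
        # Classification step: with equal proportions and variances (assumed),
        # each point goes to the component with the smallest squared residual.
        resid2 = (y[:, None] - X @ betas.T) ** 2
        new_labels = resid2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return betas, labels, ~inlier
```

A caller would pass a design matrix `X` whose first column is ones, a response vector `y`, and integer starting labels. Because each component is fitted separately after hard assignment, the robustness problem inside the loop is an ordinary single-line regression, which is the reduction the abstract describes. Like any hard-assignment scheme, the result depends on the starting labels.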