We observe an $n$-sample whose distribution is assumed to belong, or at least to be close enough, to a given mixture model. We propose an estimator of this distribution that belongs to the model and possesses some robustness properties with respect to possible misspecification. We establish a non-asymptotic deviation bound for the Hellinger distance between the target distribution and its estimator when the model consists of mixtures of densities that belong to VC-subgraph classes. Under suitable assumptions, and when the mixture model is well-specified, we derive risk bounds for the parameters of the mixture. Finally, we design a statistical procedure that selects from the data the number of components as well as suitable models for each of the densities involved in the mixture. These models are chosen from a collection of candidates, and we show that our selection rule combined with our estimation strategy results in an estimator that satisfies an oracle-type inequality.
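For concreteness, the two objects named above can be written out as follows. This is only a sketch in notation of our own choosing: the symbols $K$, $w_k$, $q_k$, $\mathcal{Q}_k$ and $\mu$ do not appear in the abstract, and the factor $1/2$ in the Hellinger distance is one common normalization that some authors omit.

$$
p_{w,q} = \sum_{k=1}^{K} w_k\, q_k, \qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1, \quad q_k \in \mathcal{Q}_k,
$$

$$
h^2(P, Q) = \frac{1}{2} \int \left( \sqrt{p} - \sqrt{q} \right)^2 \, d\mu,
$$

where each $\mathcal{Q}_k$ stands for a candidate class of densities (here assumed VC-subgraph) and $p$, $q$ denote the densities of $P$, $Q$ with respect to a dominating measure $\mu$.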