We address the problem of estimating the distribution of presumed i.i.d. observations under the total variation loss. Our approach is based on density models and is versatile enough to cope with many different ones, including some for which the Maximum Likelihood Estimator (MLE for short) does not exist. We mainly illustrate the properties of our estimator on models of densities on the line that satisfy a shape constraint. We show that, with regard to some global rates of convergence, it possesses optimality properties similar to those of the MLE when the latter exists. It also enjoys some adaptation properties with respect to some specific target densities in the model, for which our estimator is proven to converge at the parametric rate. More importantly, our estimator is robust, not only to model misspecification, but also to contamination, the presence of outliers in the dataset and departures from the equidistribution assumption. This means that the estimator performs almost as well as if the data were i.i.d. with density $p$ in a situation where these data are only independent and most of their marginals are close enough in total variation to a distribution with density $p$. Our main result on the risk of the estimator takes the form of an exponential deviation inequality which is non-asymptotic and involves explicit numerical constants. We deduce from it several global rates of convergence, including some bounds for the minimax $\mathbb{L}_{1}$-risks over the sets of concave and log-concave densities. These bounds derive from some specific results on the approximation of densities which are monotone, convex, concave and log-concave. Such results may be of independent interest.
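For readers who want the link between the total variation loss and the $\mathbb{L}_{1}$-risks mentioned above, the standard identity is recalled here. The notation ($P$, $Q$, $q$, $\mu$, and the normalization of the distance) is an assumption made for illustration and is not fixed by the abstract: $P$ and $Q$ denote distributions with densities $p$ and $q$ with respect to a dominating measure $\mu$.

$$
\|P-Q\|_{\mathrm{TV}} \;=\; \sup_{A}\,\bigl|P(A)-Q(A)\bigr| \;=\; \frac{1}{2}\int |p-q|\,d\mu \;=\; \frac{1}{2}\,\|p-q\|_{\mathbb{L}_{1}(\mu)} .
$$

Under this convention, a bound on the total variation loss of a density estimator translates into a bound on its $\mathbb{L}_{1}$-loss up to a factor of 2.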