Robust estimation of a regression function in exponential families

We observe $n$ pairs of independent random variables $X_{1}=(W_{1},Y_{1}),\ldots,X_{n}=(W_{n},Y_{n})$ and assume, although this might not be true, that for each $i\in\{1,\ldots,n\}$, the conditional distribution of $Y_{i}$ given $W_{i}$ belongs to a given exponential family with real parameter $\theta_{i}^{\star}=\boldsymbol{\theta}^{\star}(W_{i})$, where $\boldsymbol{\theta}^{\star}$ is an unknown function of the covariate $W_{i}$. Given a model $\boldsymbol{\overline\Theta}$ for $\boldsymbol{\theta}^{\star}$, we propose an estimator $\boldsymbol{\widehat \theta}$ with values in $\boldsymbol{\overline\Theta}$ whose construction does not depend on the distribution of the $W_{i}$. We show that $\boldsymbol{\widehat \theta}$ is robust to contamination, outliers and model misspecification. We establish non-asymptotic exponential inequalities for the upper deviations of a Hellinger-type distance between the true distribution of the data and the estimated one based on $\boldsymbol{\widehat \theta}$. We deduce a uniform risk bound for $\boldsymbol{\widehat \theta}$ over the class of H\"olderian functions and we prove the optimality of this bound up to a logarithmic factor. Finally, we provide an algorithm for calculating $\boldsymbol{\widehat \theta}$ when $\boldsymbol{\theta}^{\star}$ is assumed to belong to functional classes of low or medium dimensions (in a suitable sense) and, in a simulation study, we compare the performance of $\boldsymbol{\widehat \theta}$ to that of the MLE and median-based estimators. The proof of our main result relies on an upper bound, with explicit numerical constants, on the expectation of the supremum of an empirical process over a VC-subgraph class. This bound may be of independent interest.
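
To fix ideas, the setting can be written out under the assumption (not stated in the abstract) that the exponential family is taken in natural form with respect to a dominating measure $\nu$. The symbols $S$, $A$ and $\nu$ are introduced here for illustration only:

$$
q_{\theta}(y)=\exp\bigl(\theta S(y)-A(\theta)\bigr),\qquad A(\theta)=\log\int \exp\bigl(\theta S(y)\bigr)\,d\nu(y),
$$

so that the working assumption reads: conditionally on $W_{i}$, the variable $Y_{i}$ has density $q_{\boldsymbol{\theta}^{\star}(W_{i})}$ with respect to $\nu$. Two familiar instances are the Poisson family with mean $\lambda$, for which $\theta=\log\lambda$, and the Bernoulli family with success probability $p$, for which $\theta=\log\bigl(p/(1-p)\bigr)$. For reference, the classical squared Hellinger distance between two probability measures $P$ and $Q$ is

$$
h^{2}(P,Q)=\frac{1}{2}\int\bigl(\sqrt{dP}-\sqrt{dQ}\bigr)^{2}.
$$

The "Hellinger-type distance" mentioned above is presumably built on this quantity, but its exact definition is not given in this abstract.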
