The classical asymptotic theory for parametric $M$-estimators guarantees that, in the limit of infinite sample size, the excess risk has a chi-square type distribution, even in the misspecified case. We demonstrate how self-concordance of the loss allows us to characterize the critical sample size sufficient to guarantee a chi-square type in-probability bound for the excess risk. Specifically, we consider two classes of losses: (i) self-concordant losses in the classical sense of Nesterov and Nemirovski, i.e., whose third derivative is uniformly bounded by the $3/2$ power of the second derivative; (ii) pseudo self-concordant losses, for which the power is removed, i.e., the third derivative is bounded by the second derivative itself. These classes contain the losses corresponding to several generalized linear models, including the logistic loss and pseudo-Huber losses. Our basic result, obtained under minimal assumptions, bounds the critical sample size by $O(d \cdot d_{\text{eff}})$, where $d$ is the parameter dimension and $d_{\text{eff}}$ is the effective dimension that accounts for model misspecification. In contrast to the existing results, we impose only local assumptions that concern the population risk minimizer $\theta_*$. Namely, we assume that the calibrated design, i.e., the design scaled by the square root of the second derivative of the loss, is subgaussian at $\theta_*$. Besides, for type (ii) losses we require boundedness of a certain measure of curvature of the population risk at $\theta_*$. Our improved result bounds the critical sample size from above by $O(\max\{d_{\text{eff}}, d \log d\})$ under slightly stronger assumptions: the local assumptions must hold in the neighborhood of $\theta_*$ given by the Dikin ellipsoid of the population risk. Interestingly, we find that for logistic regression with Gaussian design this is not an actual restriction: the subgaussian parameter and the curvature measure remain near-constant over the Dikin ellipsoid. Finally, we extend some of these results to $\ell_1$-penalized estimators in high dimensions.
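For concreteness, the two loss classes can be sketched as follows (a minimal illustration; the exact constants and normalizations used in the text may differ). Writing $\varphi$ for a univariate loss with three derivatives, the conditions read
$$
\text{(i)}\quad |\varphi'''(t)| \le 2\,\varphi''(t)^{3/2},
\qquad\qquad
\text{(ii)}\quad |\varphi'''(t)| \le M\,\varphi''(t) \ \ \text{for some constant } M > 0.
$$
For example, the logistic loss $\varphi(t) = \log(1 + e^{t})$ satisfies $\varphi''(t) = \sigma(t)\bigl(1-\sigma(t)\bigr)$ and $\varphi'''(t) = \varphi''(t)\bigl(1 - 2\sigma(t)\bigr)$ with $\sigma(t) = 1/(1+e^{-t})$, hence $|\varphi'''(t)| \le \varphi''(t)$ and the loss is pseudo self-concordant with $M = 1$. Similarly, the pseudo-Huber loss $\varphi(t) = \sqrt{1 + t^2} - 1$ has $\varphi''(t) = (1+t^2)^{-3/2}$ and $|\varphi'''(t)| = 3|t|\,(1+t^2)^{-5/2} \le \tfrac{3}{2}\,\varphi''(t)$, so it is pseudo self-concordant with $M = 3/2$.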