For a parametric model of distributions, the distribution in the model closest to the true distribution, which lies outside the model, is considered. When the closeness between two distributions is measured by the Kullback-Leibler (K-L) divergence, this closest distribution is called the "information projection." The estimation risk of the maximum likelihood estimator (MLE) is defined as the expectation of the K-L divergence between the information projection and the predictive distribution with the plugged-in MLE. Here, the asymptotic expansion of the risk is derived up to the $n^{-2}$ order, and a sufficient condition on the risk is investigated under which the Bayes error rate between the true distribution and the information projection falls below a specified value. Combining these results, the "$p-n$ criterion" is proposed, which determines whether the MLE is sufficiently close to the information projection for the given model and sample. In particular, the criterion for an exponential family model is relatively simple and can be applied to a complex model with no explicit form of the normalizing constant. The criterion can serve as a solution to the sample-size determination or model-acceptance problem. Use of the $p-n$ criterion is demonstrated on two practical datasets. The relationship between these results and information criteria is also studied.
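As a sketch of the quantities named above (the symbols $g$, $p_\theta$, $\Theta$, $\theta^*$, and the coefficients $a_1, a_2$ are illustrative notation introduced here, not taken from the paper), the information projection and the estimation risk may be written as
\[
\theta^* = \operatorname*{arg\,min}_{\theta \in \Theta} D_{\mathrm{KL}}\big(g \,\big\|\, p_\theta\big),
\qquad
R_n = \mathbb{E}\big[\, D_{\mathrm{KL}}\big(p_{\theta^*} \,\big\|\, p_{\hat{\theta}}\big) \,\big],
\]
where $g$ is the true distribution, $\{p_\theta : \theta \in \Theta\}$ is the parametric model, $p_{\theta^*}$ is the information projection of $g$ onto the model, and $\hat{\theta}$ is the MLE. Under this notation, an asymptotic expansion of the risk up to the $n^{-2}$ order takes the generic form $R_n = a_1/n + a_2/n^2 + o(n^{-2})$, with coefficients $a_1, a_2$ depending on the model and $g$.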