When a probability distribution model is meant to approximate a hidden mother distribution, a useful criterion is needed for how closely the model distribution resembles the mother distribution. This study proposes a criterion that measures the Hellinger distance between discretized (quantized) samples from the two distributions. The criterion has two advantages. First, unlike information criteria such as AIC, it does not require the probability density function of the model distribution, which cannot be obtained explicitly for a complicated model such as a deep learning machine. Second, it can draw a positive conclusion (i.e., that the two distributions are sufficiently close) under a given threshold, whereas a statistical hypothesis test, such as the Kolmogorov-Smirnov test, cannot genuinely support a positive conclusion even when the hypothesis is accepted. We establish a reasonable threshold for the criterion, deduced from the Bayes error rate, and present the asymptotic bias of the estimator of the criterion. These results yield a reasonable, easy-to-use criterion that can be calculated directly from two sets of samples, one from each distribution.
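As a rough illustration of the quantity involved, the following sketch estimates the Hellinger distance between two one-dimensional sample sets after quantizing them onto a shared grid. It is not the estimator defined in this study: the function name `hellinger_from_samples`, the equal-width binning over the pooled range, and the default of 10 bins are all assumptions made for the example, and neither the bias correction nor the threshold derived from the Bayes error rate is reproduced here.

```python
import numpy as np


def hellinger_from_samples(x, y, n_bins=10):
    """Plug-in Hellinger distance between two quantized 1-D sample sets.

    Hypothetical helper for illustration only. The binning scheme
    (equal-width bins over the pooled range) is an assumption, not the
    quantization used in the study.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Use one common set of bin edges so both histograms are comparable.
    pooled = np.concatenate([x, y])
    edges = np.histogram_bin_edges(pooled, bins=n_bins)

    # Relative frequencies of each sample set over the shared bins.
    p = np.histogram(x, bins=edges)[0] / x.size
    q = np.histogram(y, bins=edges)[0] / y.size

    # Bhattacharyya coefficient, then H = sqrt(1 - BC), which lies in [0, 1].
    bc = np.sum(np.sqrt(p * q))
    return float(np.sqrt(max(0.0, 1.0 - bc)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mother = rng.normal(0.0, 1.0, size=5000)  # stand-in for mother samples
    model = rng.normal(0.2, 1.0, size=5000)   # stand-in for model samples
    print(hellinger_from_samples(mother, model))
```

Only the two sample sets are needed, which matches the point that no density function of the model is required. The value returned by such a plug-in estimate depends on the bin count and the sample sizes, so it should be read together with a bias analysis and a threshold rather than on its own.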