We consider the Bayesian multiple statistical classification problem in which the unknown source distributions are estimated from labeled training sequences, and the estimates are then used as nominal distributions in a robust hypothesis test. Specifically, we employ the DGL test due to Devroye et al. and provide non-asymptotic, exponential upper bounds on the error probability of classification. The proposed upper bounds are simple to evaluate and reveal the effects of the training sequence length, the alphabet size, and the number of hypotheses on the error exponent. The proposed method also applies to large-alphabet sources, provided the alphabet grows sub-quadratically in the length of the test sequence. Simulations indicate that the performance of the proposed method approaches that of optimal hypothesis testing as the training sequence length increases.
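To make the classification procedure concrete, the following is a minimal illustrative sketch of a DGL-style (Scheffé-set) multiple classifier: nominal distributions are estimated empirically from the labeled training sequences, and the test sequence is assigned to the hypothesis that wins the most pairwise Scheffé-set comparisons. All function names, the pairwise-voting aggregation, and the toy distributions are our own assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def dgl_classify(test_seq, train_seqs, alphabet_size):
    """Classify test_seq among len(train_seqs) hypotheses via pairwise
    Scheffe-set (DGL-style) tests on empirical distributions.
    Illustrative sketch; not the paper's exact procedure."""
    def empirical(seq):
        # Empirical distribution (type) of an integer-valued sequence.
        counts = np.bincount(seq, minlength=alphabet_size)
        return counts / len(seq)

    p_hat = [empirical(s) for s in train_seqs]  # estimated nominal distributions
    mu = empirical(test_seq)                    # empirical measure of test sequence
    m = len(train_seqs)
    wins = np.zeros(m, dtype=int)
    for i in range(m):
        for j in range(i + 1, m):
            # Scheffe set of the pair: symbols where hypothesis i has more mass.
            A = p_hat[i] > p_hat[j]
            # Vote for the hypothesis whose mass on A is closer to mu(A).
            if abs(mu[A].sum() - p_hat[i][A].sum()) <= abs(mu[A].sum() - p_hat[j][A].sum()):
                wins[i] += 1
            else:
                wins[j] += 1
    return int(np.argmax(wins))
```

For two hypotheses this reduces to the classical Scheffé test; with more hypotheses, the pairwise vote is one simple way to aggregate the binary tests.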