In this letter, we consider the multiple statistical classification problem in which a sequence of n independent and identically distributed observations, generated by one of M discrete sources, needs to be classified. The source distributions are unknown; however, one has access to labeled training sequences of length N from each source. We consider the case where the unknown source distributions are estimated from the training sequences and the estimates are then used as nominal distributions in a robust hypothesis test. Specifically, we consider the robust DGL test due to Devroye et al. and provide non-asymptotic exponential bounds on the classification error probability that are functions of N/n.
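The plug-in pipeline described above (estimate each source from its training sequence, then use the estimates as nominals in a robust test) can be sketched as follows. This is a minimal sketch under our own assumptions. The decision rule is a minimum-distance rule over the Scheffé sets A_ij = {x : P̂_i(x) > P̂_j(x)}. We use it as an illustrative stand-in and do not claim it is the exact form of the DGL test. The function names `empirical` and `scheffe_classify` are hypothetical.

```python
from collections import Counter


def empirical(seq, alphabet):
    """Empirical (type) distribution of seq over a finite alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts[a] / n for a in alphabet}


def prob(dist, subset):
    """Probability that dist assigns to a subset of the alphabet."""
    return sum(dist[a] for a in subset)


def scheffe_classify(test_seq, train_seqs, alphabet):
    """Hypothetical plug-in classifier (assumed stand-in for the DGL test).

    Each nominal distribution is the empirical estimate from one training
    sequence of length N. The test sequence (length n) is assigned to the
    source whose nominal is closest to the test sequence's empirical
    distribution, with distance measured as the largest discrepancy over
    the Scheffe sets A_ij = {x : P_i(x) > P_j(x)}.
    """
    nominals = [empirical(s, alphabet) for s in train_seqs]
    q = empirical(test_seq, alphabet)
    m = len(nominals)
    # Scheffe sets for every ordered pair of distinct sources.
    sets = [
        {a for a in alphabet if nominals[i][a] > nominals[j][a]}
        for i in range(m)
        for j in range(m)
        if i != j
    ]

    def score(i):
        # Largest deviation between nominal i and the test type over all sets.
        return max(
            (abs(prob(nominals[i], A) - prob(q, A)) for A in sets),
            default=0.0,
        )

    return min(range(m), key=score)


if __name__ == "__main__":
    alphabet = [0, 1, 2]
    train = [[0] * 8 + [1] + [2], [0] + [1] + [2] * 8]  # N = 10 per source
    test_seq = [0] * 7 + [1] * 2 + [2]                 # n = 10
    print(scheffe_classify(test_seq, train, alphabet))
```

In this toy setting the training length N governs how close the estimated nominals are to the true sources, while n governs how well the test sequence's empirical distribution resolves the Scheffé sets. The bounds stated in the abstract are expressed in terms of the ratio N/n.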