The problem of robust hypothesis testing is studied, where under the null and the alternative hypotheses, the data-generating distributions are assumed to lie in some uncertainty sets, and the goal is to design a test that performs well under the worst-case distributions over the uncertainty sets. In this paper, the uncertainty sets are constructed in a data-driven manner using a kernel method: they are centered around the empirical distributions of training samples from the null and alternative hypotheses, respectively, and are constrained via the distance between kernel mean embeddings of distributions in the reproducing kernel Hilbert space, i.e., the maximum mean discrepancy (MMD). Both the Bayesian setting and the Neyman-Pearson setting are investigated. For the Bayesian setting, where the goal is to minimize the worst-case error probability, an optimal test is first obtained when the alphabet is finite. When the alphabet is infinite, a tractable approximation is proposed to quantify the worst-case average error probability, and a kernel smoothing method is further applied to design a test that generalizes to unseen samples. A direct robust kernel test is also proposed and proved to be exponentially consistent. For the Neyman-Pearson setting, where the goal is to minimize the worst-case probability of miss detection subject to a constraint on the worst-case probability of false alarm, an efficient robust kernel test is proposed and shown to be asymptotically optimal. Numerical results are provided to demonstrate the performance of the proposed robust tests.
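As an illustration of the MMD-constrained uncertainty sets described in the abstract, the following is a minimal sketch and not the construction used in the paper. It assumes a Gaussian kernel, the biased (V-statistic) empirical MMD estimate, and samples stored as arrays of shape `(n, d)`. The function names `gaussian_kernel`, `mmd`, and `in_uncertainty_set`, as well as the `bandwidth` and `radius` parameters, are hypothetical names chosen for this example.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x (n, d) and y (m, d)."""
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd(x, y, bandwidth=1.0):
    """Biased empirical MMD between the empirical distributions of x and y.

    This is the RKHS distance between the two kernel mean embeddings:
    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    """
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))


def in_uncertainty_set(candidate, training, radius, bandwidth=1.0):
    """Check whether the empirical distribution of `candidate` lies within
    MMD distance `radius` of the empirical distribution of `training`.

    `radius` is an assumed tuning parameter for this sketch.
    """
    return mmd(candidate, training, bandwidth) <= radius


# Illustrative data (assumed): training samples for the two hypotheses.
rng = np.random.default_rng(0)
train_null = rng.normal(loc=0.0, scale=1.0, size=(200, 1))
train_alt = rng.normal(loc=3.0, scale=1.0, size=(200, 1))
fresh_null = rng.normal(loc=0.0, scale=1.0, size=(200, 1))

print("MMD(null, alt):       ", mmd(train_null, train_alt))
print("MMD(null, fresh null):", mmd(train_null, fresh_null))
print("fresh null in null set:", in_uncertainty_set(fresh_null, train_null, radius=0.3))
```

In this sketch, a larger `radius` admits more distributions around the empirical one and so hedges against more distribution shift, at the cost of a more conservative worst-case analysis. The choice of kernel and bandwidth determines which differences between distributions the MMD is sensitive to.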