Suppose that we have two training sequences generated by parametrized distributions $P_{\theta^*}$ and $P_{\xi^*}$, where $\theta^*$ and $\xi^*$ are unknown true parameters. Given these training sequences, we study the problem of classifying whether a test sequence was generated according to $P_{\theta^*}$ or $P_{\xi^*}$. This problem can be viewed as a hypothesis testing problem, and our aim is to analyze the weighted sum of the type-I and type-II error probabilities. By analyzing the codeword lengths of the Bayes code, our previous study derived more refined bounds on the error probability than were previously known. However, that study had three limitations: i) the prior distributions of $\theta$ and $\xi$ were assumed to be the same; ii) the prior distribution over the two hypotheses was assumed to be uniform; iii) no numerical calculation at finite blocklength was provided. This study addresses all three. We remove restrictions i) and ii) and derive more general results than those obtained previously. To address iii), we perform a numerical calculation for a concrete model.
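To make the setting concrete, the following is a minimal sketch of the classification problem for a toy model, not the model or the decision rule analyzed in this study. It assumes Bernoulli sources, Beta priors on $\theta$ and $\xi$ (which may differ, as in i)), a possibly non-uniform prior over the two hypotheses (as in ii)), and a rule that picks the hypothesis whose Bayes-mixture code length for the test sequence, given the corresponding training sequence, is shorter after adding the hypothesis-prior penalty. All function names and parameter values are hypothetical. The weighted error is estimated by Monte Carlo at a finite blocklength.

```python
import math
import random

def log_beta(a, b):
    # Logarithm of the Beta function B(a, b).
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bayes_code_length(test, train, a=0.5, b=0.5):
    # Ideal code length (in nats) of the binary sequence `test` under the
    # Bayes mixture with a Beta(a, b) prior updated by the sequence `train`.
    k_tr, n_tr = sum(train), len(train)
    k_te, n_te = sum(test), len(test)
    a_post, b_post = a + k_tr, b + (n_tr - k_tr)
    log_prob = log_beta(a_post + k_te, b_post + (n_te - k_te)) - log_beta(a_post, b_post)
    return -log_prob

def classify(test, train1, train2, prior1=0.5, beta1=(0.5, 0.5), beta2=(0.5, 0.5)):
    # Assumed rule: choose the hypothesis with the smaller penalized code length.
    score1 = bayes_code_length(test, train1, *beta1) - math.log(prior1)
    score2 = bayes_code_length(test, train2, *beta2) - math.log(1.0 - prior1)
    return 1 if score1 <= score2 else 2

def weighted_error(theta, xi, n_train, n_test, prior1=0.5, trials=1000, seed=0):
    # Monte Carlo estimate of prior1 * P(type-I error) + (1 - prior1) * P(type-II error).
    rng = random.Random(seed)

    def draw(p, n):
        return [1 if rng.random() < p else 0 for _ in range(n)]

    err1 = err2 = 0
    for _ in range(trials):
        train1, train2 = draw(theta, n_train), draw(xi, n_train)
        if classify(draw(theta, n_test), train1, train2, prior1) != 1:
            err1 += 1
        if classify(draw(xi, n_test), train1, train2, prior1) != 2:
            err2 += 1
    return prior1 * err1 / trials + (1.0 - prior1) * err2 / trials

if __name__ == "__main__":
    # Hypothetical true parameters and blocklengths, chosen only for illustration.
    print(weighted_error(theta=0.3, xi=0.6, n_train=100, n_test=50, prior1=0.3))
```

Passing different `beta1` and `beta2` tuples to `classify` corresponds to using different priors for $\theta$ and $\xi$, and `prior1` different from 0.5 corresponds to a non-uniform prior over the hypotheses.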