Likelihood-to-evidence ratio estimation is usually cast as either a binary (NRE-A) or a multiclass (NRE-B) classification task. In contrast to the binary formulation, the current multiclass formulation carries an intrinsic, unknown bias term, rendering otherwise informative diagnostics unreliable. We propose a multiclass framework free of the bias inherent to NRE-B at optimum, enabling the diagnostics that practitioners depend on. It also recovers NRE-A in one corner case and NRE-B in a limiting case. For a fair comparison, we benchmark the behavior of all algorithms in both familiar and novel training regimes: when jointly drawn data is unlimited, when data is fixed but prior draws are unlimited, and in the commonplace setting where both data and parameters are fixed. Our investigations reveal that the highest-performing models lie far from the competitors (NRE-A, NRE-B) in hyperparameter space, and we recommend hyperparameters distinct from those of the previous models. Finally, we propose a bound on the mutual information as a performance metric for simulation-based inference methods that requires no posterior samples, and we provide experimental results.
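The binary (NRE-A) casting mentioned above rests on a standard identity: the Bayes-optimal classifier separating joint draws (θ, x) ~ p(θ, x) from marginal draws (θ, x) ~ p(θ)p(x) encodes the likelihood-to-evidence ratio via its odds. A minimal sketch on a hypothetical toy model (θ ~ N(0, 1), x | θ ~ N(θ, 1), chosen here only for illustration) makes this concrete:

```python
import numpy as np
from scipy.stats import norm

# Toy model (illustrative assumption, not from the paper):
#   theta ~ N(0, 1),  x | theta ~ N(theta, 1)
# The Bayes-optimal binary classifier d*(theta, x) separating joint pairs
# from marginal pairs satisfies
#   d* / (1 - d*) = p(x | theta) / p(x),
# i.e. its odds equal the likelihood-to-evidence ratio.

def log_likelihood(x, theta):
    return norm.logpdf(x, loc=theta, scale=1.0)

def log_evidence(x):
    # p(x) = int N(x; theta, 1) N(theta; 0, 1) dtheta = N(x; 0, 2):
    # variances add when marginalizing a Gaussian mean.
    return norm.logpdf(x, loc=0.0, scale=np.sqrt(2.0))

def optimal_classifier(x, theta):
    # d* = sigmoid(log r), with log r = log p(x|theta) - log p(x)
    log_r = log_likelihood(x, theta) - log_evidence(x)
    return 1.0 / (1.0 + np.exp(-log_r))

x, theta = 0.7, 0.3
d = optimal_classifier(x, theta)
log_r_from_odds = np.log(d / (1.0 - d))
log_r_direct = log_likelihood(x, theta) - log_evidence(x)
print(log_r_from_odds, log_r_direct)  # the two agree
```

In practice the classifier is a trained neural network rather than the analytic optimum, and the identity above is what justifies reading the ratio off its logits.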