Evaluating the inherent difficulty of a given data-driven classification problem is important for establishing absolute benchmarks and evaluating progress in the field. To this end, a natural quantity to consider is the \emph{Bayes error}, which measures the optimal classification error theoretically achievable for a given data distribution. While the Bayes error is generally intractable, we show that it can be computed exactly for generative models learned using normalizing flows. Our technique relies on a fundamental result, which states that the Bayes error is invariant under invertible transformations. Therefore, we can compute the exact Bayes error of the learned flow models by computing it for their Gaussian base distributions, which can be done efficiently using Holmes-Diaconis-Ross integration. Moreover, we show that by varying the temperature of the learned flow models, we can generate synthetic datasets that closely resemble standard benchmark datasets, but with almost any desired Bayes error. We use our approach to conduct a thorough investigation of state-of-the-art classification models, and find that in some -- but not all -- cases, these models are capable of achieving nearly optimal accuracy. Finally, we use our method to evaluate the intrinsic ``hardness'' of standard benchmark datasets, and of classes within those datasets.
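To make the core computation concrete, the sketch below estimates the Bayes error of a two-class problem whose class-conditional densities are Gaussians, i.e.\ the base distributions of class-conditional normalizing flows; by the invariance result above, this value is also the Bayes error of the flow models in data space. It uses plain Monte Carlo rather than the Holmes-Diaconis-Ross integration employed in the paper, and the means, covariance, and priors are hypothetical placeholders, so it should be read as a minimal illustration rather than the paper's method.

\begin{verbatim}
import numpy as np
from scipy.stats import multivariate_normal

# Minimal sketch (not the paper's Holmes-Diaconis-Ross integrator):
# Monte Carlo estimate of the Bayes error for a two-class Gaussian
# base distribution. Means, covariance, and priors are placeholders.
rng = np.random.default_rng(0)
d = 8                                        # base-space dimensionality
mu0, mu1 = np.zeros(d), 0.5 * np.ones(d)     # hypothetical class means
cov = np.eye(d)
prior0 = prior1 = 0.5

def bayes_error_mc(n_samples=200_000):
    """Estimate E_x[min_k P(y=k | x)] by sampling from the mixture."""
    # Draw labels, then samples from the matching class-conditional Gaussian.
    y = rng.random(n_samples) < prior1
    x = np.where(y[:, None],
                 rng.multivariate_normal(mu1, cov, n_samples),
                 rng.multivariate_normal(mu0, cov, n_samples))
    p0 = prior0 * multivariate_normal.pdf(x, mu0, cov)
    p1 = prior1 * multivariate_normal.pdf(x, mu1, cov)
    # The Bayes-optimal classifier picks the larger posterior; its error
    # at x is the smaller posterior, min(p0, p1) / (p0 + p1).
    return np.mean(np.minimum(p0, p1) / (p0 + p1))

print(f"Estimated Bayes error: {bayes_error_mc():.4f}")
\end{verbatim}

In this toy setup, rescaling the base covariance plays the role of the temperature knob described above: shrinking it reduces the overlap between the class-conditional densities and drives the Bayes error toward zero, while enlarging it increases the overlap and the error.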