A reliable evaluation method is essential for building a robust out-of-distribution (OOD) detector. Current robustness evaluation protocols for OOD detectors rely on injecting perturbations into outlier data. However, these perturbations are unlikely to occur naturally or are irrelevant to the content of the data, so they provide only a limited assessment of robustness. In this paper, we propose Evaluation-via-Generation for OOD detectors (EvG), a new protocol for investigating the robustness of OOD detectors under more realistic modes of variation in outliers. EvG uses a generative model to synthesize plausible outliers and employs Markov chain Monte Carlo (MCMC) sampling to find the outliers that a detector misclassifies as in-distribution with the highest confidence. We use EvG to run a comprehensive benchmark comparison of state-of-the-art OOD detectors, uncovering previously overlooked weaknesses.
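The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the sampler, the generator, or the detector, so everything here is an assumption made for illustration. The sketch assumes a random-walk Metropolis-Hastings sampler over the latent space of a toy generator, a toy detector score where higher means "more in-distribution", and a hypothetical `temperature` parameter that controls how sharply the chain concentrates on high-score outliers. The names `generator`, `detector_score`, `log_target`, and `mh_search` are all hypothetical.

```python
import math
import random


def generator(z):
    """Hypothetical stand-in for a generative model of outliers (latent z -> sample x)."""
    return [2.0 * z[0] + 3.0, 2.0 * z[1] + 3.0]


def detector_score(x):
    """Hypothetical OOD detector: higher score means 'more in-distribution'.

    Assumes in-distribution data is concentrated around the origin.
    """
    return -(x[0] ** 2 + x[1] ** 2)


def log_target(z, temperature):
    """Unnormalized log-density that the chain samples from (an assumed form).

    The standard-normal prior on z keeps generated outliers plausible under
    the generator; the score term favors outliers that fool the detector.
    """
    log_prior = -0.5 * sum(v * v for v in z)
    return detector_score(generator(z)) / temperature + log_prior


def mh_search(n_steps=2000, step_size=0.3, temperature=0.5, seed=0):
    """Random-walk Metropolis-Hastings in latent space.

    Returns the latent code with the highest detector score visited so far,
    together with that score.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    current = log_target(z, temperature)
    best_z, best_score = z, detector_score(generator(z))
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the acceptance ratio needs no correction term.
        proposal = [v + rng.gauss(0.0, step_size) for v in z]
        candidate = log_target(proposal, temperature)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            z, current = proposal, candidate
            score = detector_score(generator(z))
            if score > best_score:
                best_z, best_score = z, score
    return best_z, best_score


if __name__ == "__main__":
    best_z, best_score = mh_search()
    print("worst-case outlier:", generator(best_z), "detector score:", best_score)
```

In this toy setting every generated sample is an outlier by construction, so the highest score found by the chain serves as a worst-case measure of how confidently the detector can be fooled. A real evaluation would replace the two stand-in functions with a trained generative model and the detector under test.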