Deep generative models have been demonstrated to be state-of-the-art density estimators. Yet, recent work has found that they often assign higher likelihood to data from outside the training distribution than to data from within it. This seemingly paradoxical behavior has raised concerns about the quality of the attained density estimates. In the context of hierarchical variational autoencoders, we provide evidence that this behavior arises because out-of-distribution (OOD) data can have in-distribution low-level features. We argue that this is both expected and desirable behavior. With this insight in hand, we develop a fast, scalable and fully unsupervised likelihood-ratio score for OOD detection that requires data to be in-distribution across all feature levels. We benchmark the method on a vast set of data and model combinations and achieve state-of-the-art results on out-of-distribution detection.
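To make the idea of a likelihood-ratio score concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the formula, so the sketch assumes the score is the difference between two per-example log-likelihood estimates: one from the full model and one that relies only on high-level features. The sign convention (higher score means more likely OOD), the function names `llr_score` and `auroc`, and all numbers are hypothetical and chosen purely for illustration.

```python
import numpy as np


def llr_score(log_px_full, log_px_high_level):
    """Hypothetical likelihood-ratio score (assumed form, not from the abstract).

    Difference of two per-example log-likelihood estimates; under the assumed
    sign convention, a larger value suggests the example is OOD.
    """
    full = np.asarray(log_px_full, dtype=float)
    high = np.asarray(log_px_high_level, dtype=float)
    return full - high


def auroc(scores_in, scores_ood):
    """Probability that a random OOD example outscores a random
    in-distribution example, counting ties as one half."""
    s_in = np.asarray(scores_in, dtype=float)[:, None]
    s_ood = np.asarray(scores_ood, dtype=float)[None, :]
    return float(np.mean((s_ood > s_in) + 0.5 * (s_ood == s_in)))


# Made-up log-likelihood estimates (illustration only, not real results).
# The OOD examples get a HIGHER full-model likelihood, mimicking the
# paradox described in the abstract.
in_full = np.array([-100.0, -102.0, -98.0])
in_high = np.array([-101.0, -104.0, -99.0])
ood_full = np.array([-90.0, -92.0, -88.0])
ood_high = np.array([-120.0, -118.0, -125.0])

# Using the negative raw likelihood as an OOD score ranks every OOD
# example below every in-distribution example on this toy data.
raw_auroc = auroc(-in_full, -ood_full)

# The assumed ratio score separates the two groups on the same toy data.
llr_auroc = auroc(llr_score(in_full, in_high), llr_score(ood_full, ood_high))

print(raw_auroc, llr_auroc)
```

On this toy data the raw likelihood fails as an OOD score because the OOD examples are assigned higher likelihood, while the difference of the two estimates separates the groups. Whether this holds on real data depends on how the two estimates are actually computed, which the abstract leaves unspecified.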