Detection of out-of-distribution samples is one of the critical tasks in real-world applications of computer vision. Advances in deep learning have made it possible to analyze real-world data that contain unexplained samples, which makes detecting out-of-distribution instances more important than ever. GAN-based approaches have been widely used to address this problem because of their ability to fit data distributions; however, they suffer from training instability and mode collapse. We propose a simple yet efficient reconstruction-based method that outperforms GAN-based approaches without adding the extra complexity those models need to compensate for their limitations. Unlike previous reconstruction-based works that use only the reconstruction error or only the generated samples, our method incorporates both in the detection task. Our model, which we call "Connective Novelty Detection," has two subnetworks: an autoencoder and a binary classifier. The autoencoder learns a representation of the positive class by reconstructing positive samples. The model then creates negative and connected positive examples from real and generated samples. Negative instances are generated by manipulating the real data so that their distribution stays close to that of the positive class, which gives the classifier a more accurate decision boundary. To make detection robust to reconstruction error, connected positive samples are created by combining real and generated samples. Finally, the binary classifier is trained on the connected positive and negative examples. We demonstrate a considerable improvement in novelty detection over state-of-the-art methods on the MNIST and Caltech-256 datasets.
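The pipeline described above (autoencoder, negatives from manipulated real data, connected positives from real plus reconstructed samples, binary classifier) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions chosen only to keep the example self-contained: a closed-form linear autoencoder (PCA) stands in for the trained neural autoencoder, additive Gaussian noise stands in for the unspecified "manipulation," "connecting" is modeled as concatenating a sample with its reconstruction, and the classifier is a small one-hidden-layer MLP. All function names (`fit_linear_autoencoder`, `manipulate`, `connect`, `build_training_set`, `train_classifier`, `novelty_score`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))


def fit_linear_autoencoder(X, k):
    """Assumed stand-in for a trained neural autoencoder: the optimal k-unit
    linear autoencoder spans the top-k principal subspace, so PCA gives it in
    closed form. Returns a function mapping samples to reconstructions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:k]  # (k, d): encoder rows; W.T acts as the tied decoder

    def reconstruct(Z):
        return (Z - mean) @ W.T @ W + mean

    return reconstruct


def manipulate(X, rng, noise=0.5):
    """Hypothetical manipulation: perturb real samples with Gaussian noise so
    the negatives stay close to the positive distribution."""
    return X + noise * rng.standard_normal(X.shape)


def connect(X, reconstruct):
    """Hypothetical 'connection': concatenate each sample with its reconstruction."""
    return np.concatenate([X, reconstruct(X)], axis=1)


def build_training_set(X_pos, reconstruct, rng):
    pos = connect(X_pos, reconstruct)                   # connected positives
    neg = connect(manipulate(X_pos, rng), reconstruct)  # negatives (assumed form)
    F = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return F, y


def train_classifier(F, y, rng, hidden=32, lr=0.1, epochs=2000):
    """One-hidden-layer tanh MLP trained by full-batch gradient descent on
    mean binary cross-entropy (label 1 = positive class)."""
    d = F.shape[1]
    W1 = rng.standard_normal((d, hidden)) / np.sqrt(d)
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal(hidden) / np.sqrt(hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(F @ W1 + b1)               # hidden activations, (n, hidden)
        p = sigmoid(H @ w2 + b2)               # P(positive), (n,)
        g = (p - y) / n                        # dLoss/dlogit for mean BCE
        gH = np.outer(g, w2) * (1.0 - H ** 2)  # backprop through tanh (uses old w2)
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (F.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, w2, b2


def novelty_score(X, reconstruct, params):
    """Score in [0, 1]; higher means the sample is more likely novel."""
    W1, b1, w2, b2 = params
    H = np.tanh(connect(X, reconstruct) @ W1 + b1)
    return 1.0 - sigmoid(H @ w2 + b2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.standard_normal((2, 10))  # toy positives lie near a 2-D subspace of R^10
    X_pos = rng.standard_normal((200, 2)) @ basis + 0.05 * rng.standard_normal((200, 10))
    reconstruct = fit_linear_autoencoder(X_pos, k=2)
    F, y = build_training_set(X_pos, reconstruct, rng)
    params = train_classifier(F, y, rng)
    X_new = 3.0 * rng.standard_normal((5, 10))  # samples off the positive subspace
    print("positives:", novelty_score(X_pos[:5], reconstruct, params).round(3))
    print("novel:    ", novelty_score(X_new, reconstruct, params).round(3))
```

In this sketch, negatives are paired with their own reconstructions so that training inputs have the same form as test-time inputs, where only a sample and its reconstruction are available. The hidden layer matters here: with zero-mean noise as the manipulation, a purely linear classifier on the concatenated input largely cannot tell clean samples from perturbed ones, whereas an MLP can respond to the mismatch between a sample and its reconstruction.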