We propose a novel deep learning framework based on Vision Transformers (ViT) for one-class classification. The core idea is to use zero-centered Gaussian noise as a pseudo-negative class in the latent space and then train the network with an optimal loss function. Prior work has devoted considerable effort to learning good representations via a variety of loss functions that ensure both discriminative and compact properties. The proposed one-class Vision Transformer (OCFormer) is evaluated extensively on the CIFAR-10, CIFAR-100, Fashion-MNIST and CelebA eyeglasses datasets. Our method shows significant improvements over competing CNN-based one-class classifier approaches.
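The pseudo-negative idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the ViT encoder is omitted (random vectors stand in for positive-class embeddings), and a generic binary cross-entropy head stands in for whatever loss OCFormer actually uses. The point is only to show zero-centered Gaussian noise acting as the negative class in latent space.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 8, 16

# Stand-in for latent embeddings of positive (one-class) samples,
# e.g. the output of a ViT encoder; here just random vectors
# shifted away from the origin.
z_pos = rng.normal(loc=2.0, scale=0.5, size=(batch, dim))

# Core idea from the abstract: zero-centered Gaussian noise sampled
# directly in the latent space serves as the pseudo-negative class.
z_neg = rng.normal(loc=0.0, scale=1.0, size=(batch, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_loss(z_pos, z_neg, w, b):
    """Generic discriminative loss over positives vs. pseudo-negatives
    (an illustrative stand-in, not the paper's loss)."""
    p_pos = sigmoid(z_pos @ w + b)  # should be pushed toward 1
    p_neg = sigmoid(z_neg @ w + b)  # should be pushed toward 0
    eps = 1e-9
    return (-np.mean(np.log(p_pos + eps))
            - np.mean(np.log(1.0 - p_neg + eps)))

w = rng.normal(size=dim) * 0.01
loss = bce_loss(z_pos, z_neg, w, 0.0)
print(float(loss))
```

During training, both the classification head and the encoder would be updated to separate real embeddings from the Gaussian pseudo-negatives, which is what yields a compact, discriminative latent representation without any real negative data.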