Parametric and non-parametric classifiers often have to deal with real-world data, in which corruptions such as noise, occlusions, and blur are unavoidable, posing significant challenges. We present a probabilistic approach to classifying strongly corrupted data and quantifying the associated uncertainty, even though the model has been trained only on uncorrupted data. The underlying architecture is a semi-supervised autoencoder trained on uncorrupted data. We use its decoder as a generative model for realistic data and extend it with convolutions, masking, and additive Gaussian noise to describe the imperfections. This constitutes a statistical inference task for the optimal latent-space activations of the underlying uncorrupted datum, which we solve approximately with Metric Gaussian Variational Inference (MGVI). The supervision of the autoencoder's latent space allows us to classify corrupted data directly, under uncertainty, from the statistically inferred latent-space activations. Furthermore, we demonstrate that the model uncertainty depends strongly on whether the classification is correct or wrong, laying the basis for a statistical "lie detector" for the classification. Independently of that, we show that the generative model can optimally restore the uncorrupted datum by decoding the inferred latent-space activations.
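The abstract does not give implementation details, so the following is only a minimal sketch of the described corruption forward model (blur convolution, occlusion masking, additive Gaussian noise applied to the decoder output) and of the Gaussian likelihood that a variational scheme such as MGVI would work with. NumPy/SciPy, the generic `decoder(z)` callable, and all function and parameter names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.signal import convolve2d


def response(decoded, blur_kernel, mask):
    """Deterministic part of the assumed corruption model: blur, then occlude."""
    blurred = convolve2d(decoded, blur_kernel, mode="same", boundary="symm")
    return blurred * mask


def simulate_corruption(decoded, blur_kernel, mask, noise_sigma, rng):
    """Draw one corrupted datum by adding white Gaussian noise to the response."""
    clean = response(decoded, blur_kernel, mask)
    return clean + noise_sigma * rng.normal(size=clean.shape)


def neg_log_likelihood(z, decoder, data, blur_kernel, mask, noise_sigma):
    """Gaussian negative log-likelihood of the corrupted datum given latent z
    (up to an additive constant). A variational inference scheme such as MGVI
    would minimize this term plus a prior term over z; the posterior mean of z
    then feeds the latent-space classifier, and decoder(z) restores the datum."""
    resid = data - response(decoder(z), blur_kernel, mask)
    return 0.5 * np.sum((resid / noise_sigma) ** 2)
```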