Parametric and non-parametric classifiers often have to deal with real-world data, in which corruptions such as noise, occlusions, and blur are unavoidable, posing significant challenges. We present a probabilistic approach to classifying strongly corrupted data and quantifying the associated uncertainty, even though the model has been trained only on uncorrupted data. The underlying architecture is a semi-supervised autoencoder trained on uncorrupted data. We use its decoder as a generative model for realistic data and extend it with convolutions, masking, and additive Gaussian noise to describe the imperfections. This constitutes a statistical inference task for the optimal latent-space activations of the underlying uncorrupted datum, which we solve approximately with Metric Gaussian Variational Inference (MGVI). The supervision of the autoencoder's latent space allows us to classify corrupted data directly, under uncertainty, from the statistically inferred latent-space activations. Furthermore, we demonstrate that the model uncertainty depends strongly on whether the classification is correct or wrong, laying the basis for a statistical "lie detector" for the classification. Independently of that, we show that the generative model can optimally restore the uncorrupted datum by decoding the inferred latent-space activations.
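The abstract does not give implementation details, so the following is only a minimal sketch of the described corruption forward model (blur convolution, occlusion masking, additive Gaussian noise applied to the decoder output) and of the Gaussian likelihood that a variational scheme such as MGVI would work with. NumPy/SciPy, the generic `decoder(z)` callable, and all function and parameter names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.signal import convolve2d


def response(decoded, blur_kernel, mask):
    """Deterministic part of the assumed corruption model: blur, then occlude."""
    blurred = convolve2d(decoded, blur_kernel, mode="same", boundary="symm")
    return blurred * mask


def simulate_corruption(decoded, blur_kernel, mask, noise_sigma, rng):
    """Draw one corrupted datum by adding white Gaussian noise to the response."""
    clean = response(decoded, blur_kernel, mask)
    return clean + noise_sigma * rng.normal(size=clean.shape)


def neg_log_likelihood(z, decoder, data, blur_kernel, mask, noise_sigma):
    """Gaussian negative log-likelihood of the corrupted datum given latent z
    (up to an additive constant). A variational inference scheme such as MGVI
    would minimize this term plus a prior term over z; the posterior mean of z
    then feeds the latent-space classifier, and decoder(z) restores the datum."""
    resid = data - response(decoder(z), blur_kernel, mask)
    return 0.5 * np.sum((resid / noise_sigma) ** 2)
```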