Deep neural networks often suffer from overconfidence, which can be partly remedied by improved out-of-distribution detection. For this purpose, we propose a novel approach for generating out-of-distribution datasets from a given in-distribution dataset. Such a dataset can then be used to improve out-of-distribution detection for the given dataset and the machine learning task at hand. The samples in this dataset lie close to the in-distribution dataset in feature space and are therefore realistic and plausible. Hence, the dataset can also be used to safeguard neural networks, i.e., to validate their generalization performance. Our approach first generates suitable representations of an in-distribution dataset using an autoencoder and then transforms them with our newly proposed Soft Brownian Offset method. After this transformation, the decoder part of the autoencoder generates these implicit out-of-distribution samples. The newly generated dataset can then be mixed with other datasets for improved training of an out-of-distribution classifier, increasing its performance. Experimentally, we show on synthetic time series data that our approach is promising. In a quantitative case study, we further show that our method improves out-of-distribution detection on the MNIST dataset. Finally, we provide a case study on the synthetic generation of out-of-distribution trajectories, which can be used to validate trajectory prediction algorithms for automated driving.
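The core idea above can be sketched in a few lines of NumPy: starting from an in-distribution point in latent space, a candidate is repeatedly pushed away from its nearest in-distribution neighbor, with Brownian-style noise mixed in, until it reaches a minimum distance from the in-distribution set. This is only an illustrative sketch of the Soft Brownian Offset idea; the parameter names (`d_min`, `d_off`, `softness`) and the exact update rule are assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np


def gaussian_hyperspheric_offset(n, mu=1.0, sigma=0.1, dims=2, rng=None):
    """Sample n random offsets with norms drawn from N(mu, sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    directions = rng.normal(size=(n, dims))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.normal(mu, sigma, size=(n, 1))
    return directions * radii


def soft_brownian_offset(X, d_min=0.5, d_off=0.05, softness=0.0,
                         n_steps=200, rng=None):
    """Generate one out-of-distribution point near the in-distribution
    latent set X (illustrative sketch, not the reference implementation)."""
    rng = np.random.default_rng() if rng is None else rng
    y = X[rng.integers(len(X))].copy()  # start at a random ID sample
    for _ in range(n_steps):
        dists = np.linalg.norm(X - y, axis=1)
        if dists.min() >= d_min:        # far enough from the ID set: done
            break
        # Direction away from the nearest in-distribution neighbor.
        direction = y - X[np.argmin(dists)]
        norm = np.linalg.norm(direction)
        if norm > 0:
            direction /= norm
        else:  # degenerate start: pick a random unit direction
            direction = gaussian_hyperspheric_offset(
                1, dims=X.shape[1], rng=rng)[0]
        # Blend a deterministic outward push with Brownian-style noise.
        noise = gaussian_hyperspheric_offset(1, dims=X.shape[1], rng=rng)[0]
        y += d_off * ((1 - softness) * direction + softness * noise)
    return y
```

In the full pipeline, `X` would hold the encoder outputs of the in-distribution data, and the generated points `y` would be passed through the decoder to obtain implicit out-of-distribution samples in the input space.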