Deep neural networks often suffer from overconfidence, which can be partly remedied by improved out-of-distribution detection. For this purpose, we propose a novel approach for generating out-of-distribution datasets from a given in-distribution dataset. Such a dataset can then be used to improve out-of-distribution detection for the given dataset and the machine learning task at hand. The samples in this dataset lie close to the in-distribution dataset in feature space and are therefore realistic and plausible. Hence, the dataset can also be used to safeguard neural networks, i.e., to validate their generalization performance. Our approach first generates suitable representations of an in-distribution dataset using an autoencoder and then transforms them with our newly proposed Soft Brownian Offset method. After this transformation, the decoder part of the autoencoder generates these implicit out-of-distribution samples. The newly generated dataset can then be mixed with other datasets for improved training of an out-of-distribution classifier, increasing its performance. Experimentally, we show on synthetic time series data that our approach is promising. In a quantitative case study, we further show that our method improves out-of-distribution detection on the MNIST dataset. Finally, we provide a case study on the synthetic generation of out-of-distribution trajectories, which can be used to validate trajectory prediction algorithms for automated driving.
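The core idea above can be sketched in a few lines of NumPy: starting from an in-distribution point in latent space, a candidate is repeatedly pushed away from its nearest in-distribution neighbor, with Brownian-style noise mixed in, until it reaches a minimum distance from the in-distribution set. This is only an illustrative sketch of the Soft Brownian Offset idea; the parameter names (`d_min`, `d_off`, `softness`) and the exact update rule are assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np


def gaussian_hyperspheric_offset(n, mu=1.0, sigma=0.1, dims=2, rng=None):
    """Sample n random offsets with norms drawn from N(mu, sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    directions = rng.normal(size=(n, dims))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.normal(mu, sigma, size=(n, 1))
    return directions * radii


def soft_brownian_offset(X, d_min=0.5, d_off=0.05, softness=0.0,
                         n_steps=200, rng=None):
    """Generate one out-of-distribution point near the in-distribution
    latent set X (illustrative sketch, not the reference implementation)."""
    rng = np.random.default_rng() if rng is None else rng
    y = X[rng.integers(len(X))].copy()  # start at a random ID sample
    for _ in range(n_steps):
        dists = np.linalg.norm(X - y, axis=1)
        if dists.min() >= d_min:        # far enough from the ID set: done
            break
        # Direction away from the nearest in-distribution neighbor.
        direction = y - X[np.argmin(dists)]
        norm = np.linalg.norm(direction)
        if norm > 0:
            direction /= norm
        else:  # degenerate start: pick a random unit direction
            direction = gaussian_hyperspheric_offset(
                1, dims=X.shape[1], rng=rng)[0]
        # Blend a deterministic outward push with Brownian-style noise.
        noise = gaussian_hyperspheric_offset(1, dims=X.shape[1], rng=rng)[0]
        y += d_off * ((1 - softness) * direction + softness * noise)
    return y
```

In the full pipeline, `X` would hold the encoder outputs of the in-distribution data, and the generated points `y` would be passed through the decoder to obtain implicit out-of-distribution samples in the input space.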