In this paper, we propose POTATOES (Partitioning OverfiTting AuTOencoder EnSemble), a new method for unsupervised outlier detection (UOD). More precisely, given any autoencoder for UOD, this technique can be used to improve its accuracy while at the same time removing the burden of tuning its regularization. The idea is not to regularize at all, but rather to randomly partition the data into sufficiently many equally sized parts, overfit each part with its own autoencoder, and use the maximum over all autoencoder reconstruction errors as the anomaly score. We apply our model to various realistic datasets and show that, if the set of inliers is dense enough, our method indeed significantly improves the UOD performance of a given autoencoder. For reproducibility, the code is made available on GitHub so that the reader can recreate the results in this paper and apply the method to other autoencoders and datasets.
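The partition, overfit, and maximum steps described above can be illustrated with a minimal sketch. This is not the released implementation: the function names `fit_autoencoder` and `potatoes_scores`, the tiny NumPy autoencoder, and all hyperparameters (hidden width, epochs, learning rate) are assumptions chosen only to keep the example self-contained. Any autoencoder trainer that returns a reconstruction function could be passed in instead.

```python
import numpy as np


def fit_autoencoder(X, hidden=8, epochs=500, lr=0.02, seed=0):
    """Fit a tiny tanh autoencoder by plain gradient descent, with no
    regularization (no weight decay, dropout, or early stopping).
    Hypothetical stand-in: any autoencoder trainer could be used instead."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encoder
        R = H @ W2 + b2                    # decoder (reconstruction)
        G = 2.0 * (R - X) / n              # d(loss)/dR for mean squared error
        GH = (G @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    # Return a function that reconstructs arbitrary points.
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2


def potatoes_scores(X, n_parts, fit=fit_autoencoder, seed=0):
    """Anomaly score = maximum reconstruction error over an ensemble of
    autoencoders, each trained on one random part of the data."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    # Random partition; part sizes differ by at most one sample.
    parts = np.array_split(order, n_parts)
    errors = []
    for idx in parts:
        reconstruct = fit(X[idx])          # train on this part only
        # Squared reconstruction error of every point under this model.
        errors.append(((reconstruct(X) - X) ** 2).sum(axis=1))
    return np.max(errors, axis=0)          # maximum over the ensemble


# Toy usage: 200 Gaussian inliers plus one far-away point.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), [[6.0, -6.0]]])
scores = potatoes_scores(X, n_parts=4)
print(scores.shape, int(scores.argmax()))
```

The intuition behind taking the maximum is that a point in a dense inlier region has near neighbours in every part, so every autoencoder should reconstruct it reasonably well, whereas an isolated point is seen by only one autoencoder and tends to be reconstructed poorly by the others. How many parts count as "sufficiently many" depends on the dataset and is not determined by this sketch.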