It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
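To make the core idea concrete, below is a minimal sketch of an Outlier Exposure training step for a softmax classifier: the usual cross-entropy loss on in-distribution data plus a term that pushes predictions on auxiliary outliers toward the uniform distribution over classes. The function and variable names (`f`, `oe_step`, `lam`) are illustrative assumptions, not taken from the paper's released code, and the weighting `lam` is only a placeholder default.

```python
# Minimal sketch of the Outlier Exposure objective for a softmax classifier,
# assuming a PyTorch model `f`, in-distribution inputs/labels (x_in, y_in),
# and a batch of auxiliary outliers x_out. Names and the default weight `lam`
# are illustrative, not the paper's exact implementation.
import torch
import torch.nn.functional as F

def oe_step(f, x_in, y_in, x_out, lam=0.5):
    """One training step: standard cross-entropy on in-distribution data,
    plus a term encouraging uniform predictions on outlier data."""
    logits_in = f(x_in)
    loss_in = F.cross_entropy(logits_in, y_in)

    logits_out = f(x_out)
    # Cross-entropy to the uniform distribution over classes
    # (equivalently, up to a constant, the KL divergence from uniform).
    log_probs_out = F.log_softmax(logits_out, dim=1)
    loss_out = -log_probs_out.mean(dim=1).mean()

    return loss_in + lam * loss_out
```

At test time, anomaly scores can then be derived from the trained classifier (e.g., from the maximum softmax probability), with outliers expected to receive more uniform, lower-confidence predictions.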