Anomaly detection relies on designing a score to determine whether a particular event is uncharacteristic of a given background distribution. One way to define a score is to use autoencoders, which rely on the ability to reconstruct certain types of data (background) but not others (signals). In this paper, we study some challenges associated with variational autoencoders, such as their dependence on hyperparameters and on the choice of metric, in the context of anomalous signal jets (top and $W$ jets) in a QCD background. We find that the hyperparameter choices strongly affect network performance and that the optimal parameters for one signal are non-optimal for another. In exploring the networks, we uncover a connection between the latent space of a variational autoencoder trained with a mean-squared-error loss and the optimal transport distances within the dataset. We then show that optimal transport distances to representative events in the background dataset can be used directly for anomaly detection, with performance comparable to that of the autoencoders. Whether using autoencoders or optimal transport distances for anomaly detection, we find that the choices that best represent the background are not necessarily the best for signal identification. These challenges with unsupervised anomaly detection bolster the case for further exploration of semi-supervised or alternative approaches.
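To make the optimal-transport score concrete, here is a minimal sketch, not the authors' implementation, of scoring a jet by its smallest optimal transport distance to a few representative background jets. It assumes, for simplicity, that every jet is a point cloud of the same number of particles in a hypothetical $(\eta, \phi)$ plane with equal weights. Under that assumption an optimal transport plan is a permutation, so the exact cost can be found by brute force over permutations. The names `ot_distance`, `anomaly_score`, and the toy jets are hypothetical, and taking the minimum over references is an assumed aggregation.

```python
import math
from itertools import permutations

# Hypothetical particle representation: (eta, phi), all particles equally weighted.
Particle = tuple[float, float]


def ground_distance(a: Particle, b: Particle) -> float:
    """Euclidean distance in the (eta, phi) plane (phi periodicity ignored for simplicity)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def ot_distance(jet_a: list[Particle], jet_b: list[Particle]) -> float:
    """Exact optimal transport cost between two equal-size, uniformly weighted point clouds.

    With equal counts and equal masses (1/n each), an optimal transport plan is a
    permutation, so we search all permutations. This is exponential in n and is
    meant only for tiny illustrative jets.
    """
    if len(jet_a) != len(jet_b):
        raise ValueError("this sketch assumes jets with the same number of particles")
    n = len(jet_a)
    if n == 0:
        return 0.0
    best = math.inf
    for perm in permutations(range(n)):
        cost = sum(ground_distance(jet_a[i], jet_b[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best / n  # each particle carries mass 1/n


def anomaly_score(jet: list[Particle], references: list[list[Particle]]) -> float:
    """Assumed score: distance to the closest representative background jet."""
    return min(ot_distance(jet, ref) for ref in references)


# Toy data (made up for illustration): compact one-prong "background" references,
# a jet that resembles them, and a jet with two well-separated prongs.
background_refs = [
    [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    [(0.0, 0.0), (0.05, 0.05), (-0.05, 0.0)],
]
qcd_like = [(0.0, 0.0), (0.1, 0.01), (0.0, 0.09)]
two_prong = [(0.0, 0.0), (0.8, 0.0), (0.8, 0.1)]

print(anomaly_score(qcd_like, background_refs))
print(anomaly_score(two_prong, background_refs))
```

The background-like jet receives a much smaller score than the two-prong jet, which is the behavior an anomaly score should have. Realistic jet distances usually weight particles by transverse momentum and allow unequal particle counts, which breaks the permutation shortcut and calls for a general optimal transport solver.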