Unsupervised learning-based anomaly detection in latent space has gained importance because discriminating anomalies from normal data becomes difficult in high-dimensional space. Both density-estimation and distance-based methods for detecting anomalies in latent space have been explored in the past. These methods suggest that retaining valuable properties of the input data in latent space helps in better reconstruction of test data. Moreover, real-world sensor data is skewed and non-Gaussian in nature, which makes mean-based estimators unreliable. Likewise, anomaly detection methods based on reconstruction error rely on Euclidean distance, which ignores useful correlation information in the feature space and also fails to accurately reconstruct data that deviates from the training distribution. In this work, we address the limitations of reconstruction error-based autoencoders and propose a kernelized autoencoder that leverages a robust form of Mahalanobis distance (MD) to measure correlation among latent dimensions and thereby detect both near and far anomalies effectively. This hybrid loss is guided by the principle of maximizing the mutual information between the latent space and the high-dimensional prior data space: it maximizes the entropy of the latent space while preserving useful correlation information of the original data in the low-dimensional latent space. The multi-objective function thus has two goals: it measures correlation information in the latent feature space in the form of robust MD, and it simultaneously tries to preserve useful correlation information from the original data space in the latent space by maximizing mutual information between the prior and latent spaces.
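To illustrate why a correlation-aware distance can separate points that Euclidean distance treats alike, the following is a minimal sketch rather than the method proposed here. The function name `robust_mahalanobis`, the synthetic 2-D latent codes, and the choice of a median center with a covariance taken around that median are all assumptions made for illustration; the abstract does not specify the exact robust estimator.

```python
import numpy as np

def robust_mahalanobis(latent_train, latent_test, eps=1e-6):
    """Distance of each test row from the training latent codes.

    Assumption: the center is the per-dimension median and the covariance
    is computed around that median. This is one simple robust variant,
    not necessarily the estimator used in the work described above.
    """
    center = np.median(latent_train, axis=0)
    centered = latent_train - center
    cov = centered.T @ centered / (len(latent_train) - 1)
    cov += eps * np.eye(cov.shape[0])  # small ridge keeps the matrix invertible
    precision = np.linalg.inv(cov)
    diff = latent_test - center
    # Quadratic form diff_i^T * precision * diff_i for every test row i.
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, precision, diff))

rng = np.random.default_rng(0)

# Hypothetical 2-D latent codes whose dimensions are strongly correlated.
x = rng.normal(size=2000)
latent_train = np.column_stack([x, x + 0.1 * rng.normal(size=2000)])

# Two test points at nearly the same Euclidean distance from the center:
# the first follows the learned correlation, the second violates it.
latent_test = np.array([[1.0, 1.0], [1.0, -1.0]])

center = np.median(latent_train, axis=0)
euclidean = np.linalg.norm(latent_test - center, axis=1)
mahalanobis = robust_mahalanobis(latent_train, latent_test)

print("Euclidean:  ", euclidean)
print("Mahalanobis:", mahalanobis)
```

In this toy setting both points are about equally far from the center in Euclidean terms, while the point that breaks the correlation between the two latent dimensions receives a much larger Mahalanobis distance.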