Dimensionality reduction is a crucial first step for many unsupervised learning tasks, including anomaly detection and clustering. Autoencoders are a popular mechanism for accomplishing dimensionality reduction. For dimensionality reduction to be effective on high-dimensional data that embed a nonlinear low-dimensional manifold, it is understood that some form of geodesic distance metric should be used to discriminate between data samples. Inspired by the success of geodesic distance approximators such as ISOMAP, we propose to use a minimum spanning tree (MST), a graph-based algorithm, to approximate the local neighborhood structure and generate structure-preserving distances among data points. We use this MST-based distance metric to replace the Euclidean distance metric in the embedding function of autoencoders and develop a new graph regularized autoencoder, which outperforms a wide range of alternative methods across 20 benchmark anomaly detection datasets. We further incorporate the MST regularizer into two generative adversarial networks and find that it substantially improves anomaly detection performance for both. We also test our MST regularized autoencoder on two datasets in a clustering application and observe superior performance there as well.
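To make the idea of an MST-based distance concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the structure-preserving distance between two points is the length of the unique path joining them in the Euclidean minimum spanning tree; the abstract does not specify this detail, and the helper name `mst_path_distances` and the toy half-circle data are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform


def mst_path_distances(points):
    """Pairwise path lengths along the Euclidean MST of `points`.

    Hypothetical helper: path length along the tree is an assumed
    definition of the MST-based distance, used for illustration only.
    Assumes no duplicate points (zero distances are read as missing edges).
    """
    euclidean = squareform(pdist(points))    # dense (n, n) Euclidean distances
    mst = minimum_spanning_tree(euclidean)   # sparse matrix with the n - 1 tree edges
    # Treat the tree as undirected and sum edge weights along the
    # unique path between every pair of points.
    return shortest_path(mst, method="D", directed=False)


# Toy data: 50 points on a half circle, a 1-D manifold embedded in 2-D.
angles = np.linspace(0.0, np.pi, 50)
points = np.column_stack([np.cos(angles), np.sin(angles)])

tree_dist = mst_path_distances(points)
euclid_dist = squareform(pdist(points))
```

In this toy case the two endpoints of the half circle are 2 apart in Euclidean distance, while the tree distance follows the arc and is close to π, which is the kind of geodesic behavior the abstract motivates. How such distances enter the autoencoder's embedding function is not stated in the abstract and is left out of the sketch.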