As data volumes continue to grow, clustering and outlier detection algorithms are becoming increasingly time-consuming. Classical index structures for neighbor search no longer scale, because the "curse of dimensionality" degrades them to little better than a linear scan in high-dimensional spaces. Approximate index structures instead offer a good opportunity to significantly accelerate the neighbor search for clustering and outlier detection while keeping the error rate in the algorithms' results as low as possible. Locality-sensitive hashing (LSH) is one such structure. We outline directions for modeling the properties of LSH.
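To make the idea of an approximate neighbor index concrete, the following is a minimal sketch of one common LSH family, the p-stable (Euclidean) scheme, answering the ε-range queries that density-based clustering and outlier detection typically rely on. The class name `EuclideanLSH`, its parameters (`num_tables`, `num_hashes`, `bucket_width`, `seed`) and the choice of hash family are illustrative assumptions, not the method studied in this work.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Hypothetical minimal p-stable LSH index for approximate range queries."""

    def __init__(self, dim, num_tables=8, num_hashes=4, bucket_width=1.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Each table uses num_hashes functions h(v) = floor((a . v + b) / w),
        # with a drawn from a standard Gaussian (2-stable) and b uniform in [0, w).
        self.tables = []
        for _ in range(num_tables):
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
                for _ in range(num_hashes)
            ]
            self.tables.append((funcs, defaultdict(list)))
        self.points = []

    def _key(self, funcs, v):
        # Concatenating several hash values into one key makes buckets more selective.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, v):
        # Store the point and register its index in one bucket per table.
        idx = len(self.points)
        self.points.append(v)
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(idx)
        return idx

    def range_query(self, q, eps):
        # Gather candidates from the query's bucket in every table, then keep
        # only those whose exact Euclidean distance to q is within eps.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), []))
        return sorted(i for i in candidates if math.dist(self.points[i], q) <= eps)
```

Raising `num_tables` reduces missed neighbors at the cost of memory and query time, while raising `num_hashes` makes each bucket more selective. Because candidates are verified with the exact distance, this sketch can miss true neighbors but never returns a point farther than `eps`; such misses are the source of error in downstream cluster assignments or outlier scores.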