Experimental Analysis of Machine Learning Techniques for Finding Search Radius in Locality Sensitive Hashing

Finding similar data in high-dimensional spaces is an important task in multimedia applications. Exact search approaches often use tree-based index structures, which are known to suffer from the curse of dimensionality, and this limits their performance. Approximate search techniques favor performance over accuracy: they return results that are good enough while running faster. Locality Sensitive Hashing (LSH) is one of the most popular approximate nearest neighbor search techniques for high-dimensional spaces. One of the most time-consuming steps in LSH is finding the neighboring points in the projected spaces. An improved LSH-based index structure, called radius-optimized Locality Sensitive Hashing (roLSH), has been proposed that uses Machine Learning to find these neighboring points efficiently, thereby further improving the overall performance of LSH. In this paper, we extend roLSH by experimentally studying the effect of different well-known Machine Learning techniques on overall performance. We compare ten regression techniques on four real-world datasets and show that Neural Network-based techniques are the best fit for roLSH, as they offer the best trade-off between accuracy and performance among the compared techniques.
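To make the role of the search radius concrete, the following is a minimal sketch of projection-based candidate search with an expanding radius. It is not the roLSH implementation: the function names, the Gaussian projections, the collision threshold, and the doubling schedule are all assumptions made for illustration. The `start_radius` argument stands in for the value that a regression model would predict; no model is trained here.

```python
import math
import random


def make_projections(dim, m, seed=0):
    """Draw m random Gaussian directions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def project(point, projections):
    """Map a point to its m one-dimensional projected coordinates."""
    return [sum(a * x for a, x in zip(direction, point)) for direction in projections]


def candidates_within(query_proj, data_proj, radius, min_collisions):
    """Indices of points lying within `radius` of the query in at least
    `min_collisions` of the projected spaces."""
    found = []
    for idx, coords in enumerate(data_proj):
        hits = sum(1 for q, v in zip(query_proj, coords) if abs(q - v) <= radius)
        if hits >= min_collisions:
            found.append(idx)
    return found


def search(query, data, projections, k, start_radius, growth=2.0):
    """Return (indices of the k nearest candidates, number of radius expansions)."""
    if start_radius <= 0 or growth <= 1:
        raise ValueError("start_radius must be > 0 and growth must be > 1")
    data_proj = [project(p, projections) for p in data]
    query_proj = project(query, projections)
    min_collisions = max(1, len(projections) // 2)  # assumed threshold
    radius, expansions = start_radius, 0
    while True:
        cands = candidates_within(query_proj, data_proj, radius, min_collisions)
        # Stop once enough candidates are found or every point is a candidate.
        if len(cands) >= k or len(cands) == len(data):
            break
        radius *= growth
        expansions += 1
    # Rank the candidates by true Euclidean distance in the original space.
    cands.sort(key=lambda i: math.dist(query, data[i]))
    return cands[:k], expansions


# Synthetic data, for illustration only.
rng = random.Random(1)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
query = [rng.random() for _ in range(8)]
projections = make_projections(dim=8, m=6)

_, cold = search(query, data, projections, k=5, start_radius=1e-6)
_, warm = search(query, data, projections, k=5, start_radius=0.5)
print(cold, warm)  # expansions needed from a tiny vs. a larger starting radius
```

Because the candidate set only grows with the radius, a larger starting radius never needs more expansions than a smaller one under the same doubling schedule. The cost of starting too large is a bigger candidate set to rank, which is the kind of accuracy and performance trade-off that a predicted radius is meant to balance.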
