In metric search, worst-case analysis is of little value, as the search invariably degenerates to a linear scan for ill-behaved data. Consequently, much effort has been expended on more nuanced descriptions of what performance might in fact be attainable, including heuristic baselines like the AESA family, as well as statistical proxies such as intrinsic dimensionality. This paper gets to the heart of the matter with an exact characterization of the best performance actually achievable for any given data set and query. Specifically, linear-time objective-preserving reductions are established in both directions between optimal metric search and the minimum dominating set problem, whose greedy approximation becomes the equivalent of an oracle-based AESA, repeatedly selecting the pivot that eliminates the most of the remaining points. As an illustration, the AESA heuristic is adapted to downplay the role of previously eliminated points, yielding some modest performance improvements over the original, as well as its younger relative iAESA2.
翻译:在量度搜索中,最坏的个案分析没有多大价值,因为搜索总是逐渐变成对不守规数据进行线性扫描,因此,在更细微地描述实际可能实现的绩效方面已经付出了很大努力,包括像AESA家族那样的重力基线,以及诸如内在维度等统计代理物。本文触及问题的核心,确切地描述任何数据集和查询实际能够达到的最佳性能。具体地说,线性时间目标保留削减在最佳度量度搜索和最小支配性设定问题(其贪婪近似相当于甲骨文的AESA)之间的两个方向上,其贪婪近似相当于甲骨文的AESA,反复选择消除大部分剩余点的枢纽。举例来说,AESA超度被调整为低估了以前被删除点的作用,比原有点及其较年轻的相对的iAESA2。