Nearest Neighbor Search (NNS) is a central task in knowledge representation, learning, and reasoning. There is a vast literature on efficient algorithms for constructing data structures and performing exact and approximate NNS. This paper studies NNS under Uncertainty (NNSU). Specifically, consider the setting in which an NNS algorithm has access only to a stochastic distance oracle that provides a noisy, unbiased estimate of the distance between any pair of points, rather than the exact distance. This models many situations of practical importance, including NNS based on human similarity judgements, physical measurements, or fast, randomized approximations to exact distances. A naive approach to NNSU could employ any standard NNS algorithm and, whenever it needs a pairwise distance, repeatedly query the stochastic oracle and average the results to reduce noise. The problem is that the number of repeated queries that suffices is unknown in advance; e.g., a point may be distant from all but one other point (in which case crude distance estimates suffice), or it may be close to a large number of other points (in which case accurate estimates are necessary). This paper shows how ideas from cover trees and multi-armed bandits can be leveraged to develop an NNSU algorithm that has optimal dependence on the dataset size and the (unknown) geometry of the dataset.
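To make the contrast with the naive averaging approach concrete, the following is a minimal sketch of the bandit ingredient alone, assuming Gaussian oracle noise with a known standard deviation `sigma`. It is not the algorithm developed in the paper (it uses no cover tree and scans every candidate); it is a textbook successive-elimination routine in which far-away candidates are discarded after a few oracle calls while close competitors receive more. The names `make_noisy_oracle` and `nearest_neighbor_elimination`, and the parameters `sigma`, `delta`, and `max_rounds`, are hypothetical and chosen for illustration only.

```python
import math
import random


def make_noisy_oracle(points, sigma, rng):
    """Hypothetical oracle: true Euclidean distance plus zero-mean Gaussian noise."""
    def oracle(i, j):
        return math.dist(points[i], points[j]) + rng.gauss(0.0, sigma)
    return oracle


def nearest_neighbor_elimination(query, candidates, oracle, sigma,
                                 delta=0.05, max_rounds=100_000):
    """Successive elimination over candidate neighbors of `query`.

    Each round queries the oracle once per surviving candidate, then drops
    every candidate whose lower confidence bound exceeds the smallest
    upper confidence bound. Returns the chosen index and per-candidate
    query counts.
    """
    sums = {c: 0.0 for c in candidates}
    counts = {c: 0 for c in candidates}
    active = set(candidates)
    n = len(candidates)
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for c in active:
            sums[c] += oracle(query, c)
            counts[c] += 1
        # Confidence radius from a Gaussian tail bound with a union bound
        # over candidates and rounds (assumes noise std is exactly sigma).
        radius = sigma * math.sqrt(2.0 * math.log(4.0 * n * t * t / delta) / t)
        means = {c: sums[c] / t for c in active}  # every survivor has t samples
        best_upper = min(means.values()) + radius
        active = {c for c in active if means[c] - radius <= best_upper}
    best = min(active, key=lambda c: sums[c] / counts[c])
    return best, counts


# Toy dataset: point 0 is the query; point 1 is its true nearest neighbor,
# point 2 is a close competitor, points 3 and 4 are far away.
points = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.0), (10.0, 0.0), (12.0, 5.0)]
rng = random.Random(0)
oracle = make_noisy_oracle(points, sigma=0.5, rng=rng)
best, counts = nearest_neighbor_elimination(0, [1, 2, 3, 4], oracle, sigma=0.5)
print("estimated nearest neighbor:", best)
print("oracle calls per candidate:", counts)
```

In this sketch the number of oracle calls spent on each candidate adapts to how hard it is to distinguish from the best one, which is the property that a fixed number of repeated queries cannot provide. Reducing the dependence on the dataset size, which the abstract attributes to cover-tree ideas, is outside the scope of this example.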