Motivated by the mode estimation problem of an unknown multivariate probability density function, we study the problem of identifying the point with the minimum k-th nearest neighbor distance for a given dataset of n points. We study the case where the pairwise distances are apriori unknown, but we have access to an oracle which we can query to get noisy information about the distance between any pair of points. For two natural oracle models, we design a sequential learning algorithm, based on the idea of confidence intervals, which adaptively decides which queries to send to the oracle and is able to correctly solve the problem with high probability. We derive instance-dependent upper bounds on the query complexity of our proposed scheme and also demonstrate significant improvement over the performance of other baselines via extensive numerical evaluations.
翻译:基于未知多变量概率密度函数的模式估计问题,我们研究如何用最小 k-th 最近的近邻距离来识别给定的 n 数据集的点。我们研究了对称距离优先未知的情况,但我们可以查询一个神器,以获得关于任何一对点之间距离的噪音信息。对于两个自然神器模型,我们根据信任间隔的理念设计一个顺序学习算法,该算法将适应性地决定向神器发送的查询,并能够以很高的概率正确解决问题。我们从我们提议的方案查询复杂性中得出基于实例的上限,并通过广泛的数字评估来显示与其他基线的性能有显著改善。