In the Distance Oracle problem, the goal is to preprocess $n$ vectors $x_1, x_2, \ldots, x_n$ in a $d$-dimensional metric space $(\mathbb{X}^d, \| \cdot \|_l)$ into a cheap data structure, so that given a query vector $q \in \mathbb{X}^d$ and a subset $S\subseteq [n]$ of the input data points, all distances $\| q - x_i \|_l$ for $i\in S$ can be approximated quickly (faster than the trivial $\sim d|S|$ query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of $\ell_p$ norms, the problem is well understood, and optimal data structures are known for most values of $p$. Our main contribution is a fast $(1+\varepsilon)$-approximate distance oracle for any symmetric norm $\|\cdot\|_l$. This class includes $\ell_p$ norms and Orlicz norms as special cases, as well as other norms used in practice, e.g., top-$k$ norms, max-mixtures and sum-mixtures of $\ell_p$ norms, small-support norms and the box-norm. We propose a novel data structure with $\tilde{O}(n (d + \mathrm{mmc}(l)^2 ) )$ preprocessing time and space, and $t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)^2)$ query time for computing distances to a subset $S$ of data points, where $\mathrm{mmc}(l)$ is a complexity measure (concentration modulus) of the symmetric norm. When $l = \ell_{p}$, this runtime matches the aforementioned state-of-the-art oracles.
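
To make the problem statement concrete, the following is a minimal sketch of the trivial baseline that the oracle is meant to beat, instantiated with the top-$k$ norm as an example of a symmetric norm. It is not the proposed data structure: the function names `top_k_norm` and `brute_force_distances` are hypothetical, and the sketch computes exact distances by touching all $d$ coordinates of every point in $S$ (roughly $d|S|$ work, up to the logarithmic factor from sorting).

```python
from typing import List, Sequence


def top_k_norm(v: Sequence[float], k: int) -> float:
    """Top-k norm: sum of the k largest absolute values of v.

    It is symmetric: permuting coordinates or flipping signs does not
    change the value. k = 1 gives the l_inf norm, k = len(v) the l_1 norm.
    """
    return sum(sorted((abs(t) for t in v), reverse=True)[:k])


def brute_force_distances(
    points: Sequence[Sequence[float]],
    q: Sequence[float],
    subset: Sequence[int],
    k: int,
) -> List[float]:
    """Exact baseline (an illustration, not the oracle from the abstract):
    for each index i in `subset`, compute ||q - x_i|| in the top-k norm
    by scanning all d coordinates of x_i."""
    return [
        top_k_norm([a - b for a, b in zip(points[i], q)], k)
        for i in subset
    ]


if __name__ == "__main__":
    xs = [[0, 0, 0], [1, -2, 3], [4, 4, 4]]
    query = [1, 1, 1]
    # Distances from the query to x_0 and x_2 only (S = {0, 2}).
    print(brute_force_distances(xs, query, subset=[0, 2], k=2))
```

A distance oracle in the sense described above would replace the per-point scan over $d$ coordinates with a lookup in a preprocessed sketch, at the cost of returning a $(1+\varepsilon)$ approximation instead of the exact value.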