The celebrated Johnson-Lindenstrauss lemma states that for all $\varepsilon \in (0,1)$ and finite sets $X \subseteq \mathbb{R}^N$ with $n>1$ elements, there exists a matrix $\Phi \in \mathbb{R}^{m \times N}$ with $m=\mathcal{O}(\varepsilon^{-2}\log n)$ such that \[ (1 - \varepsilon) \|x-y\|_2 \leq \|\Phi x-\Phi y\|_2 \leq (1+\varepsilon)\| x- y\|_2 \quad \forall\, x, y \in X.\] Herein we consider terminal embedding results which have recently been introduced in the computer science literature as stronger extensions of the Johnson-Lindenstrauss lemma for finite sets. After a short survey of this relatively recent line of work, we extend the theory of terminal embeddings to hold for arbitrary (e.g., infinite) subsets $X \subseteq \mathbb{R}^N$, and then specialize our generalized results to the case where $X$ is a low-dimensional compact submanifold of $\mathbb{R}^N$. In particular, we prove the following generalization of the Johnson-Lindenstrauss lemma: For all $\varepsilon \in (0,1)$ and $X\subseteq\mathbb{R}^N$, there exists a terminal embedding $f: \mathbb{R}^N \longrightarrow \mathbb{R}^{m}$ such that $$(1 - \varepsilon) \| x - y \|_2 \leq \left\| f(x) - f(y) \right\|_2 \leq (1 + \varepsilon) \| x - y \|_2 \quad \forall \, x \in X ~{\rm and}~ \forall \, y \in \mathbb{R}^N.$$ Crucially, we show that the dimension $m$ of the range of $f$ above is optimal up to multiplicative constants, satisfying $m=\mathcal{O}(\varepsilon^{-2} \omega^2(S_X))$, where $\omega(S_X)$ is the Gaussian width of the set of unit secants of $X$, $S_X=\overline{\{(x-y)/\|x-y\|_2 \colon x \neq y \in X\}}$. Furthermore, our proofs are constructive and yield algorithms for computing a general class of terminal embeddings $f$, an instance of which is demonstrated herein to allow for more accurate compressive nearest neighbor classification than standard linear Johnson-Lindenstrauss embeddings do in practice.
翻译:暂无翻译