The Johnson-Lindenstrauss (JL) lemma is a cornerstone of dimensionality reduction in Euclidean space, but its applicability to non-Euclidean data has remained limited. This paper extends the JL lemma beyond Euclidean geometry to handle general dissimilarity matrices that are prevalent in real-world applications. We present two complementary approaches: First, we show the JL transform can be applied to vectors in pseudo-Euclidean space with signature $(p,q)$, providing theoretical guarantees that depend on the ratio of the $(p, q)$ norm and Euclidean norm of two vectors, measuring the deviation from Euclidean geometry. Second, we prove that any symmetric hollow dissimilarity matrix can be represented as a matrix of generalized power distances, with an additional parameter representing the uncertainty level within the data. In this representation, applying the JL transform yields multiplicative approximation with a controlled additive error term proportional to the deviation from Euclidean geometry. Our theoretical results provide fine-grained performance analysis based on the degree to which the input data deviates from Euclidean geometry, making practical and meaningful reduction in dimensionality accessible to a wider class of data. We validate our approaches on both synthetic and real-world datasets, demonstrating the effectiveness of extending the JL lemma to non-Euclidean settings.
翻译:Johnson-Lindenstrauss (JL) 引理是欧几里得空间中降维的基石,但其对非欧几里得数据的适用性一直有限。本文将 JL 引理推广到欧几里得几何之外,以处理现实应用中普遍存在的一般性相异矩阵。我们提出了两种互补的方法:首先,我们证明 JL 变换可以应用于具有 $(p,q)$ 符号的伪欧几里得空间中的向量,其理论保证取决于两个向量的 $(p, q)$ 范数与欧几里得范数之比,该比值度量了与欧几里得几何的偏离程度。其次,我们证明任何对称的空心相异矩阵都可以表示为一个广义幂距离矩阵,其中包含一个代表数据内部不确定性水平的附加参数。在此表示下,应用 JL 变换可产生具有乘法近似性的结果,其受控的加法误差项与偏离欧几里得几何的程度成正比。我们的理论结果基于输入数据偏离欧几里得几何的程度,提供了细粒度的性能分析,从而使得更广泛类别的数据能够实现实用且有意义的降维。我们在合成数据集和真实数据集上验证了我们的方法,证明了将 JL 引理扩展到非欧几里得设置的有效性。