In this work, we improve upon the guarantees for sparse random embeddings, as they were recently provided and analyzed by Freksen at al. (NIPS'18) and Jagadeesan (NIPS'19). Specifically, we show that (a) our bounds are explicit as opposed to the asymptotic guarantees provided previously, and (b) our bounds are guaranteed to be sharper by practically significant constants across a wide range of parameters, including the dimensionality, sparsity and dispersion of the data. Moreover, we empirically demonstrate that our bounds significantly outperform prior works on a wide range of real-world datasets, such as collections of images, text documents represented as bags-of-words, and text sequences vectorized by neural embeddings. Behind our numerical improvements are techniques of broader interest, which improve upon key steps of previous analyses in terms of (c) tighter estimates for certain types of quadratic chaos, (d) establishing extreme properties of sparse linear forms, and (e) improvements on bounds for the estimation of sums of independent random variables.
翻译:在这项工作中,我们改进了对稀散随机嵌入物的保障,因为Freksen at al.(NIPS'18)和 Jagadeesan(NIPS'19)最近提供了这些保证并加以分析。具体地说,我们表明:(a) 我们的界限是明确的,而不是以前提供的无线保证,和(b) 我们的界限有保证更加清晰,通过一系列参数的几乎重要的常数得到保证,这些参数范围很广,包括数据的维度、宽度和分散。此外,我们从经验上表明,我们的界限大大超过以前在一系列真实世界数据集方面开展的工作,例如图像的收集、文字文件作为字包和文字序列以内嵌方式矢量。我们的数字改进背后是具有广泛兴趣的技术,这些技术改进了以前在以下方面分析的关键步骤:(c) 对某些四面混杂物种类的更严格估计,(d) 建立微线形的极端特性,以及(e) 改进了独立随机变量数量的范围。