We consider the problem of embedding a subset of $\mathbb{R}^n$ into a low-dimensional Hamming cube in an almost isometric way. We construct a simple, data-oblivious, and computationally efficient map that achieves this task with high probability: we first apply a specific structured random matrix, which we call the double circulant matrix; using that matrix requires linear storage and matrix-vector multiplication can be performed in near-linear time. We then binarize each vector by comparing each of its entries to a random threshold, selected uniformly at random from a well-chosen interval. We estimate the number of bits required for this encoding scheme in terms of two natural geometric complexity parameters of the set - its Euclidean covering numbers and its localized Gaussian complexity. The estimate we derive turns out to be the best that one can hope for - up to logarithmic terms. The key to the proof is a phenomenon of independent interest: we show that the double circulant matrix mimics the behavior of a Gaussian matrix in two important ways. First, it maps an arbitrary set in $\mathbb{R}^n$ into a set of well-spread vectors. Second, it yields a fast near-isometric embedding of any finite subset of $\ell_2^n$ into $\ell_1^m$. This embedding achieves the same dimension reduction as a Gaussian matrix in near-linear time, under an optimal condition - up to logarithmic factors - on the number of points to be embedded. This improves a well-known construction due to Ailon and Chazelle.
翻译:我们考虑将一个子集的 $mathb{R ⁇ {R ⁇ n$ 嵌入一个低维Hamming 立方体的问题。 我们建造了一个简单、 数据透视和计算高效的地图, 以高概率完成这项任务: 我们首先应用一个特定的结构性随机矩阵, 我们称之为双螺旋矩阵; 使用该矩阵需要线性存储和矩阵- 矢量倍增可以在近线时间进行。 然后我们通过将每个矢量的条目与随机阈值进行比较, 以随机的方式从精密间隔中随机选择。 我们估计这个编码方案需要的比特数, 以该设置的两种自然几何复杂参数来完成这一任务 : 我们首先应用一个结构结构结构, 覆盖数字及其局部的高山复杂度。 我们的估算结果显示, 最理想的- 直径存储器- 以两个重要的方式将高斯基的基质基质基数 的基数 。 我们显示, 将双基质基质基质基质基质矩阵将高斯基质基质基数的比 以近两个重要的方式将它的行为进行。