We give an input sparsity time sampling algorithm for spectrally approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices using a nearly optimal number of samples, improving upon all previously known methods by poly$(q)$ factors. Furthermore, for the important special case of the $q$-fold self-tensoring of a dataset, which is the feature matrix of the degree-$q$ polynomial kernel, the leading term of our method's runtime is proportional to the size of the dataset and has no dependence on $q$. Previous techniques either incur a poly$(q)$ factor slowdown in their runtime, or remove the dependence on $q$ at the expense of a sub-optimal target dimension and a runtime that depends quadratically on the number of data points. Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in total time that depends only on the size of $X$, while their $q$-fold Kronecker product acts as a near-isometry for any fixed vector in the column span of $X^{\otimes q}$. We show that our sampling methods generalize to other classes of kernels beyond polynomial, such as Gaussian and Neural Tangent kernels.
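The structural fact underlying the last claim is the Kronecker mixed-product identity: for sketching matrices $S_1,\dots,S_q$ and a vector $x$, $(S_1\otimes\cdots\otimes S_q)\,x^{\otimes q}=(S_1x)\otimes\cdots\otimes(S_qx)$, so the sketch of a rank-one tensor can be formed from the $q$ small products $S_ix$ without ever materializing the $d^q$-dimensional vector $x^{\otimes q}$. The following minimal NumPy sketch (an illustration with independent Gaussian projections, not the paper's partially correlated construction) checks this identity for $q=2$ and shows the resulting near-isometry, $\|(S_1x)\otimes\cdots\otimes(S_qx)\|=\prod_i\|S_ix\|\approx\|x\|^q$; all dimensions and the Gaussian choice are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Mixed-product identity, checked exactly for q = 2 on small dimensions ---
d, m = 4, 3                      # small sizes so the full Kronecker product fits in memory
x = rng.standard_normal(d)
S1 = rng.standard_normal((m, d))
S2 = rng.standard_normal((m, d))

lhs = np.kron(S1, S2) @ np.kron(x, x)     # (S1 ⊗ S2) x^{⊗2}: m^2-dim, via the d^2-dim tensor
rhs = np.kron(S1 @ x, S2 @ x)             # (S1 x) ⊗ (S2 x): never forms the d^2-dim vector
assert np.allclose(lhs, rhs)

# --- Near-isometry of the q-fold Kronecker sketch on x^{⊗q} ---
d, m, q = 50, 4000, 3            # m large enough that each 1/sqrt(m) * S_i x concentrates
x = rng.standard_normal(d)
projections = [rng.standard_normal((m, d)) / np.sqrt(m) for _ in range(q)]

# ||(S1 x) ⊗ ... ⊗ (Sq x)|| = prod_i ||S_i x||, which should be close to ||x||^q
sketch_norm = np.prod([np.linalg.norm(S @ x) for S in projections])
ratio = sketch_norm / np.linalg.norm(x) ** q
print(f"sketch norm / ||x||^q = {ratio:.4f}")   # close to 1 for large m
```

With independent projections each factor $\|S_ix\|/\|x\|$ concentrates around $1$ at rate $O(1/\sqrt{m})$, so the product incurs a $q$-fold accumulation of error; the paper's partially correlated projections are designed precisely to get the near-isometry with a nearly optimal target dimension while keeping the application time proportional to the size of $X$.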