We propose an input sparsity time sampling algorithm that spectrally approximates the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices using a nearly optimal number of samples, improving upon all previously known methods by poly$(q)$ factors. Furthermore, for the important special case of the $q$-fold self-tensoring of a dataset, which is the feature matrix of the degree-$q$ polynomial kernel, the leading term of our method's runtime is proportional to the size of the input dataset and has no dependence on $q$. Previous techniques either incur a poly$(q)$ slowdown in their runtime, or remove the dependence on $q$ at the expense of a sub-optimal target dimension and a runtime that depends quadratically on the number of data points. Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in total time that depends only on the size of $X$, while their $q$-fold Kronecker product acts as a near-isometry for any fixed vector in the column span of $X^{\otimes q}$. We also show that our sampling methods generalize to other classes of kernels beyond polynomial, such as the Gaussian and Neural Tangent kernels.
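The key structural fact the abstract relies on can be illustrated with a minimal numerical sketch. This is not the paper's algorithm (which uses partially correlated projections and leverage-score sampling); it is a simplified demonstration, using fully independent Gaussian projections and a rank-one test vector $x^{\otimes q}$, of why a Kronecker product of random projections can be applied without ever materializing the $n^q$-dimensional tensor and why it approximately preserves the norm of a fixed vector. The mixing identity used is $(S_1 \otimes \cdots \otimes S_q)(x \otimes \cdots \otimes x) = (S_1 x) \otimes \cdots \otimes (S_q x)$.

```python
import numpy as np

# Simplified illustration (NOT the paper's method): a Kronecker product of
# q independent Gaussian projections applied to the rank-one tensor x^{⊗q}.
# By the mixed-product property, the sketch of x^{⊗q} factors as the
# Kronecker product of the q small sketches S_i @ x, so its norm is the
# product of the q factor norms -- no n^q-dimensional object is ever built.

rng = np.random.default_rng(0)
n, m, q = 50, 2000, 3        # ambient dim, sketch dim per factor, tensor degree

x = rng.standard_normal(n)
true_norm = np.linalg.norm(x) ** q      # ||x^{⊗q}|| = ||x||^q

# q independent Gaussian projections, scaled so that E||S_i x||^2 = ||x||^2.
sketches = [rng.standard_normal((m, n)) / np.sqrt(m) for _ in range(q)]
est_norm = float(np.prod([np.linalg.norm(S @ x) for S in sketches]))

ratio = est_norm / true_norm
print(f"estimated / true norm ratio: {ratio:.4f}")
```

Each factor norm concentrates around $\|x\|$ with relative error $O(1/\sqrt{m})$, so the product concentrates around $\|x\|^q$; the total cost is $q$ matrix-vector products of size $m \times n$ rather than anything of size $n^q$. The paper's correlated construction is what upgrades this fixed-vector guarantee to the full column span of $X^{\otimes q}$ at near-optimal target dimension.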