We present a data structure to randomly sample rows from the Khatri-Rao product of several matrices according to the exact distribution of its leverage scores. Our proposed sampler draws each row in time logarithmic in the height of the Khatri-Rao product and quadratic in its column count, with persistent space overhead at most the size of the input matrices. As a result, it tractably draws samples even when the matrices forming the Khatri-Rao product have tens of millions of rows each. When used to sketch the linear least-squares problems arising in Candecomp / PARAFAC decomposition, our method achieves lower asymptotic complexity per solve than recent state-of-the-art methods. Experiments on billion-scale sparse tensors and synthetic data validate our theoretical claims, with our algorithm achieving higher accuracy than competing methods as the decomposition rank grows.
翻译:我们根据杠杆分数的准确分布,展示了从Khatri-Rao产品中随机抽取数个矩阵数据的数据结构。我们提议的取样员在Khatri-Rao产品高度以时间对数抽取每行,在列数计数时以二次对数抽取每行,输入矩阵的最大尺寸是持续空间间接载荷。因此,即使构成Khatri-Rao产品的矩阵每行有数千万行,它也可以随意抽取样品。当用来勾画Candecomp/PARAFAC脱形产生的线性最小平方问题时,我们的方法在每次解形时比最新的最新方法要低。关于10亿级稀疏强的实验和合成数据的实验证实了我们的理论主张,我们的算法随着分解等级的增长,比竞争性方法的精确度更高。