Many real-world datasets are naturally represented as sparse reorderable matrices, whose rows and columns can be ordered arbitrarily (e.g., the adjacency matrix of a bipartite graph). Storing a sparse matrix in conventional ways requires an amount of space linear in the number of non-zeros, and lossy compression of sparse matrices (e.g., Truncated SVD) typically requires an amount of space linear in the number of rows and columns. In this work, we propose NeuKron for compressing a sparse reorderable matrix into a constant amount of space. NeuKron generalizes Kronecker products using a recurrent neural network with a constant number of parameters. NeuKron updates the parameters so that a given matrix is approximated by the product, and it reorders the rows and columns of the matrix to facilitate the approximation. The updates take time linear in the number of non-zeros in the input matrix, and the approximation of each entry can be retrieved in logarithmic time. We also extend NeuKron to compress sparse reorderable tensors (e.g., multi-layer graphs), which generalize matrices. Through experiments on ten real-world datasets, we show that NeuKron is (a) Compact: requiring up to five orders of magnitude less space than its best competitor with similar approximation errors, (b) Accurate: giving up to 10x smaller approximation error than its best competitors with outputs of similar size, and (c) Scalable: successfully compressing a matrix with over 230 million non-zero entries.
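To make the Kronecker-product idea concrete, the sketch below shows how a fixed small seed matrix, raised to its n-th Kronecker power, implicitly defines a huge matrix whose entries can each be recovered in logarithmic time by decomposing the row and column indices into base-m digits. This is only an illustration of the classic Kronecker-power construction that NeuKron generalizes (NeuKron replaces the fixed seed with a recurrent neural network); the function name `kron_power_entry` is ours, not from the paper.

```python
import numpy as np

def kron_power_entry(seed, n, i, j):
    """Entry (i, j) of the n-th Kronecker power of `seed`.

    Runs in O(n) = O(log N) time for an N x N result (N = m^n),
    without ever materializing the full matrix: entry (i, j) is the
    product of seed entries indexed by the base-m digits of i and j.
    """
    m = seed.shape[0]
    val = 1.0
    for _ in range(n):
        val *= seed[i % m, j % m]  # consume one base-m digit of each index
        i //= m
        j //= m
    return val

# Example: a 2x2 seed whose 3rd Kronecker power is an 8x8 matrix,
# yet each entry lookup touches only 3 seed entries.
seed = np.array([[0.9, 0.5],
                 [0.5, 0.1]])
print(kron_power_entry(seed, 3, 0, 0))  # corresponds to 0.9 ** 3
```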