We study matrix multiplication in the low-bandwidth model: there are $n$ computers, and we need to compute the product of two $n \times n$ matrices. Initially, computer $i$ knows row $i$ of each input matrix. In one communication round, each computer can send and receive one $O(\log n)$-bit message. Eventually, computer $i$ has to output row $i$ of the product matrix. We seek to understand the complexity of this problem in the uniformly sparse case: each row and column of each input matrix has at most $d$ nonzero elements, and in the product matrix we only need to know the values of at most $d$ elements in each row or column. This is exactly the setting that arises, e.g., when we apply matrix multiplication for triangle detection in graphs of maximum degree $d$. We focus on the supported setting: the structure of the matrices is known in advance, and only the numerical values of the nonzero elements are unknown. There is a trivial algorithm that solves the problem in $O(d^2)$ rounds, but for large $d$, better algorithms are known to exist: in the moderately dense regime the problem can be solved in $O(dn^{1/3})$ communication rounds, and for very large $d$, the dominant solution is the fast matrix multiplication algorithm, which uses $O(n^{1.158})$ communication rounds (for matrix multiplication over rings). In this work we show that it is possible to overcome the quadratic barrier for all values of $d$: we present an algorithm that solves the problem in $O(d^{1.907})$ rounds for rings and $O(d^{1.927})$ rounds for semirings, independent of $n$.
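To make the uniformly sparse, supported setting concrete, the following is a minimal centralized sketch, not the algorithm of this work. It assumes a hypothetical dictionary-of-rows representation and a hypothetical helper `sparse_product_rows`; it computes only the requested entries of the product and counts how many values of the second matrix each "computer" $i$ would need to obtain. Under the sparsity assumption this count is at most $d^2$, which is one way to see where the quadratic bound of the trivial approach comes from. The example applies it to triangle counting in a small graph.

```python
def sparse_product_rows(A, B, wanted):
    """Compute selected entries of X = A * B for sparse matrices.

    A, B:   dict mapping row index -> dict mapping column index -> value
            (only nonzero entries are stored; hypothetical representation).
    wanted: dict mapping row index -> set of column indices of interest.

    Returns (X, received), where X[i][j] holds the requested entries and
    received[i] counts the entries of B that row owner i has to look at.
    """
    X, received = {}, {}
    for i, row in A.items():
        out = {j: 0 for j in wanted.get(i, ())}
        fetched = 0
        for k, a_ik in row.items():
            # Row k of B is held by another computer in the distributed model.
            for j, b_kj in B.get(k, {}).items():
                fetched += 1
                if j in out:
                    out[j] += a_ik * b_kj
        X[i] = out
        received[i] = fetched
    return X, received


# Example graph: a triangle 0-1-2 plus a pendant edge 2-3 (maximum degree d = 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
d = max(len(nbrs) for nbrs in adj.values())

# Adjacency matrix in sparse form; the entries of interest are the edges.
A = {u: {v: 1 for v in nbrs} for u, nbrs in adj.items()}
X, received = sparse_product_rows(A, A, wanted=adj)

# X[u][v] is the number of common neighbours of the endpoints of edge (u, v);
# every triangle is counted once for each of its 6 ordered edges.
triangles = sum(sum(row.values()) for row in X.values()) // 6
```

In this example `triangles` equals 1, and no row owner inspects more than $d^2 = 9$ entries. The sketch ignores message scheduling entirely; it only illustrates the amount of data involved.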