In this paper, we introduce the Variable Coded Distributed Batch Matrix Multiplication (VCDBMM) problem which tasks a distributed system to perform batch matrix multiplication where matrices are not necessarily distinct among batch jobs. Most coded matrix-matrix computation work has broadly focused in two directions: matrix partitioning for computing a single computation task and batch processing of multiple distinct computation tasks. While these works provide codes with good straggler resilience and fast decoding for their problem spaces, these codes would not be able to take advantage of the natural redundancy of re-using matrices across batch jobs. Inspired by Cross-Subspace Alignment codes, we develop Flexible Cross-Subspace Alignments (FCSA) codes that are flexible enough to utilize this redundancy. We provide a full characterization of FCSA codes which allow for a wide variety of system complexities including good straggler resilience and fast decoding. We demonstrate that, under certain conditions, FCSA codes are within a factor of 2 optimal when it comes to straggler resilience and simulations demonstrate that our codes achieve even better optimality gaps in practice.
翻译:在本文中,我们引入了可变编码分布批量矩阵乘数问题, 要求一个分布式系统在批量工作不一定不同的情况下进行批量矩阵乘数。 大多数编码矩阵矩阵计算工作大致集中于两个方向: 用于计算单项计算任务的矩阵分割和多种不同计算任务的批量处理。 虽然这些工作为代码提供了具有良好的分解弹性和问题空间快速解码功能的代码,但这些代码无法利用跨批量工作重复使用矩阵的自然冗余。 在跨子空间协调代码的启发下, 我们开发了足够灵活的灵活跨子空间比对代码(FCSA ), 以便使用这种冗余。 我们对 FCSA 代码作了全面描述, 允许使用多种系统复杂性, 包括良好的strggler弹性和快速解码。 我们证明, 在某些条件下, FCSA 代码在分解码和模拟中处于2个最佳要素之内, 表明我们的代码在实践上达到了更好的最佳差距。