Coded distributed computing (CDC) is a new technique proposed with the purpose of decreasing the intense data exchange required for parallelizing distributed computing systems. Under the famous MapReduce paradigm, this coded approach has been shown to decrease this communication overhead by a factor that is linearly proportional to the overall computation load during the mapping phase. Nevertheless, it is widely accepted that this overhead remains a main bottleneck in distributed computing. To address this, we take a new approach and we explore a new system model which, for the same aforementioned overall computation load of the mapping phase, manages to provide astounding reductions of the communication overhead and, perhaps counterintuitively, a substantial increase of the computational parallelization. In particular, we propose multi-access distributed computing (MADC) as a novel generalization of the original CDC model, where now mappers and reducers are distinct computing nodes that are connected through a multi-access network topology. Focusing on the MADC setting with combinatorial topology, which implies $\Lambda$ mappers and $K$ reducers such that there is a unique reducer connected to any $\alpha$ mappers, we propose a novel coded scheme and a novel information-theoretic converse, which jointly identify the optimal inter-reducer communication load to within a constant gap of $1.5$. Additionally, a modified coded scheme and converse identify the optimal max-link communication load across all existing links to within a gap of $4$. The unparalleled coding gains reported here should not be simply credited to having access to more mapped data, but rather to the powerful role of topology in effectively aligning mapping outputs. This realization raises the open question of which multi-access network topology guarantees the best possible performance in distributed computing.
翻译:代码分布式计算(CDC)是一种新的技术,目的是减少平行分布式计算系统所需的密集数据交换。 在著名的 MapReduce 模式下,这种代码化方法被显示为通过一个与绘图阶段总体计算负荷成正比的元素来减少通信管理费用。 然而,人们广泛接受,这个代码化计算(CDC)仍然是分布式计算中的一个主要瓶颈。为了解决这个问题,我们采取了一种新的方法,并探索一种新的系统模型,对于上述的绘图阶段总体计算负荷,它能够提供通信管理费的惊人减少,也许可以反直观地大幅提高计算平行。特别是,我们建议多功能化分布式计算(MADC),作为原始的CDC模型的新型概括,现在的映射器和缩放器仍然是通过多功能化网络表解密连接起来的独特的计算节点。我们以组合式的设置为打开问题的设置,这意味着 $\Lambda, 直径化连接点和 $K$ 降低一个独特的缩略式连接到任何美元内部的计算值。我们提议在任何恒定式网络内部进行一个最精确的自动的计算, 将自动的代码化的代码化的计算。