A central issue in distributed computing systems is how to optimally allocate computing and storage resources and design data shuffling strategies so that the total execution time for computing and data shuffling is minimized. This is especially critical when computation, storage, and communication resources are limited. In this paper, we study resource allocation and coding schemes for the MapReduce-type framework with limited resources. In particular, we focus on the coded distributed computing (CDC) approach proposed by Li et al. We first extend the asymmetric CDC (ACDC) scheme proposed by Yu et al. to the cascade case, where each output function is computed by multiple servers. We then show that whether CDC or ACDC performs better depends on system parameters (e.g., the number of computing servers) and task parameters (e.g., the number of input files), implying that neither CDC nor ACDC is optimal in general. By merging the ideas of CDC and ACDC, we propose a hybrid scheme and show that it can strictly outperform both. Furthermore, we derive an information-theoretic converse showing that, for MapReduce tasks using a class of weakly symmetric Reduce assignments, which includes the Reduce assignments of CDC and ACDC as special cases, the hybrid scheme with a corresponding resource allocation strategy is optimal, i.e., it achieves the minimum execution time, for an arbitrary number of computing servers and arbitrary storage memory.
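As background for the computation/shuffling tradeoff the abstract refers to, the sketch below compares the normalized shuffle load of uncoded shuffling, 1 - r/K, with that of the original CDC scheme of Li et al., (1/r)(1 - r/K), where K is the number of servers and r is the computation load (each file is mapped by r servers), in the non-cascade case where each output function is reduced by a single server. It then picks r under a toy linear execution-time model. The cost model and the names `map_cost`, `shuffle_cost`, `total_time`, and `best_computation_load` are hypothetical illustrations only; this is not the paper's hybrid scheme, and it ignores the storage-memory and server-count constraints the paper studies.

```python
from fractions import Fraction


def uncoded_load(r: int, K: int) -> Fraction:
    """Normalized shuffle load without coding: each server must receive the
    intermediate values for the (1 - r/K) fraction of files it did not map."""
    return 1 - Fraction(r, K)


def cdc_load(r: int, K: int) -> Fraction:
    """Normalized shuffle load of CDC (Li et al.) with computation load r:
    coded multicasting cuts the uncoded load by a factor of r."""
    return Fraction(1, r) * (1 - Fraction(r, K))


def total_time(r: int, K: int, map_cost, shuffle_cost) -> Fraction:
    # Hypothetical linear model: Map time grows with r, shuffle time with load.
    return map_cost * r + shuffle_cost * cdc_load(r, K)


def best_computation_load(K: int, map_cost, shuffle_cost) -> int:
    # Exhaustively try every computation load r = 1..K and keep the fastest.
    return min(range(1, K + 1),
               key=lambda r: total_time(r, K, map_cost, shuffle_cost))


if __name__ == "__main__":
    K = 10
    for r in (1, 2, 5, 10):
        print(r, uncoded_load(r, K), cdc_load(r, K))
    r_star = best_computation_load(K, Fraction(1, 10), 1)
    print("best r:", r_star, "time:", total_time(r_star, K, Fraction(1, 10), 1))
```

With K = 10, a Map cost of 1/10 per unit of computation load, and a unit shuffle cost, the toy model is minimized at r = 3: larger r shrinks the shuffle term but the extra Map work outweighs the savings. This interplay between computation and communication is what a resource allocation strategy has to balance.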