Growing interest in serverless computing and ubiquitous wireless networks has led to numerous connected devices in our surroundings. Among these, IoT devices have access to an abundance of raw data, but their limited computing resources restrict their capabilities. With the emergence of deep neural networks (DNNs), not only is the demand on the computing power of IoT devices increasing, but privacy concerns are also pushing computation towards the edge. To overcome limited resources, several studies have proposed distributing work among IoT devices. These promising methods harvest the aggregate computing power of the idle IoT devices in an environment. However, because such a distributed system relies heavily on every device, unstable latencies and intermittent failures, which are common characteristics of IoT devices and wireless networks, cause high recovery overheads. To reduce this overhead, we propose a novel robustness method with close-to-zero recovery latency for DNN computations. Our solution never loses a request or spends time recovering from a failure. To do so, we first analyze the underlying matrix-matrix computations affected by distribution. We then introduce a new coded distributed computing (CDC) method whose cost remains constant as the number of devices increases, unlike the linear cost of modular redundancy. Moreover, our method is applied at the library level, without requiring extensive changes to the program, while still ensuring a balanced work assignment during distribution. To illustrate our method, we perform experiments with distributed systems comprising up to 12 Raspberry Pis.
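As a rough illustration of the coded-computation idea mentioned in the abstract, the sketch below uses a generic single-parity (checksum) code over row blocks of a weight matrix. It is not the method proposed in the paper; the function names `matmul`, `encode`, and `decode`, the block layout, and the toy matrices are all assumptions made for this example. It shows how one extra device can cover the loss of any one of k workers without recomputation, whereas duplicating every worker would need k extra devices.

```python
# Illustrative sketch only: a generic single-parity code, NOT the paper's scheme.
# All names and the toy data below are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def encode(weights, k):
    """Split the rows of `weights` into k equal blocks and append one
    parity block, the element-wise sum of the k data blocks."""
    rows = len(weights) // k
    blocks = [weights[i * rows:(i + 1) * rows] for i in range(k)]
    parity = [[sum(vals) for vals in zip(*same_row)] for same_row in zip(*blocks)]
    return blocks + [parity]

def decode(results, k):
    """Rebuild the full product from k + 1 partial results,
    where at most one entry is None (a failed or slow device)."""
    missing = [i for i, r in enumerate(results) if r is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code tolerates only one failure")
    data = list(results[:k])
    if missing and missing[0] < k:
        parity = results[k]
        others = [r for r in data if r is not None]
        # Matrix multiplication is linear, so the lost block's result equals
        # the parity result minus the surviving data results.
        data[missing[0]] = [
            [p - sum(o[i][j] for o in others) for j, p in enumerate(prow)]
            for i, prow in enumerate(parity)
        ]
    return [row for block in data for row in block]

weights = [[1, 2], [3, 4], [5, 6], [7, 8]]   # toy 4x2 layer weights
x = [[1, 0, 2], [0, 1, 3]]                   # toy 2x3 input
coded = encode(weights, k=2)                 # 2 data blocks + 1 parity block
results = [matmul(block, x) for block in coded]  # one partial product per device
results[1] = None                            # simulate the failure of device 1
recovered = decode(results, k=2)
assert recovered == matmul(weights, x)
```

In this toy setup the redundancy is one parity device regardless of k, and the decoder only performs subtractions on results that have already arrived, which is the intuition behind recovering without re-issuing the request.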