To make machine learning (ML) sustainable and able to run on the diverse devices where the relevant data resides, it is essential to compress ML models as needed, while still meeting the required learning quality and time performance. However, how much and when an ML model should be compressed, and {\em where} its training should be executed, are hard decisions to make, as they depend on the model itself, the resources of the available nodes, and the data such nodes own. Existing studies focus on each of those aspects individually; however, they do not account for how such decisions can be made jointly and adapted to one another. In this work, we model the network system focusing on the training of DNNs, formalize the above multi-dimensional problem, and, given its NP-hardness, formulate an approximate dynamic programming problem that we solve through the PACT algorithmic framework. Importantly, PACT leverages a time-expanded graph representing the learning process, and a data-driven and theoretical approach to predict the loss evolution expected as a consequence of training decisions. We prove that PACT's solutions can get as close to the optimum as desired, at the cost of an increased time complexity, and that, in any case, such complexity is polynomial. Numerical results also show that, even under the most disadvantageous settings, PACT outperforms state-of-the-art alternatives and closely matches the optimal energy cost.