Fine-tuning models on edge devices like mobile phones would enable privacy-preserving personalization over sensitive data. However, edge training has historically been limited to relatively small models with simple architectures, because training is both memory- and energy-intensive. We present POET, an algorithm that enables training large neural networks on memory-scarce, battery-operated edge devices. POET jointly optimizes the integrated search space of rematerialization and paging, two techniques for reducing the memory consumption of backpropagation. Given a memory budget and a run-time constraint, we formulate a mixed-integer linear program (MILP) for energy-optimal training. Our approach enables training significantly larger models on embedded devices while reducing energy consumption, without modifying the mathematical correctness of backpropagation. We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency. POET is an open-source project available at https://github.com/ShishirPatil/poet
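To give intuition for the trade-off POET's MILP searches over, here is a minimal sketch (not POET's actual algorithm) of how rematerialization reduces the peak activation memory of backpropagation. The layer count, per-layer sizes, and checkpoint interval `k` below are hypothetical values chosen for illustration.

```python
# Illustrative sketch only: rematerialization trades extra compute for memory
# by discarding intermediate activations and recomputing them during the
# backward pass. We compare the peak activation memory of standard backprop
# (store everything) against checkpointing every k-th layer.

def peak_memory_full(act_sizes):
    # Standard backprop keeps every layer's activations alive
    # until they are consumed by the backward pass.
    return sum(act_sizes)

def peak_memory_checkpointed(act_sizes, k):
    # Keep only every k-th activation ("checkpoints"); the layers in
    # between are recomputed segment by segment during backward, so the
    # peak is roughly the checkpoints plus one segment's activations.
    checkpoints = sum(act_sizes[i] for i in range(0, len(act_sizes), k))
    segment_peaks = [sum(act_sizes[i:i + k])
                     for i in range(0, len(act_sizes), k)]
    return checkpoints + max(segment_peaks)

# Hypothetical network: 16 layers, 4 MB of activations each.
acts = [4] * 16
print(peak_memory_full(acts))             # 64 (MB)
print(peak_memory_checkpointed(acts, 4))  # 32 (MB)
```

Paging attacks the same peak-memory term differently, by moving tensors to secondary storage instead of recomputing them; POET's contribution is choosing, per tensor, which of the two is cheaper in energy under the device's memory and run-time budgets.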