Neural network training on edge terminals is essential for edge AI computing, which needs to adapt to evolving environments. Quantised models can run efficiently on edge devices, but existing training methods for these compact models are designed for powerful servers with abundant memory and energy budgets. For example, the quantisation-aware training (QAT) method keeps two copies of the model parameters, which is usually beyond the capacity of on-chip memory in edge devices. Data movement between off-chip and on-chip memory is energy demanding as well. These resource requirements are trivial for powerful servers but critical for edge devices. To mitigate these issues, we propose Resource Constrained Training (RCT). RCT keeps only a quantised model throughout training, so that the memory required for model parameters during training is reduced. It adjusts the per-layer bitwidth dynamically in order to save energy when a model can learn effectively with lower precision. We carry out experiments with representative models and tasks in image applications and natural language processing. Experiments show that RCT saves more than 86\% energy for General Matrix Multiply (GEMM) and more than 46\% memory for model parameters, with limited accuracy loss. Compared with the QAT-based method, RCT saves about half of the energy spent on moving model parameters.
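To make the idea concrete, the following minimal Python sketch illustrates the two ingredients the abstract names: keeping only a quantised copy of each layer's weights, and adjusting the per-layer bitwidth on the fly. The uniform quantiser, the `adjust_bitwidth` rule, and the 4--8 bit range are illustrative assumptions for this sketch, not the actual RCT algorithm described in the paper.

```python
import numpy as np

def quantise(weights, bits):
    """Uniform symmetric quantisation of a weight tensor to `bits` bits.
    Illustrative only; RCT's quantiser is defined in the paper body."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(weights))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale  # only this reduced-precision copy is kept

def adjust_bitwidth(bits, loss_delta, min_bits=4, max_bits=8):
    """Hypothetical per-layer rule: lower the precision while the loss keeps
    improving, raise it again when learning stalls."""
    if loss_delta < 0:                 # loss decreased: layer learns fine
        return max(min_bits, bits - 1)
    return min(max_bits, bits + 1)

# Toy usage for one layer over two "training steps"
w = np.random.randn(4, 4).astype(np.float32)
bits = 8
w = quantise(w, bits)
bits = adjust_bitwidth(bits, loss_delta=-0.01)  # loss improved, try fewer bits
w = quantise(w, bits)
print(bits)
```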