We present multiplexed gradient descent (MGD), a gradient descent framework designed to easily train analog or digital neural networks in hardware. MGD utilizes zero-order optimization techniques for online training of hardware neural networks. We demonstrate its ability to train neural networks on modern machine learning datasets, including CIFAR-10 and Fashion-MNIST, and compare its performance to backpropagation. Assuming realistic timescales and hardware parameters, our results indicate that these optimization techniques can train a network on emerging hardware platforms orders of magnitude faster than the wall-clock time of training via backpropagation on a standard GPU, even in the presence of imperfect weight updates or device-to-device variations in the hardware. We additionally describe how it can be applied to existing hardware as part of chip-in-the-loop training, or integrated directly at the hardware level. Crucially, the MGD framework is highly flexible, and its gradient descent process can be optimized to compensate for specific hardware limitations such as slow parameter-update speeds or limited input bandwidth.
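To make the abstract's reference to zero-order optimization concrete, the following is a minimal illustrative sketch of perturbation-based (zero-order) gradient descent, in which a cost change measured after perturbing the weights replaces backpropagated gradients. This is not the authors' MGD algorithm; all names and values (`cost`, `zero_order_step`, the perturbation amplitude `eps`, and the learning rate `lr`) are assumptions for a toy example.

```python
# Toy zero-order (perturbative) training loop, assumed for illustration only:
# perturb the weights, measure the change in a scalar cost, and step against
# the resulting gradient estimate -- no backpropagation required.
import numpy as np

rng = np.random.default_rng(0)

def cost(w, x, y):
    """Scalar cost of a one-layer linear network: mean squared error."""
    return float(np.mean((x @ w - y) ** 2))

def zero_order_step(w, x, y, eps=1e-3, lr=1e-2):
    """One simultaneous-perturbation update built from two cost measurements."""
    base = cost(w, x, y)
    delta = rng.choice([-1.0, 1.0], size=w.shape)        # random +/-1 perturbation
    g_est = (cost(w + eps * delta, x, y) - base) / eps * delta  # gradient estimate
    return w - lr * g_est

# Usage: fit a random linear target using only cost measurements.
x = rng.normal(size=(32, 4))
w_true = rng.normal(size=(4, 2))
y = x @ w_true
w = np.zeros((4, 2))
for _ in range(2000):
    w = zero_order_step(w, x, y)
print("final cost:", cost(w, x, y))
```

Because each update needs only forward evaluations of the cost, a scheme of this kind can in principle run directly on analog or digital hardware, which is the setting the abstract describes.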