As deep neural networks (DNNs) grow to solve increasingly complex problems, they are becoming limited by the latency and power consumption of existing digital processors. 'Weight-stationary' analog optical and electronic hardware has been proposed to reduce the compute resources required by DNNs by eliminating expensive weight updates; however, scalability has so far been limited to input vector lengths $K$ of hundreds of elements. Here, we present a scalable, single-shot-per-layer weight-stationary optical processor that leverages the advantages of free-space optics for passive optical copying and large-scale distribution of an input vector, and integrated optoelectronics for static, reconfigurable weighting and the nonlinearity. We propose an optimized near-term CMOS-compatible system with $K = 1{,}000$ and beyond, and we calculate its theoretical total latency ($\sim$10 ns), energy consumption ($\sim$10 fJ/MAC) and throughput ($\sim$petaMAC/s) per layer. We also experimentally test DNN classification accuracy with single-shot analog optical encoding, copying and weighting of the MNIST handwritten digit dataset in a proof-of-concept system, achieving 94.7% (similar to the ground-truth accuracy of 96.3%) without retraining on the hardware or data preprocessing. Lastly, we determine the upper bound on the throughput of our system ($\sim$0.9 exaMAC/s), set by the maximum optical bandwidth before significant loss of accuracy. This joint use of wide spectral and spatial bandwidths enables highly efficient computing for next-generation DNNs.
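To make the scaling behind these figures concrete, the sketch below is a minimal back-of-envelope estimate of per-layer MAC count, throughput and energy. It assumes a fully connected layer that maps a $K$-element input to an $N$-element output (i.e. $K \times N$ MACs per single-shot pass); $K = 1{,}000$, the $\sim$10 ns latency and the $\sim$10 fJ/MAC energy are taken from the abstract, while the output length `N` and the MAC-count formula are illustrative assumptions rather than the paper's full system model, whose reported throughput also reflects optimization details not reproduced here.

```python
# Back-of-envelope estimate for one weight-stationary layer pass.
# Assumption: a fully connected layer computing an N-element output from a
# K-element input performs K * N multiply-accumulates (MACs) per pass.
# K, the ~10 ns latency and the ~10 fJ/MAC energy come from the abstract;
# N is a hypothetical value chosen only for illustration.

K = 1_000                   # input vector length (from the abstract)
N = 1_000                   # assumed output vector length (illustrative)
latency_s = 10e-9           # ~10 ns single-shot latency per layer (from the abstract)
energy_per_mac_j = 10e-15   # ~10 fJ/MAC (from the abstract)

macs_per_layer = K * N
throughput_mac_per_s = macs_per_layer / latency_s
energy_per_layer_j = macs_per_layer * energy_per_mac_j

print(f"MACs per layer pass:   {macs_per_layer:.1e}")
print(f"Throughput:            {throughput_mac_per_s:.1e} MAC/s")
print(f"Energy per layer pass: {energy_per_layer_j:.1e} J")
```

Under these illustrative assumptions the estimate comes out to $10^6$ MACs per pass and $10^{14}$ MAC/s; larger layer widths or additional parallelism (e.g. spectral multiplexing) would push the figure toward the $\sim$petaMAC/s and $\sim$0.9 exaMAC/s regimes quoted above.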