An optical neural network (ONN) is a promising system because of its high-speed, low-power operation. Its linear unit multiplies an input vector by a weight matrix in optical analog circuits. Among such circuits, one with a multi-layered structure of programmable Mach-Zehnder interferometers (MZIs) can realize a specific class of unitary matrices as its weight matrix using a limited number of MZIs. This circuit is effective for balancing the number of programmable MZIs against ONN performance. However, learning the MZI parameters of the circuit takes a long time with the conventional automatic differentiation (AD) built into machine learning platforms. To solve this problem, we propose an acceleration method for learning MZI parameters. We derive customized complex-valued derivatives for an MZI by exploiting Wirtinger derivatives and the chain rule. These derivatives are incorporated into our newly developed function module, implemented in C++, which collectively calculates their values across the multi-layered structure. Our method is simple, fast, and versatile, and it is compatible with conventional AD. We demonstrate that our method runs 20 times faster than conventional AD when a pixel-by-pixel MNIST task is performed in a complex-valued recurrent neural network with an MZI-based hidden unit.
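To illustrate the idea of hand-written complex-valued derivatives for a single MZI, the following is a minimal sketch in plain Python, not the C++ module described above. It assumes one common parametrization of the 2x2 MZI transfer matrix, U(theta, phi) = [[e^{i phi} cos theta, -sin theta], [e^{i phi} sin theta, cos theta]], which may differ from the one used in the paper. The function names `mzi_forward`, `mzi_backward`, and `loss` are hypothetical. The backward pass takes the upstream Wirtinger gradients of a real-valued loss and applies the chain rule directly, with no automatic differentiation.

```python
import cmath
import math


def mzi_forward(theta, phi, x):
    """Apply the assumed 2x2 MZI transfer matrix to the complex pair x."""
    c, s, p = math.cos(theta), math.sin(theta), cmath.exp(1j * phi)
    x0, x1 = x
    return (p * c * x0 - s * x1, p * s * x0 + c * x1)


def mzi_backward(theta, phi, x, g):
    """Hand-written gradients of a real loss L through one MZI.

    g holds the upstream Wirtinger gradients dL/d(conj(y_k)).
    Returns (dL/dtheta, dL/dphi, dL/d(conj(x))).
    """
    c, s, p = math.cos(theta), math.sin(theta), cmath.exp(1j * phi)
    x0, x1 = x
    g0, g1 = g
    # Partial derivatives of the outputs with respect to the real parameters.
    dy_dtheta = (-p * s * x0 - c * x1, p * c * x0 - s * x1)
    dy_dphi = (1j * p * c * x0, 1j * p * s * x0)
    # For a real loss and a real parameter t:
    #   dL/dt = 2 * Re(sum_k conj(g_k) * dy_k/dt)
    d_theta = 2.0 * (g0.conjugate() * dy_dtheta[0]
                     + g1.conjugate() * dy_dtheta[1]).real
    d_phi = 2.0 * (g0.conjugate() * dy_dphi[0]
                   + g1.conjugate() * dy_dphi[1]).real
    # y = U x is holomorphic in x, so dL/d(conj(x)) = U^H g.
    gx0 = (p * c).conjugate() * g0 + (p * s).conjugate() * g1
    gx1 = -s * g0 + c * g1
    return d_theta, d_phi, (gx0, gx1)


def loss(theta, phi, x, target):
    """Squared-error loss used only to exercise the gradients."""
    y = mzi_forward(theta, phi, x)
    return sum(abs(a - b) ** 2 for a, b in zip(y, target))
```

For this loss the upstream gradient is simply `g_k = y_k - target_k`, so the result of `mzi_backward` can be compared against central finite differences of `loss`. In a multi-layered circuit, the returned `dL/d(conj(x))` would serve as the upstream gradient for the preceding layer.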