Designing a multi-layer optical system with designated optical characteristics is an inverse design problem in which the resulting design is determined by several discrete and continuous parameters. In particular, we consider three design parameters to describe a multi-layer stack: each layer's dielectric material, each layer's thickness, and the total number of layers. Such a combination of discrete and continuous parameters poses a challenging optimization problem that often requires a computationally expensive search for an optimal system design. Hence, most methods determine only the optimal thicknesses of the system's layers. To incorporate the layer material and the total number of layers as well, we propose a method that treats the stacking of consecutive layers as parameterized actions in a Markov decision process. We propose an exponentially transformed reward signal that eases policy optimization, and we adapt a recent variant of Q-learning for inverse design optimization. We demonstrate that our method outperforms human experts and a naive reinforcement learning algorithm in terms of the achieved optical characteristics. Moreover, the learned Q-values contain information about the optical properties of multi-layer optical systems, thereby allowing physical interpretation and what-if analysis.
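To make the formulation concrete, the following is a minimal sketch of how layer stacking could be cast as a Markov decision process with parameterized actions. It is an illustration under stated assumptions, not the authors' implementation: the material table, the refractive indices, the function names `reflectivity` and `step`, the normal-incidence lossless model, the mean-squared-error measure, the reward form `exp(-beta * error)`, and the choice to pay the reward only at the end of an episode are all hypothetical. The abstract states only that the reward is exponentially transformed.

```python
import cmath
import math

# Hypothetical material library: name -> refractive index
# (assumed constant and lossless; illustrative values only).
MATERIALS = {"MgF2": 1.38, "SiO2": 1.46, "TiO2": 2.40}


def reflectivity(stack, wavelength, n_in=1.0, n_sub=1.52):
    """Normal-incidence reflectivity via the characteristic-matrix method.

    stack: list of (material_name, thickness) pairs, ordered from the
    incident medium towards the substrate. Thickness and wavelength
    share the same length unit.
    """
    m11, m12, m21, m22 = 1.0, 0.0, 0.0, 1.0  # start from the identity
    for name, thickness in stack:
        n = MATERIALS[name]
        delta = 2.0 * math.pi * n * thickness / wavelength  # phase thickness
        c, s = cmath.cos(delta), cmath.sin(delta)
        a11, a12, a21, a22 = c, 1j * s / n, 1j * n * s, c
        # Right-multiply the running product by this layer's matrix.
        m11, m12, m21, m22 = (
            m11 * a11 + m12 * a21,
            m11 * a12 + m12 * a22,
            m21 * a11 + m22 * a21,
            m21 * a12 + m22 * a22,
        )
    b = m11 + m12 * n_sub
    c_field = m21 + m22 * n_sub
    r = (n_in * b - c_field) / (n_in * b + c_field)
    return abs(r) ** 2


def step(stack, action, target, wavelengths, beta=10.0, max_layers=20):
    """Apply one parameterized action and return (new_stack, reward, done).

    action: (material, thickness). The discrete part selects the material,
    the continuous part its thickness. material=None means "stop stacking",
    which is how the agent chooses the total number of layers.
    """
    material, thickness = action
    done = material is None
    if not done:
        stack = stack + [(material, thickness)]
        done = len(stack) >= max_layers
    if not done:
        return stack, 0.0, False  # assumption: no intermediate reward
    # Mean squared deviation from the target reflectivity spectrum.
    error = sum(
        (reflectivity(stack, w) - t) ** 2 for w, t in zip(wavelengths, target)
    ) / len(wavelengths)
    # Assumed exponential transform: maps the error in [0, inf) to (0, 1].
    return stack, math.exp(-beta * error), True
```

In this sketch a perfect design receives a reward of 1 and the reward decays smoothly as the error grows. A learning agent would call `step` repeatedly, choosing a material and a thickness each time, until it emits the stop action or reaches `max_layers`.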