We propose a model-based reinforcement learning (RL) approach for noisy, time-dependent gate optimization with improved sample complexity over model-free RL, where sample complexity is the number of controller interactions with the physical system. Leveraging an inductive bias inspired by recent advances in neural ordinary differential equations (ODEs), we approximate the environment with an auto-differentiable ODE parametrised by a learnable Hamiltonian ansatz whose time-dependent part, including the control, is fully known. Control, together with learning of the Hamiltonian's continuous time-independent parameters, is addressed through interactions with the system. We demonstrate an order-of-magnitude advantage in sample complexity over standard model-free RL in preparing standard unitary gates under both closed- and open-system dynamics, in realistic numerical experiments incorporating single-shot measurements, arbitrary Hilbert-space truncations, and uncertainty in Hamiltonian parameters. Moreover, the learned Hamiltonian can be leveraged by existing control methods such as GRAPE for further gradient-based optimization, using the controls found by RL as initializations. Our algorithm, which we apply to nitrogen-vacancy (NV) centers and transmons in this paper, is well suited for controlling partially characterised one- and two-qubit systems.
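To illustrate the core ingredient, the following is a minimal sketch (not the authors' implementation) of an auto-differentiable ODE whose drift Hamiltonian carries a learnable time-independent parameter while the time-dependent control term is fully known. It assumes a single qubit, closed-system dynamics, and a real-valued Hamiltonian for simplicity; the names `theta`, `amp`, and `gate_infidelity` are illustrative, not from the paper.

```python
import jax
import jax.numpy as jnp
from jax.experimental.ode import odeint

# Single-qubit operators (real-valued, so the propagator can be integrated
# with a real/imaginary split instead of complex arithmetic).
sx = jnp.array([[0., 1.], [1., 0.]])
sz = jnp.array([[1., 0.], [0., -1.]])

def rhs(y, t, theta, amp):
    """RHS of i dU/dt = H(t) U with real H(t) = theta*sz + amp*cos(2*pi*t)*sx.

    y = (Re U, Im U); dU/dt = -i H U gives
    d(Re U)/dt = H @ Im U  and  d(Im U)/dt = -H @ Re U.
    """
    Ur, Ui = y
    H = theta * sz + amp * jnp.cos(2.0 * jnp.pi * t) * sx  # learnable drift + known drive
    return (H @ Ui, -(H @ Ur))

def evolve(theta, amp, T=1.0):
    """Integrate the propagator from the identity over [0, T]."""
    y0 = (jnp.eye(2), jnp.zeros((2, 2)))
    ts = jnp.linspace(0.0, T, 2)
    Ur, Ui = odeint(rhs, y0, ts, theta, amp)
    return Ur[-1], Ui[-1]

def gate_infidelity(params, U_target):
    """1 - |tr(U_target^dag U(T)) / d|^2 for a real-valued target gate."""
    theta, amp = params
    Ur, Ui = evolve(theta, amp)
    ov_r = jnp.trace(U_target.T @ Ur)
    ov_i = jnp.trace(U_target.T @ Ui)
    d = U_target.shape[0]
    return 1.0 - (ov_r ** 2 + ov_i ** 2) / d ** 2

# Gradients of the gate infidelity flow through the ODE solve to both the
# time-independent Hamiltonian parameter (model learning) and the control
# amplitude (gate optimisation), e.g. for an X-gate target:
grads = jax.grad(gate_infidelity)((0.5, jnp.pi), sx)
```

In this toy setting the same differentiable solve serves both roles described above: fitting the unknown time-independent parameter from measured data and refining the control, as a GRAPE-style gradient step would.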