We propose a method for finding approximate compilations of quantum unitary transformations, based on techniques from policy gradient reinforcement learning. The choice of a stochastic policy allows us to rephrase the optimization problem in terms of probability distributions rather than variational gates. In this framework, the optimal configuration is found by optimizing over distribution parameters rather than over free angles. We show numerically that this approach can outperform gradient-free methods for comparable resource budgets (i.e., numbers of quantum circuit runs). Another appealing feature of this approach to variational compilation is that it does not require a separate register or long-range interactions to estimate the end-point fidelity, an advantage over methods that rely on the Hilbert-Schmidt test. We expect these techniques to be relevant for training variational circuits in other contexts.
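To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' code) of optimizing over distribution parameters instead of free angles: a Gaussian policy over the rotation angles of a small circuit is updated with a REINFORCE-style gradient, using the fidelity to a target unitary as the reward. The circuit ansatz, learning rate, and policy width are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-qubit rotation gates.
def rz(a):
    return np.array([[np.exp(-1j * a / 2), 0], [0, np.exp(1j * a / 2)]])

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]])

# Illustrative ansatz: an Rz-Ry-Rz Euler-angle sequence.
def circuit(angles):
    a, b, c = angles
    return rz(c) @ ry(b) @ rz(a)

# Hypothetical target: a unitary with a known decomposition, so
# perfect compilation (fidelity 1) is achievable.
U_target = circuit(np.array([0.3, 1.1, -0.7]))

def fidelity(V):
    d = 2  # Hilbert-space dimension
    return np.abs(np.trace(U_target.conj().T @ V)) / d

# Policy = Gaussian over angles; we optimize its mean, not the angles.
mu = np.zeros(3)   # distribution parameters
sigma = 0.1        # fixed exploration width (assumption)
lr = 0.5           # learning rate (assumption)

for step in range(1000):
    thetas = rng.normal(mu, sigma, size=(16, 3))   # sampled angle batches
    rewards = np.array([fidelity(circuit(t)) for t in thetas])
    baseline = rewards.mean()                      # variance reduction
    # REINFORCE: grad_mu log N(theta; mu, sigma) = (theta - mu) / sigma^2
    grad = ((rewards - baseline)[:, None] * (thetas - mu) / sigma**2).mean(0)
    mu += lr * grad

print(fidelity(circuit(mu)))  # fidelity of the learned mean circuit
```

Note that the reward here only needs end-point fidelities of sampled circuits, which is what allows the distribution parameters to be updated from circuit runs alone.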