Quantum hardware and quantum-inspired algorithms are becoming increasingly popular for combinatorial optimization. However, these algorithms may require careful hyperparameter tuning for each problem instance. We use a reinforcement learning agent in conjunction with a quantum-inspired algorithm to solve the Ising energy minimization problem, which is equivalent to the Maximum Cut problem. The agent controls the algorithm by tuning one of its parameters with the goal of improving recently seen solutions. We propose a new Rescaled Ranked Reward (R3) method that enables a stable single-player version of self-play training and helps the agent escape local optima. Training on any problem instance can be accelerated by transfer learning from an agent trained on randomly generated problems. Our approach allows sampling high-quality solutions to the Ising problem with high probability and outperforms both baseline heuristics and a black-box hyperparameter optimization approach.
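To make the objective and reward concrete, here is a minimal sketch of the Ising energy and a threshold-based ranked-reward signal in the spirit of the method described above. The function names, the percentile threshold `alpha`, and the binary +1/-1 reward are illustrative assumptions; the exact rescaling used by R3 is not shown.

```python
import numpy as np

def ising_energy(J, h, s):
    """Ising energy E(s) = -s^T J s - h^T s for spins s in {-1, +1}^n.

    J is the (symmetric) coupling matrix, h the local fields.
    Lower energy means a better solution; minimizing E is
    equivalent to solving Maximum Cut on the coupling graph.
    """
    return -s @ J @ s - h @ s

def ranked_reward(energy, recent_energies, alpha=0.75):
    """Threshold-based ranked reward (hypothetical sketch).

    The agent gets +1 only if the new solution beats the
    best (1 - alpha) fraction of recently seen energies,
    so the bar rises as recent solutions improve.
    """
    threshold = np.percentile(recent_energies, 100 * (1 - alpha))
    return 1.0 if energy < threshold else -1.0
```

With `alpha=0.75`, a solution is rewarded only if its energy falls below the 25th percentile of the recent-solution buffer, which pushes the agent to keep improving on its own past performance rather than chase a fixed target.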