First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be solved rapidly. These methods face two persistent challenges: manual hyperparameter tuning and slow convergence to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tune parameters and accelerate convergence. In experiments on well-known QP benchmarks, we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers, by up to 3x. RLQP generalizes surprisingly well to previously unseen problems of varying dimension and structure from different applications, including the QPLIB, Netlib LP, and Maros-Meszaros problems. Code for RLQP is available at https://github.com/berkeleyautomation/rlqp.
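To make the idea concrete, here is a minimal sketch (not the authors' implementation) of the outer loop behind RL-based parameter tuning: OSQP is run in short chunks of ADMM iterations, and between chunks a policy adjusts the step-size parameter rho from the current primal and dual residuals. The actual RLQP policy is a trained RL network that sets a per-constraint rho vector inside the solver; `toy_policy` below is a hypothetical stand-in that uses only the public OSQP Python interface with a scalar rho.

```python
import numpy as np
import scipy.sparse as sp
import osqp


def toy_policy(rho, pri_res, dua_res):
    # Hypothetical placeholder for a learned policy: rebalance rho so primal
    # and dual residuals shrink at similar rates. RLQP instead learns this
    # mapping (per constraint) with reinforcement learning.
    return float(np.clip(rho * np.sqrt(pri_res / max(dua_res, 1e-12)), 1e-6, 1e6))


# Small random convex QP: minimize 0.5 x'Px + q'x  s.t.  l <= Ax <= u.
rng = np.random.default_rng(0)
n, m = 20, 30
M = rng.standard_normal((n, n))
P = sp.csc_matrix(M @ M.T + np.eye(n))
q = rng.standard_normal(n)
A = sp.csc_matrix(rng.standard_normal((m, n)))
l, u = -np.ones(m), np.ones(m)

rho = 0.1
prob = osqp.OSQP()
# Disable OSQP's built-in rho adaptation so the outer policy is in control,
# and solve in chunks of 25 iterations. Warm starting is on by default, so
# each solve() call resumes from the previous iterate.
prob.setup(P, q, A, l, u, rho=rho, adaptive_rho=False, max_iter=25,
           eps_abs=1e-6, eps_rel=1e-6, verbose=False)

for step in range(40):
    res = prob.solve()
    if res.info.status == "solved":
        print(f"converged after outer step {step}")
        break
    # Let the policy pick a new rho from the residuals, then push it into the solver.
    rho = toy_policy(rho, res.info.pri_res, res.info.dua_res)
    prob.update_settings(rho=rho)
```

In RLQP the same loop structure is driven by a learned policy rather than the fixed residual-balancing rule shown here, which is what allows it to transfer across problem classes such as QPLIB, Netlib LP, and Maros-Meszaros.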