In Changjun Fan et al. [Nature Communications https://doi.org/10.1038/s41467-023-36363-w (2023)], the authors present a deep reinforcement learning approach to augment combinatorial optimization heuristics. In particular, they present results for several spin-glass ground-state problems (instances of which on non-planar networks are generally NP-hard) in comparison with several Monte Carlo based methods, such as simulated annealing (SA) or parallel tempering (PT). Those results indeed demonstrate that reinforcement learning improves on the results obtained with SA or PT, or at least reduces the runtime the heuristics require before reaching results of comparable quality. To support the conclusion that their method is ``superior'', the authors pursue two basic strategies: (1) the commercial GUROBI solver is called on to procure a sample of exact ground states as a testbed for comparison, and (2) a head-to-head comparison between the heuristics is given for a sample of larger instances where exact ground states are hard to ascertain. Here, we put these studies into a larger context, showing that the claimed superiority is at best marginal for the smaller samples and becomes essentially irrelevant with respect to any sensible approximation of true ground states in the larger samples. For example, the method is inadequate as a means to determine stiffness exponents $\theta$ in $d>2$, an application the authors mention, since there the problem is not only NP-hard but also requires the subtraction of two almost equal ground-state energies, so that systematic errors of $\approx 1\%$ in each, as found here, are unacceptable. This larger picture of the method emerges from a straightforward study of finite-size corrections for the spin-glass ensembles the authors employ, using data that have been available for decades.
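To illustrate the scales involved in such a determination of $\theta$, consider a rough order-of-magnitude sketch (assuming, purely for orientation and not taken from the commented paper, a typical literature value $\theta \approx 0.2$ in $d=3$): in the standard domain-wall setup, the signal is the defect energy $\Delta E(L) = |E_{\rm P}(L) - E_{\rm AP}(L)| \sim L^{\theta}$ obtained by subtracting ground-state energies for periodic and antiperiodic boundary conditions, while each of those energies is extensive, $|E(L)| \sim L^{d}$. Their ratio, $\Delta E/|E| \sim L^{\theta-d}$, then falls below $0.2\%$ already for $L=10$, i.e., below a systematic error of $\approx 1\%$ in each energy, which would therefore have to cancel almost perfectly in the subtraction for the extracted defect energy to retain any meaning.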