While combinatorial problems are of great academic and practical importance, previous approaches like explicit heuristics and reinforcement learning have been complex and costly. To address this, we developed a simple and robust method to train a Deep Neural Network (DNN) through self-supervised learning for solving a goal-predefined combinatorial problem. Assuming that more optimal moves occur more frequently along a path of random moves connecting two problem states, the DNN can approximate an optimal solver by learning to predict the last move of a random scramble from the resulting problem state. Tested on 1,000 scrambled Rubik's Cube instances, a Transformer-based model solved all of them near-optimally using a breadth-first search; with a maximum breadth of $10^3$, the mean solution length was $20.5$ moves. The proposed method may apply to other goal-predefined combinatorial problems, though it has a few constraints.
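To make the training signal concrete, the following is a minimal sketch of the self-supervised data generation and the greedy form of the inference idea, using a toy permutation puzzle as a stand-in for the Rubik's Cube. The generator names `a`/`b`, the helpers `make_example` and `greedy_solve`, and the callable `predict_last_move` are illustrative assumptions, not the paper's implementation; the paper additionally expands the greedy step into a breadth-limited search over the model's candidate moves.

```python
import random

# Toy goal-predefined puzzle standing in for the Rubik's Cube (assumption for
# illustration): states are permutations of range(8); moves are fixed
# generator permutations and their inverses.
GOAL = tuple(range(8))

# Each move is applied by indexing: new_state[i] = state[perm[i]].
GENERATORS = {
    "a":  (1, 2, 3, 0, 4, 5, 6, 7),   # cycle the first four pieces
    "a'": (3, 0, 1, 2, 4, 5, 6, 7),   # inverse of "a"
    "b":  (0, 1, 2, 3, 5, 6, 7, 4),   # cycle the last four pieces
    "b'": (0, 1, 2, 3, 7, 4, 5, 6),   # inverse of "b"
}
INVERSE = {"a": "a'", "a'": "a", "b": "b'", "b'": "b"}


def apply_move(state, move):
    perm = GENERATORS[move]
    return tuple(state[perm[i]] for i in range(len(state)))


def make_example(scramble_len):
    """Scramble the goal state with random moves; the training pair is
    (scrambled state, last move applied). No solver or human labels needed."""
    state, last = GOAL, None
    for _ in range(scramble_len):
        last = random.choice(list(GENERATORS))
        state = apply_move(state, last)
    return state, last


# Self-supervised dataset: the DNN learns to predict the last scramble move
# from the state alone.
dataset = [make_example(random.randint(1, 20)) for _ in range(10_000)]


def greedy_solve(state, predict_last_move, max_steps=50):
    """Greedy inference sketch: undo the move the model believes was applied
    last, stepping back toward the goal. `predict_last_move` is a hypothetical
    callable standing in for the trained DNN."""
    solution = []
    for _ in range(max_steps):
        if state == GOAL:
            return solution
        undo = INVERSE[predict_last_move(state)]
        state = apply_move(state, undo)
        solution.append(undo)
    return None  # no solution found within max_steps
```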