In this work, a novel idea is presented for combinatorial optimization problems, a hybrid network, which results in a superior outcome. We applied this method to graph pointer networks [1], expanding its capabilities to a higher level. We proposed a hybrid pointer network (HPN) to solve the travelling salesman problem trained by reinforcement learning. Furthermore, HPN builds upon graph pointer networks which is an extension of pointer networks with an additional graph embedding layer. HPN outperforms the graph pointer network in solution quality due to the hybrid encoder, which provides our model with a verity encoding type, allowing our model to converge to a better policy. Our network significantly outperforms the original graph pointer network for small and large-scale problems increasing its performance for TSP50 from 5.959 to 5.706 without utilizing 2opt, Pointer networks, Attention model, and a wide range of models, producing results comparable to highly tuned and specialized algorithms. We make our data, models, and code publicly available [2].
翻译:在这项工作中,提出了一个关于组合优化问题的新想法,即混合网络,其结果是优异的结果。我们将这种方法应用于图形指针网络[1],将其能力扩大到更高的水平。我们建议建立一个混合指针网络(HPN),以解决通过强化学习培训的巡回销售人员问题。此外,HPN建在图形指针网络上,这是带有附加图示嵌入层的指针网络的延伸。由于混合编码器,HPN在解决方案质量上优于图形指针网络,为我们的模型提供了一种虚拟编码类型,使我们的模型能够汇集到更好的政策中。我们的网络大大优于最初的小型和大规模问题的图形指针网络(HPN),提高了TSP50从5.959到5.706的性能,没有使用2Ot、指针网络、注意模型和广泛的模型,产生与高度调控和专业算法相类似的结果。我们的数据、模型和代码可以公开[2]。