Recent literature has established that neural networks can represent good policies across a range of stochastic dynamic models in supply chain and logistics. We propose a new algorithm that incorporates variance reduction techniques to overcome limitations of the model-free algorithms typically employed to learn such neural network policies. For the classical lost sales inventory model, the algorithm learns neural network policies that are superior to those learned using model-free algorithms, while outperforming the best heuristic benchmarks by an order of magnitude. Because the ideas underlying its development are generic, the algorithm is an interesting candidate for other stochastic dynamic problems in supply chain and logistics.
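To make the setting concrete, the following is a minimal sketch of a lost sales inventory simulation, not the algorithm proposed here. The abstract does not say which variance reduction techniques are used; the sketch shows common random numbers as one standard example. The base-stock policy, the uniform demand distribution, the cost parameters, and the function names are all illustrative assumptions.

```python
import random


def simulate_lost_sales(base_stock, demands, lead_time=2,
                        holding_cost=1.0, penalty_cost=9.0):
    """Average per-period cost of a base-stock policy (hypothetical helper)."""
    on_hand = 0
    pipeline = [0] * lead_time  # outstanding orders, oldest first
    total_cost = 0.0
    for demand in demands:
        # Order up to the base-stock level, based on the inventory position.
        position = on_hand + sum(pipeline)
        pipeline.append(max(0, base_stock - position))
        # Receive the oldest outstanding order.
        on_hand += pipeline.pop(0)
        # Unmet demand is lost rather than backordered.
        lost = max(0, demand - on_hand)
        on_hand = max(0, on_hand - demand)
        total_cost += holding_cost * on_hand + penalty_cost * lost
    return total_cost / len(demands)


def compare_with_common_random_numbers(level_a, level_b, periods=1000, seed=0):
    """Cost difference of two base-stock levels on one shared demand path."""
    rng = random.Random(seed)
    # Assumed demand model: uniform on {0, ..., 8}, purely for illustration.
    demands = [rng.randint(0, 8) for _ in range(periods)]
    return (simulate_lost_sales(level_a, demands)
            - simulate_lost_sales(level_b, demands))
```

Evaluating both policies on the same demand sequence removes the demand noise they share, so the estimated cost difference typically has lower variance than one computed from independent simulations.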