Recent literature has established that neural networks can represent good policies across a range of stochastic dynamic models in supply chain and logistics. We propose a new algorithm that incorporates variance reduction techniques to overcome limitations of the algorithms typically employed in the literature to learn such neural network policies. For the classical lost sales inventory model, the algorithm learns neural network policies that are vastly superior to those learned using model-free algorithms, while outperforming the best heuristic benchmarks by an order of magnitude. Because the ideas underlying its development are generic, the algorithm is an interesting candidate for other stochastic dynamic problems in supply chain and logistics.