Deep controlled learning of MDP policies with an application to lost-sales inventory control

Recent literature has established that neural networks can represent good policies across a range of stochastic dynamic models in supply chain and logistics. We incorporate variance reduction techniques into a newly proposed algorithm to overcome limitations of the model-free algorithms typically employed to learn such neural network policies. For the classical lost-sales inventory model, the algorithm learns neural network policies that are superior to those learned using model-free algorithms, while outperforming the best heuristic benchmarks by an order of magnitude. Because the ideas behind its development are generic, the algorithm is an interesting candidate for other stochastic dynamic problems in supply chain and logistics.
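The abstract does not say which variance reduction techniques the algorithm uses. As a rough illustration of the general idea only, the sketch below simulates a lost-sales inventory system under a simple base-stock heuristic and compares two base-stock levels using common random numbers, a standard variance reduction technique chosen here as an assumption rather than taken from the paper. The uniform demand distribution, the cost parameters, the lead time, and all function names are hypothetical.

```python
import random
import statistics


def simulate_lost_sales(base_stock, demands, lead_time=2,
                        holding_cost=1.0, penalty_cost=4.0):
    """Average per-period cost of a base-stock policy on one demand path.

    Unmet demand is lost (not backordered). All parameters are illustrative.
    """
    on_hand = base_stock
    pipeline = [0] * lead_time  # outstanding orders, oldest first
    total_cost = 0.0
    for demand in demands:
        # Order up to the base-stock level, based on the inventory position.
        position = on_hand + sum(pipeline)
        pipeline.append(max(0, base_stock - position))
        # Receive the oldest outstanding order before demand occurs.
        on_hand += pipeline.pop(0)
        sales = min(on_hand, demand)
        lost = demand - sales
        on_hand -= sales
        total_cost += holding_cost * on_hand + penalty_cost * lost
    return total_cost / len(demands)


def demand_path(rng, horizon):
    # Hypothetical demand: uniform integers on 0..10.
    return [rng.randint(0, 10) for _ in range(horizon)]


def cost_difference_variance(s_a, s_b, common, reps=200, horizon=200, seed=0):
    """Sample variance of the estimated cost difference between two policies.

    With common=True both policies see the same demand path in each
    replication (common random numbers); otherwise paths are independent.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        path_a = demand_path(rng, horizon)
        path_b = path_a if common else demand_path(rng, horizon)
        diffs.append(simulate_lost_sales(s_a, path_a)
                     - simulate_lost_sales(s_b, path_b))
    return statistics.variance(diffs)


if __name__ == "__main__":
    print("independent paths:", cost_difference_variance(12, 14, common=False))
    print("common random numbers:", cost_difference_variance(12, 14, common=True))
```

Because both policies face the same demand realizations under common random numbers, their costs are positively correlated and the estimated difference is far less noisy, which is what makes comparisons between candidate actions or policies cheaper to resolve in simulation.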

