Stochastic dual dynamic programming (SDDP) is a state-of-the-art method for solving multi-stage stochastic optimization problems, widely used for modeling real-world process optimization tasks. Unfortunately, SDDP has a worst-case complexity that scales exponentially in the number of decision variables, which severely limits its applicability to low-dimensional problems. To overcome this limitation, we extend SDDP with a trainable neural model that learns to map problem instances to a piecewise-linear value function within an intrinsic low-dimensional space. The model is architected specifically to interact with a base SDDP solver, so that it can accelerate optimization on new instances. The proposed Neural Stochastic Dual Dynamic Programming ($\nu$-SDDP) continually self-improves by solving successive problems. An empirical investigation demonstrates that $\nu$-SDDP significantly reduces problem-solving cost without sacrificing solution quality compared to competitors such as SDDP and reinforcement learning algorithms, across a range of synthetic and real-world process optimization problems.
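For intuition about the piecewise-linear value function mentioned above: SDDP-style methods approximate a convex stage value function as the pointwise maximum of linear "cuts", $V(x) \approx \max_k (\alpha_k + \beta_k^\top x)$. The sketch below (hand-made cuts, hypothetical names) only illustrates this representation; it is not the paper's implementation, in which a neural model would predict such cuts in a learned low-dimensional space.

```python
import numpy as np

def evaluate_cuts(x, alphas, betas):
    """Lower-bound value estimate at state x from a collection of
    linear cuts: max over k of alpha_k + beta_k @ x."""
    return max(a + b @ x for a, b in zip(alphas, betas))

# Two illustrative cuts over a 2-dimensional state.
x = np.array([1.0, 2.0])
alphas = [0.0, 1.0]
betas = [np.array([1.0, 0.0]), np.array([0.0, 0.5])]
print(evaluate_cuts(x, alphas, betas))  # max(1.0, 2.0) = 2.0
```

The max-of-cuts form is what makes the approximation both piecewise linear and convex, so it can be embedded directly as constraints in the stage subproblems a base SDDP solver works with.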