Modern Monte Carlo-type approaches to dynamic decision problems are reformulated as empirical loss minimization, which allows direct application of classical results from statistical machine learning. The resulting computational methods are then analyzed in this framework to demonstrate both their effectiveness and their susceptibility to generalization error. Standard applications of classical results establish the possibility of overlearning, and hence a bias-variance trade-off, by connecting over-trained networks to anticipating controls. Conversely, non-asymptotic estimates based on Rademacher complexity show that these algorithms converge for sufficiently large training sets. A numerically studied stylized example illustrates these possibilities, including the influence of the problem dimension on the degree of overlearning, and the effectiveness of the approach.
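To make the reformulation concrete, the following is a minimal toy sketch (not from the paper) of a dynamic decision problem recast as empirical loss minimization: a scalar state is steered toward zero over a few time steps by a hypothetical linear feedback control, the loss is the average pathwise cost over a fixed Monte Carlo training sample, and generalization is checked on an independent test sample. All names, dynamics, and cost parameters are illustrative assumptions.

```python
import numpy as np

# Toy illustration (hypothetical, not the paper's setup): steer a scalar state
# toward 0 over T steps with a linear feedback control u_theta(x) = theta * x.
# Fixing a Monte Carlo sample of noise paths turns the dynamic decision
# problem into an empirical loss minimization over the parameter theta.

rng = np.random.default_rng(0)
T, n_paths, lam = 5, 2000, 0.1
noise = rng.normal(scale=0.3, size=(n_paths, T))   # fixed training sample
x0 = rng.normal(size=n_paths)                      # sampled initial states

def empirical_loss(theta, eps, x_init):
    """Average pathwise cost: running control cost plus terminal cost."""
    x = x_init.copy()
    cost = np.zeros_like(x)
    for t in range(T):
        u = theta * x                  # feedback (non-anticipating) control
        cost += lam * u ** 2           # running control cost
        x = x + u + eps[:, t]          # controlled dynamics
    return np.mean(cost + x ** 2)      # empirical loss on this sample

# Minimize the empirical loss by finite-difference gradient descent.
theta, lr, h = 0.0, 0.05, 1e-5
for _ in range(200):
    grad = (empirical_loss(theta + h, noise, x0)
            - empirical_loss(theta - h, noise, x0)) / (2 * h)
    theta -= lr * grad

train_loss = empirical_loss(theta, noise, x0)
# Generalization check on fresh, independent Monte Carlo paths.
test_noise = rng.normal(scale=0.3, size=(n_paths, T))
test_x0 = rng.normal(size=n_paths)
test_loss = empirical_loss(theta, test_noise, test_x0)
```

With a one-parameter control the training and test losses stay close; the overlearning discussed in the abstract arises when a richly parameterized network effectively fits the sampled noise itself, behaving like an anticipating control on the training paths.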