Value iteration can find the optimal replenishment policy for a perishable inventory problem, but it is computationally demanding due to the large state spaces required to represent the age profile of stock. The parallel processing capabilities of modern GPUs can reduce the wall time required to run value iteration by updating many states simultaneously. The adoption of GPU-accelerated approaches has been limited in operational research relative to fields such as machine learning, in which new software frameworks have made GPU programming widely accessible. We used the Python library JAX to implement value iteration and simulators of the underlying Markov decision processes in a high-level API, and relied on this library's function transformations and compiler to efficiently utilize GPU hardware. Our method can extend the use of value iteration to settings that were previously considered infeasible or impractical. We demonstrate this on example scenarios from three recent studies, which include problems with over 16 million states and additional problem features, such as substitution between products, that increase computational complexity. We compare the performance of the optimal replenishment policies to heuristic policies fitted using simulation optimization in JAX, which allowed the parallel evaluation of multiple candidate policy parameters over thousands of simulated years. The heuristic policies gave a maximum optimality gap of 2.49%. Our general approach may be applicable to a wide range of problems in operational research that would benefit from large-scale parallel computation on consumer-grade GPU hardware.
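To illustrate the core idea of updating many states simultaneously, the sketch below shows how a single-state Bellman backup can be vectorized over the whole state space with JAX's vmap transformation and compiled with jit. It is a minimal, hypothetical example using dense placeholder arrays for the transition probabilities and rewards; it is not the implementation used in the experiments.

```python
# Illustrative sketch only: a minimal JAX value-iteration sweep in the spirit of the
# approach described above. The MDP arrays and their shapes are hypothetical placeholders.
import jax
import jax.numpy as jnp


def bellman_update_for_state(state_idx, values, transition_probs, rewards, gamma=0.95):
    """Bellman backup for one state: maximise expected reward-to-go over all actions."""
    # transition_probs: (n_states, n_actions, n_states), rewards: (n_states, n_actions)
    q_values = rewards[state_idx] + gamma * transition_probs[state_idx] @ values
    return jnp.max(q_values)


# vmap vectorizes the single-state update over every state, and jit compiles the
# whole sweep so it can run in parallel on a GPU.
@jax.jit
def value_iteration_sweep(values, transition_probs, rewards):
    state_indices = jnp.arange(values.shape[0])
    return jax.vmap(bellman_update_for_state, in_axes=(0, None, None, None))(
        state_indices, values, transition_probs, rewards
    )
```

A full run would repeatedly apply value_iteration_sweep until the value estimates converge; each sweep updates all states in parallel on the accelerator.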
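The parallel evaluation of candidate heuristic policy parameters can be sketched in a similar way with nested vmap calls, one over simulated demand trajectories and one over candidate parameters. The order-up-to policy, cost coefficients, and Poisson demand below are toy assumptions for illustration only and do not correspond to the scenarios from the three studies.

```python
# Illustrative sketch only: evaluating many candidate (s, S) policy parameters in
# parallel over many simulated years of demand. The simulator is a toy single-product,
# non-perishable stand-in, not one of the simulators from the studies discussed above.
import jax
import jax.numpy as jnp


def simulate_one_year(params, demand_trajectory):
    """Run a simple order-up-to policy over one year of daily demand; return total cost."""
    s, S = params  # reorder point and order-up-to level

    def step(stock, demand):
        order = jnp.where(stock <= s, S - stock, 0.0)
        stock = stock + order
        sales = jnp.minimum(stock, demand)
        stock = stock - sales
        # toy ordering, holding and shortage cost coefficients
        cost = 1.0 * order + 0.1 * stock + 5.0 * (demand - sales)
        return stock, cost

    _, costs = jax.lax.scan(step, S, demand_trajectory)
    return jnp.sum(costs)


# Inner vmap maps over trajectories, outer vmap over candidate parameters; jit compiles
# the whole evaluation so all simulations run in parallel on the accelerator.
evaluate_candidates = jax.jit(
    jax.vmap(jax.vmap(simulate_one_year, in_axes=(None, 0)), in_axes=(0, None))
)

key = jax.random.PRNGKey(0)
demands = jax.random.poisson(key, 4.0, shape=(1000, 365)).astype(jnp.float32)  # 1000 simulated years
candidates = jnp.array([[5.0, 20.0], [10.0, 30.0], [15.0, 40.0]])  # candidate (s, S) pairs
mean_costs = evaluate_candidates(candidates, demands).mean(axis=1)
best = candidates[jnp.argmin(mean_costs)]
```

In a simulation-optimization loop, the candidate set would be proposed by an optimizer (for example a grid or a sequential search) and the same compiled evaluation reused at each step.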