In this paper, we consider the inventory management (IM) problem, in which replenishment decisions must be made for a large number of stock keeping units (SKUs) to balance supply and demand. In our setting, a constraint on shared resources (such as inventory capacity) couples the otherwise independent control of each SKU. We formulate problems with this structure as a Shared-Resource Stochastic Game (SRSG) and propose an efficient algorithm called Context-aware Decentralized PPO (CD-PPO). Through extensive experiments, we demonstrate that CD-PPO accelerates learning compared with standard multi-agent reinforcement learning (MARL) algorithms.
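To illustrate how a shared resource can couple otherwise independent per-SKU decisions, the following is a minimal sketch and not the CD-PPO algorithm itself. It assumes a single warehouse with an integer capacity and a proportional scale-down rule when proposed orders do not fit; the function name `clip_to_capacity` and the scaling rule are hypothetical choices made only for illustration.

```python
from typing import List


def clip_to_capacity(inventory: List[int], orders: List[int], capacity: int) -> List[int]:
    """Scale per-SKU replenishment orders so total stock fits a shared capacity.

    Assumption (not from the paper): when the proposed orders exceed the free
    space, every order is scaled down proportionally and rounded down.
    """
    # Free space left in the shared warehouse (never negative).
    free = max(capacity - sum(inventory), 0)
    total = sum(orders)
    if total <= free:
        # The shared constraint is inactive: each SKU gets what it asked for.
        return list(orders)
    # The constraint is active: each SKU's order now depends on all the others.
    return [o * free // total for o in orders]


if __name__ == "__main__":
    stock = [10, 20, 5]      # current on-hand inventory per SKU
    proposed = [10, 10, 20]  # orders chosen independently per SKU
    print(clip_to_capacity(stock, proposed, capacity=55))
```

In this sketch, the order an SKU actually receives depends on what every other SKU requests once the capacity binds, which is the coupling the abstract describes.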