Achieving Diverse Objectives with AI-driven Prices in Deep Reinforcement Learning Multi-agent Markets

We propose a practical approach to computing market prices and allocations via a deep reinforcement learning policymaker agent that operates in an environment of other learning agents. Compared to the idealized market equilibrium outcome (which we use as a benchmark), our policymaker is much more flexible, allowing us to tune prices with regard to diverse objectives such as sustainability and resource wastefulness, fairness, and buyers' and sellers' welfare. To evaluate our approach, we design a realistic market with multiple, diverse buyers and sellers. Additionally, the sellers, which are themselves deep learning agents, compete for resources in a common-pool appropriation environment based on bio-economic models of commercial fisheries. We demonstrate that: (a) the introduced policymaker achieves performance comparable to the market equilibrium, showcasing the potential of such approaches in markets where equilibrium prices cannot be computed efficiently; (b) our policymaker can notably outperform the equilibrium solution on certain metrics while maintaining comparable performance on the remaining ones; and (c) as a highlight of our findings, in scarce-resource environments our policymaker is significantly more successful than the market outcome at maintaining resource sustainability.
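The abstract only says that the sellers' environment is "based on bio-economic models of commercial fisheries". To illustrate why sustainability becomes fragile when resources are scarce, here is a minimal sketch of one textbook model of that kind. It is not the paper's environment. The Schaefer-style harvest `q * effort * stock`, the logistic regrowth, the proportional rationing rule, and the names `Fishery` and `simulate` are all hypothetical choices made for illustration, as are every parameter value.

```python
from dataclasses import dataclass


@dataclass
class Fishery:
    """Hypothetical common-pool fishery: Schaefer harvest plus logistic regrowth."""

    stock: float         # current resource level S_t
    capacity: float      # carrying capacity K
    growth_rate: float   # intrinsic growth rate r
    catchability: float  # catchability coefficient q

    def step(self, efforts: list[float]) -> list[float]:
        """Run one round: every seller harvests, then the remaining stock regrows.

        Seller i requests q * e_i * S_t. If total requests exceed the stock,
        all harvests are scaled down proportionally (an assumed rationing rule).
        Returns the harvest actually obtained by each seller.
        """
        harvests = [self.catchability * e * self.stock for e in efforts]
        total = sum(harvests)
        if total > self.stock:
            scale = self.stock / total
            harvests = [h * scale for h in harvests]
            total = self.stock
        remaining = self.stock - total
        # Logistic regrowth applied to what is left after harvesting.
        self.stock = remaining + self.growth_rate * remaining * (1.0 - remaining / self.capacity)
        return harvests


def simulate(efforts: list[float], rounds: int = 200) -> float:
    """Repeat the same per-seller efforts for `rounds` steps and return the final stock."""
    # Made-up parameters: K = 1000, r = 0.5, q = 0.01, starting at full capacity.
    fishery = Fishery(stock=1000.0, capacity=1000.0, growth_rate=0.5, catchability=0.01)
    for _ in range(rounds):
        fishery.step(efforts)
    return fishery.stock


if __name__ == "__main__":
    print(simulate([5.0] * 4))   # moderate total effort
    print(simulate([10.0] * 4))  # heavy total effort
```

In this toy model, a positive steady state exists only while the harvested fraction q·Σe stays below r/(1+r), which is 1/3 with these parameters. Four sellers at effort 5 (fraction 0.2) settle the stock near 625. Four sellers at effort 10 (fraction 0.4) drive it toward zero.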

翻译：我们提出一个切实可行的方法,通过一个深层强化学习型决策者来计算市场价格和分配额,在另一个学习型代理人的环境中运作。与理想化市场平衡结果相比(我们把这一结果作为基准),我们的决策者更灵活得多,使我们能够根据可持续性和资源浪费、公平、买方和卖方福利等不同目标调整价格。为了评估我们的方法,我们设计了一个现实的市场,由多种不同的买方和卖方组成。此外,卖方本身也是深层学习型代理人,在基于商业渔业生物经济模式的共同资源分配环境中竞争资源。我们证明:(a) 引进的决策者能够取得与市场平衡相类似的业绩,在无法有效计算平衡价格的市场中展示这种做法的潜力。 (b) 我们的决策者可以明显地超越某些计量指标的平衡解决办法,同时保持其余指标的可比业绩。 (c) 作为我们发现的一个亮点,我们的决策者在维持资源可持续性方面比市场结果要成功得多,在稀缺的资源环境中。