Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices, a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
