Trading markets are a real-world financial application for deploying reinforcement learning agents; however, they pose hard fundamental challenges such as high variance and costly exploration. Moreover, markets are inherently a multi-agent domain composed of many actors taking actions and changing the environment. To tackle these types of scenarios, agents need to exhibit certain characteristics such as risk-awareness, robustness to perturbations, and low learning variance. We take those as building blocks and propose a family of four algorithms. First, we contribute two algorithms that use risk-averse objective functions and variance reduction techniques. Then, we extend the framework to multi-agent learning and assume an adversary that can take over and perturb the learning process. Our third and fourth algorithms perform well under this setting and balance theoretical guarantees with practical use. Additionally, we consider the multi-agent nature of the environment, and our work is the first to extend empirical game theory analysis for multi-agent learning by considering risk-sensitive payoffs.