HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation

A discrete-continuous hybrid action space is a natural setting in many practical problems, such as robot control and game AI. However, most prior Reinforcement Learning (RL) work demonstrates success only with either a discrete or a continuous action space and seldom considers the hybrid case. One naive way to address hybrid action RL is to convert the hybrid action space into a unified homogeneous action space by discretization or continualization, so that conventional RL algorithms can be applied. However, this ignores the underlying structure of the hybrid action space and introduces scalability issues and additional approximation difficulties, leading to degraded results. In this paper, we propose Hybrid Action Representation (HyAR) to learn a compact and decodable latent representation space for the original hybrid action space. HyAR constructs the latent space and embeds the dependence between the discrete action and its continuous parameter via an embedding table and a conditional Variational Auto-Encoder (VAE). To further improve its effectiveness, the action representation is trained to be semantically smooth through unsupervised environmental dynamics prediction. Finally, the agent learns its policy with conventional deep RL (DRL) algorithms in the learned representation space and interacts with the environment by decoding the hybrid action embeddings to the original action space. We evaluate HyAR in a variety of environments with discrete-continuous action spaces. The results show that HyAR outperforms previous baselines, especially for high-dimensional action spaces.
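The decoding step described above, mapping a point in the latent representation space back to a hybrid action, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the dimensions, the nearest-neighbor lookup used to recover the discrete action, the name `decode_hybrid_action`, and the untrained linear map that stands in for the conditional VAE decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this illustration.
NUM_DISCRETE = 3   # number of discrete actions
EMBED_DIM = 4      # size of each discrete-action embedding
LATENT_DIM = 2     # size of the continuous latent code
PARAM_DIM = 2      # size of the continuous parameter vector

# Embedding table: one row per discrete action.
# Placeholder values; in a trained model these rows would be learned.
embedding_table = np.eye(NUM_DISCRETE, EMBED_DIM)

# Stand-in for the conditional decoder: an untrained random linear map.
decoder_weights = rng.normal(size=(EMBED_DIM + LATENT_DIM, PARAM_DIM))


def decode_hybrid_action(e, z):
    """Map a latent pair (e, z) to a hybrid action (k, x_k)."""
    # Assumption: the discrete action is the index of the nearest table row.
    distances = np.linalg.norm(embedding_table - e, axis=1)
    k = int(np.argmin(distances))
    # Condition the decoder on the selected embedding, then squash to [-1, 1].
    decoder_input = np.concatenate([embedding_table[k], z])
    x_k = np.tanh(decoder_input @ decoder_weights)
    return k, x_k


# A latent-space policy would output (e, z); here we perturb a table row by hand.
e = embedding_table[1] + 0.01
z = np.zeros(LATENT_DIM)
k, x_k = decode_hybrid_action(e, z)
```

In this sketch the policy only ever produces continuous vectors `(e, z)`, which is what lets a conventional continuous-control algorithm operate on a hybrid action space; the discrete choice and its parameter are recovered together at decoding time.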
