Multi-objective reinforcement learning (MORL) approaches have emerged to tackle many real-world problems with multiple conflicting objectives by maximizing a joint objective function weighted by a preference vector. These approaches find fixed customized policies corresponding to preference vectors specified during training. However, the design constraints and objectives typically change dynamically in real-life scenarios. Furthermore, storing a policy for each potential preference is not scalable. Hence, obtaining a set of Pareto front solutions for the entire preference space in a given domain with a single training run is critical. To this end, we propose a novel MORL algorithm that trains a single universal network to cover the entire preference space and scales to continuous robotic tasks. The proposed approach, Preference-Driven MORL (PD-MORL), utilizes the preferences as guidance to update the network parameters. It also employs a novel parallelization approach to increase sample efficiency. We show that PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
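To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of the two ingredients named in the abstract: a single universal policy network conditioned on the preference vector, and a weighted-sum joint objective $w^\top r$ that scalarizes the vector-valued reward. All names (`PreferenceConditionedPolicy`, `scalarize`, the layer sizes, and the Dirichlet preference sampling) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class PreferenceConditionedPolicy(nn.Module):
    """Hypothetical universal policy: one network serves the whole preference space
    by taking the preference vector as an additional input."""

    def __init__(self, obs_dim, act_dim, num_objectives, hidden=256):
        super().__init__()
        # Observation and preference are concatenated at the input layer.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + num_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs, preference):
        return self.net(torch.cat([obs, preference], dim=-1))


def scalarize(reward_vec, preference):
    """Weighted-sum scalarization w^T r of the vector-valued reward."""
    return (reward_vec * preference).sum(dim=-1)


# Example usage with a 2-objective task: sample a random preference from the
# simplex, query the universal policy, and scalarize a vector reward.
obs_dim, act_dim, k = 8, 2, 2
policy = PreferenceConditionedPolicy(obs_dim, act_dim, k)
w = torch.distributions.Dirichlet(torch.ones(k)).sample().unsqueeze(0)
obs = torch.randn(1, obs_dim)
action = policy(obs, w)
joint_objective = scalarize(torch.tensor([[1.0, 0.5]]), w)
print(action.shape, joint_objective)
```

Because the preference is an input rather than a fixed training-time constant, changing the weighting at deployment requires no retraining and no per-preference policy storage, which is the scalability argument the abstract makes.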