Deploying Offline Reinforcement Learning with Human Feedback

Reinforcement learning (RL) has shown promise for decision-making tasks in real-world applications. One practical framework involves training parameterized policy models from an offline dataset and subsequently deploying them in an online environment. However, this approach can be risky since the offline training may not be perfect, leading to poor performance of the RL models that may take dangerous actions. To address this issue, we propose an alternative framework that involves a human supervising the RL models and providing additional feedback in the online deployment phase. We formalize this online deployment problem and develop two approaches. The first approach uses model selection and the upper confidence bound algorithm to adaptively select a model to deploy from a candidate set of trained offline RL models. The second approach involves fine-tuning the model in the online deployment phase when a supervision signal arrives. We demonstrate the effectiveness of these approaches for robot locomotion control and traffic light control tasks through empirical validation.
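The first approach can be illustrated with a minimal sketch of upper-confidence-bound model selection. This is not the authors' implementation: it uses the standard UCB1 rule, and it assumes that each deployment of a candidate policy yields a human feedback score in [0, 1]. The names `ucb1_select`, `update`, `simulate`, and `true_quality` are hypothetical and exist only for this illustration.

```python
import math
import random


def ucb1_select(counts, mean_feedback, c=2.0):
    """Return the index of the candidate policy to deploy next (UCB1 rule)."""
    # Deploy every candidate once before trusting any estimate.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    # Score = empirical mean feedback + exploration bonus that shrinks
    # as a candidate is deployed more often.
    scores = [
        m + math.sqrt(c * math.log(total) / n)
        for m, n in zip(mean_feedback, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def update(counts, mean_feedback, i, feedback):
    """Fold one feedback value into the running mean of candidate i."""
    counts[i] += 1
    mean_feedback[i] += (feedback - mean_feedback[i]) / counts[i]


def simulate(true_quality, rounds=2000, seed=0):
    """Toy deployment loop. ASSUMPTION: human feedback is a Bernoulli draw
    whose success probability is the (hidden) quality of the deployed policy."""
    rng = random.Random(seed)
    counts = [0] * len(true_quality)
    mean_feedback = [0.0] * len(true_quality)
    for _ in range(rounds):
        i = ucb1_select(counts, mean_feedback)
        feedback = 1.0 if rng.random() < true_quality[i] else 0.0
        update(counts, mean_feedback, i, feedback)
    return counts


if __name__ == "__main__":
    # Three hypothetical offline-trained policies of increasing quality.
    print(simulate([0.2, 0.5, 0.8]))
```

Under these assumptions, the selector spends a few deployments on each candidate and then concentrates on the policy with the highest observed feedback, while the exploration bonus keeps occasionally re-checking the others. The second approach (online fine-tuning on supervision signals) is not covered by this sketch.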

