Human-centered AI considers human experiences with AI performance. While abundant research has helped AI achieve superhuman performance through fully automatic or weakly supervised learning, fewer efforts have explored how AI can adapt to a human's preferred skill level given fine-grained input. In this work, we guide curriculum reinforcement learning toward a preferred performance level that is neither too hard nor too easy by learning from the human decision process. To achieve this, we developed a portable, interactive platform that enables users to interact with agents online by manipulating task difficulty, observing performance, and providing curriculum feedback. Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications that require millions of samples without a server. The results demonstrate the effectiveness of an interactive, human-in-the-loop curriculum for reinforcement learning, showing that performance can successfully adjust in sync with the human's desired difficulty level. We believe this research will open new doors toward achieving flow and personalized adaptive difficulty.