A longstanding goal of artificial intelligence is to create artificial agents capable of learning to perform tasks that require sequential decision making. Importantly, while it is the artificial agent that learns and acts, it is still up to humans to specify the particular task to be performed. Classical task-specification approaches typically involve humans providing stationary reward functions or explicit demonstrations of the desired tasks. However, a great deal of recent research has explored alternative ways for humans to guide learning agents, ways that may, e.g., be more suitable for certain tasks or require less human effort. This survey provides a high-level overview of five recent machine learning frameworks that rely primarily on human guidance other than pre-specified reward functions or conventional, step-by-step action demonstrations. We review the motivation, assumptions, and implementation of each framework, and we discuss possible future research directions.