Reinforcement learning is an important part of artificial intelligence. In this paper, we review model-free reinforcement learning that uses the average-reward optimality criterion in the infinite-horizon setting. Motivated by the solo survey by Mahadevan (1996a), we provide an updated review of work in this area and extend it to cover policy-iteration and function-approximation methods (in addition to their value-iteration and tabular counterparts). We present a comprehensive literature mapping. We also identify and discuss opportunities for future work.