Reinforcement learning (RL) has gained increasing traction in academia and the tech industry, with a variety of impactful applications and products launched. Although research is active on many fronts (e.g., offline RL, performance), many RL practitioners face a challenge that has been largely ignored: determining whether a designed Markov Decision Process (MDP) is valid and meaningful. This study proposes a heuristic-based feature analysis method to validate whether an MDP is well formulated. We believe that an MDP suitable for RL should contain a set of state features that are both sensitive to actions and predictive of rewards. We tested our method in constructed environments and showed that it can identify certain invalid environment formulations. To our knowledge, validity analysis of RL problem formulations is a novel direction. We envision that our tool will serve as a motivating example that helps practitioners apply RL to real-world problems more easily.
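The abstract does not specify how the two feature properties are measured, so the following is only a minimal sketch of one possible heuristic, not the method proposed in the study. It assumes logged transitions with discrete actions, scores action sensitivity by the share of variance in each feature's one-step change that is explained by the action (a correlation ratio), and scores reward predictiveness by the absolute Pearson correlation between each next-state feature and the reward. All function names, the toy environment, and the threshold are hypothetical.

```python
import numpy as np


def action_sensitivity(states, actions, next_states):
    """Share of variance in each feature's one-step change explained by the action."""
    deltas = next_states - states
    total_var = deltas.var(axis=0)
    grand_mean = deltas.mean(axis=0)
    between = np.zeros(deltas.shape[1])
    for a in np.unique(actions):
        mask = actions == a
        # Weight each action's squared mean shift by how often the action occurs.
        between += mask.mean() * (deltas[mask].mean(axis=0) - grand_mean) ** 2
    return np.divide(between, total_var, out=np.zeros_like(between), where=total_var > 0)


def reward_predictiveness(next_states, rewards):
    """Absolute Pearson correlation between each next-state feature and the reward."""
    x = next_states - next_states.mean(axis=0)
    r = rewards - rewards.mean()
    cov = x.T @ r
    denom = np.sqrt((x ** 2).sum(axis=0) * (r ** 2).sum())
    return np.abs(np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0))


def useful_features(sensitivity, predictiveness, threshold=0.1):
    """Indices of features passing both checks (the threshold is an arbitrary assumption)."""
    return np.flatnonzero((sensitivity > threshold) & (predictiveness > threshold))


def make_toy_transitions(n=2000, seed=0):
    """Hypothetical toy data: feature 0 is a controllable position, feature 1 is pure noise."""
    rng = np.random.default_rng(seed)
    actions = rng.integers(0, 2, size=n)
    position = rng.normal(size=n)
    noise = rng.normal(size=n)
    step = np.where(actions == 1, 1.0, -1.0)
    next_position = position + step + 0.1 * rng.normal(size=n)
    next_noise = rng.normal(size=n)
    states = np.column_stack([position, noise])
    next_states = np.column_stack([next_position, next_noise])
    rewards = next_position + 0.1 * rng.normal(size=n)
    return states, actions, next_states, rewards


if __name__ == "__main__":
    s, a, s_next, r = make_toy_transitions()
    sens = action_sensitivity(s, a, s_next)
    pred = reward_predictiveness(s_next, r)
    print("action sensitivity:", sens)
    print("reward predictiveness:", pred)
    print("features passing both checks:", useful_features(sens, pred))
```

Under these assumptions, a formulation in which no feature passes both checks would be a candidate for closer inspection. A linear correlation can miss nonlinear dependence on the reward, so low scores are a prompt for review rather than proof that the MDP is invalid.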