A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including robustly handling uncertainties, satisfying safety constraints to avoid catastrophic failures, and generalizing to unseen scenarios during deployment. This survey overviews these main perspectives of trustworthy reinforcement learning, considering its intrinsic vulnerabilities in robustness, safety, and generalizability. In particular, we give rigorous formulations, categorize corresponding methodologies, and discuss benchmarks for each perspective. Moreover, we provide an outlook section to spur promising future directions, with a brief discussion of extrinsic vulnerabilities arising from human feedback. We hope this survey brings together separate threads of study in a unified framework and promotes the trustworthiness of reinforcement learning.