Reinforcement Learning (RL) is increasingly used to learn and adapt application behavior in many domains, including large-scale and safety-critical systems such as autonomous driving. With the advent of plug-and-play RL libraries, its applicability has further increased, enabling users to integrate RL algorithms into their applications. We note, however, that the majority of such code is not developed by RL engineers; as a consequence, this may lead to poor program quality, yielding bugs, suboptimal performance, and maintainability and evolution problems for RL-based projects. In this paper we begin to explore this hypothesis, specific to code utilizing RL, by analyzing different projects found in the wild to assess their quality from a software engineering perspective. Our study comprises 24 popular RL-based Python projects, analyzed with standard software engineering metrics. Our results, aligned with similar analyses of ML code in general, show that popular and widely reused RL repositories contain many code smells (3.95% of the code base on average), significantly affecting the projects' maintainability. The most common code smells detected are long method and long method chain, highlighting problems in the definition and interaction of agents. The detected code smells suggest problems in responsibility separation and call into question the appropriateness of current abstractions for defining RL algorithms.
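For illustration, the sketch below shows what the "long method chain" smell can look like in RL agent configuration code. It is a minimal hypothetical example (all names, including AgentBuilder and its methods, are invented for this sketch and are not drawn from the studied repositories): a fluent builder where each call returns the builder itself, so a single expression accumulates many configuration responsibilities.

```python
# Hypothetical illustration of the "long method chain" smell in RL
# configuration code. All class and method names are invented.

class AgentBuilder:
    """A fluent builder whose methods return self to enable chaining."""

    def __init__(self):
        self.config = {}

    def with_env(self, env_id):
        self.config["env"] = env_id
        return self  # returning self enables chaining

    def with_policy(self, kind, **kwargs):
        self.config["policy"] = (kind, kwargs)
        return self

    def with_optimizer(self, kind, lr):
        self.config["optimizer"] = (kind, lr)
        return self

    def with_replay_buffer(self, capacity):
        self.config["replay_buffer"] = capacity
        return self

    def build(self):
        # A real builder would construct an agent object here.
        return self.config


# One long chain bundles environment, policy, optimizer, and buffer setup
# into a single statement; static analyzers flag such chains as a
# "long method chain" smell because they hinder readability, debugging,
# and stepwise testing.
agent = (
    AgentBuilder()
    .with_env("CartPole-v1")
    .with_policy("mlp", hidden_sizes=(64, 64))
    .with_optimizer("adam", lr=3e-4)
    .with_replay_buffer(capacity=100_000)
    .build()
)
print(agent)
```

Breaking such a chain into named intermediate steps, or separating environment, policy, and optimizer construction into distinct components, is one way the responsibility-separation problems noted above could be addressed.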