Recently, as the demand for cleaning robots has steadily increased, household electricity consumption has also risen. To address this, efficient path planning for cleaning robots has become an important problem and has been studied extensively. However, most existing work concerns moving along a simple path segment rather than planning a complete path that covers every location to be cleaned. Reinforcement learning (RL), an emerging deep learning technique, has been applied to cleaning robots, but the resulting models operate only in a specific cleaning environment rather than across varied environments, so they must be retrained whenever the environment changes. To solve this problem, the proximal policy optimization (PPO) algorithm is combined with an efficient path-planning method that operates in various cleaning environments, using transfer learning (TL), detection of the nearest cleaned tile, reward shaping, and an elite set. The proposed method is validated through an ablation study and a comparison with conventional methods such as random and zigzag path planning. The experimental results demonstrate that the proposed method achieves better training performance and faster convergence than the original PPO, and that it outperforms the conventional (random and zigzag) methods.
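The abstract names reward shaping and nearest-tile detection without detail. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way to shape a coverage reward in a toy grid-cleaning environment: a bonus for cleaning a new tile, a small step penalty, and a potential-based term derived from the Manhattan distance to the nearest not-yet-cleaned tile. The class name, reward weights, and the choice of potential are all assumptions made for this example.

```python
import numpy as np

# Illustrative sketch only: a toy grid "cleaning" environment with a shaped
# reward, loosely inspired by the reward-shaping idea named in the abstract.
# GridCleanEnv, the shaping weights, and the nearest-tile potential are
# assumptions for illustration, not the paper's actual method.

class GridCleanEnv:
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=5, step_penalty=-0.01, clean_reward=1.0, shaping_weight=0.1):
        self.size = size
        self.step_penalty = step_penalty
        self.clean_reward = clean_reward
        self.shaping_weight = shaping_weight
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        self.cleaned = np.zeros((self.size, self.size), dtype=bool)
        self.cleaned[self.pos] = True
        return self._obs()

    def _obs(self):
        # Observation: cleaned-tile map plus a one-hot agent position map.
        agent = np.zeros((self.size, self.size), dtype=np.float32)
        agent[self.pos] = 1.0
        return np.stack([self.cleaned.astype(np.float32), agent])

    def _potential(self):
        # Negative Manhattan distance to the nearest not-yet-cleaned tile
        # (assumed potential; the paper's nearest-tile criterion may differ).
        dirty = np.argwhere(~self.cleaned)
        if len(dirty) == 0:
            return 0.0
        dists = np.abs(dirty - np.array(self.pos)).sum(axis=1)
        return -float(dists.min())

    def step(self, action):
        phi_old = self._potential()
        dr, dc = self.ACTIONS[action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)

        reward = self.step_penalty
        if not self.cleaned[self.pos]:
            self.cleaned[self.pos] = True
            reward += self.clean_reward

        # Potential-based shaping term (discount factor 1 for simplicity)
        # nudges the agent toward the remaining dirty tiles.
        reward += self.shaping_weight * (self._potential() - phi_old)
        done = bool(self.cleaned.all())
        return self._obs(), reward, done


if __name__ == "__main__":
    env = GridCleanEnv()
    obs = env.reset()
    done, total = False, 0.0
    while not done:
        obs, r, done = env.step(np.random.randint(4))  # random baseline policy
        total += r
    print("episode return under the random baseline:", round(total, 2))
```

Running the script rolls out the random baseline mentioned in the abstract's comparison; in the proposed approach, the action would instead come from a PPO policy trained on this shaped reward.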