Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller

We present fully autonomous source seeking onboard a highly constrained nano quadcopter, contributing application-specific system and observation feature design that enables inference of a deep-RL policy onboard the vehicle. Our deep-RL algorithm finds a high-performance solution to a challenging problem, even in the presence of high noise levels, and generalizes across real and simulated environments with different obstacle configurations. We verify our approach in simulation and in field tests on a Bitcraze CrazyFlie, using only the cheap and ubiquitous Cortex-M4 microcontroller unit. The results show that, through end-to-end application-specific system design, our contribution consumes almost three times less additional power than a competing learning-based navigation approach onboard a nano quadcopter. Thanks to our observation space, which we carefully design within the resource constraints, our solution achieves a 94% success rate in cluttered and randomized test environments, compared to the previously achieved 80%. We also compare our strategy to a simple finite state machine (FSM) geared towards efficient exploration, and demonstrate that our policy is more robust and resilient at obstacle avoidance, as well as up to 70% more efficient in source seeking. To this end, we contribute a cheap and lightweight end-to-end tiny robot learning (tinyRL) solution, running onboard a nano quadcopter, that proves to be robust and efficient in a challenging task using limited sensory input.
