Autonomous agents that rely purely on perception for real-time control decisions require efficient and robust architectures. In this work, we demonstrate that augmenting RGB input with depth information significantly improves an agent's ability to predict steering commands compared to using RGB alone. We benchmark lightweight recurrent controllers that leverage fused RGB-D features for sequential decision-making. To train our models, we collect high-quality data with a small-scale autonomous car driven by an expert via a physical steering wheel, capturing varying levels of steering difficulty. Our models are successfully deployed on real hardware and inherently avoid both static and dynamic obstacles under out-of-distribution conditions. In particular, our findings reveal that early fusion of depth data yields a highly robust controller that remains effective under frame drops and increased noise, without compromising the network's focus on the task.
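To make the architecture concrete, the sketch below illustrates the early-fusion idea: the depth map is concatenated as a fourth channel before the encoder, and per-frame features are carried through a small recurrent (GRU-style) cell that outputs a steering command. All dimensions, the pooling "encoder", and the random weights are illustrative assumptions for this sketch, not the paper's actual network or trained parameters.

```python
import numpy as np

# Hypothetical sketch of early RGB-D fusion feeding a lightweight recurrent
# controller; sizes and weights are illustrative, not the paper's model.
rng = np.random.default_rng(0)
H, W, T, HID = 32, 32, 5, 8        # frame size, sequence length, hidden dim

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def early_fuse(rgb, depth):
    """Concatenate depth as a 4th channel BEFORE the encoder (early fusion)."""
    return np.concatenate([rgb, depth[..., None]], axis=-1)   # (H, W, 4)

def encode(frame, W_enc):
    """Crude stand-in for a conv encoder: per-channel pooling + projection."""
    pooled = frame.mean(axis=(0, 1))                          # (4,)
    return np.tanh(W_enc @ pooled)                            # (HID,)

def gru_step(h, x, Wz, Wr, Wh):
    """Minimal GRU cell carrying temporal context across frames."""
    xh = np.concatenate([x, h])
    z = sigmoid(Wz @ xh)                                      # update gate
    r = sigmoid(Wr @ xh)                                      # reset gate
    h_new = np.tanh(Wh @ np.concatenate([x, r * h]))
    return (1 - z) * h + z * h_new

# Randomly initialised weights (training is out of scope for this sketch).
W_enc = rng.normal(0, 0.1, (HID, 4))
Wz, Wr, Wh = (rng.normal(0, 0.1, (HID, 2 * HID)) for _ in range(3))
w_out = rng.normal(0, 0.1, HID)

h = np.zeros(HID)
steering = []
for _ in range(T):
    rgb = rng.random((H, W, 3))                               # camera frame
    depth = rng.random((H, W))                                # aligned depth map
    h = gru_step(h, encode(early_fuse(rgb, depth), W_enc), Wz, Wr, Wh)
    steering.append(float(np.tanh(w_out @ h)))                # command in [-1, 1]

print(len(steering))
```

Because fusion happens at the input, a dropped or noisy depth channel perturbs only one of four input channels rather than an entire parallel stream, which is one intuition for the robustness reported above.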