Autonomous navigation for unmanned aerial vehicles (UAVs) is constrained by limited onboard computational resources, which restrict deployed deep neural networks to shallow architectures that cannot handle complex environments. Offloading tasks to remote edge servers introduces high latency, creating an inherent trade-off in system design. To address these limitations, we propose CoDrone, the first cloud-edge-end collaborative computing framework that integrates foundation models into autonomous UAV cruising scenarios, leveraging them to enhance the performance of resource-constrained UAV platforms. To reduce onboard computation and data transmission overhead, CoDrone uses grayscale imagery as input to the navigation model. When enhanced environmental perception is required, CoDrone uses the edge-assisted foundation model Depth Anything V2 for depth estimation and introduces a novel navigation method based on a one-dimensional occupancy grid, enabling fine-grained scene understanding while improving the efficiency and representational simplicity of autonomous navigation. A key component of CoDrone is a neural scheduler based on deep reinforcement learning that integrates depth estimation with autonomous navigation decisions, enabling real-time adaptation to dynamic environments. Furthermore, the framework introduces a UAV-specific vision language interaction module that incorporates domain-tailored low-level flight primitives to enable effective interaction between the cloud foundation model and the UAV. The vision language model (VLM) enhances open-set reasoning in complex unseen scenarios. Experimental results show that CoDrone outperforms baseline methods under varying flight speeds and network conditions, achieving a 40% increase in average flight distance and a 5% improvement in average Quality of Navigation.
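
The abstract does not specify how the one-dimensional occupancy grid is built, so the following is only an illustrative sketch of one plausible reading, not CoDrone's actual method. It assumes a metric depth map in meters (for example, from a depth estimator), splits the image into vertical strips, and marks a strip as occupied when its nearest pixel is closer than a safety distance. The function name `occupancy_1d` and the parameters `num_cells` and `safe_distance_m` are hypothetical.

```python
from typing import List, Sequence


def occupancy_1d(depth: Sequence[Sequence[float]],
                 num_cells: int = 5,
                 safe_distance_m: float = 2.0) -> List[int]:
    """Collapse a depth map into a 1-D occupancy grid (hypothetical helper).

    depth: rows of per-pixel distances in meters (assumed metric depth).
    Returns one 0/1 flag per vertical strip, ordered left to right.
    """
    if not depth or not depth[0]:
        raise ValueError("depth map must be non-empty")
    width = len(depth[0])
    if not 1 <= num_cells <= width:
        raise ValueError("num_cells must be between 1 and the image width")

    grid = []
    for cell in range(num_cells):
        # Column range covered by this strip.
        start = cell * width // num_cells
        end = (cell + 1) * width // num_cells
        # The nearest obstacle in the strip decides whether it is occupied.
        nearest = min(row[col] for row in depth for col in range(start, end))
        grid.append(1 if nearest < safe_distance_m else 0)
    return grid


# Toy 2x6 depth map with a close obstacle in the middle columns.
depth_map = [
    [9.0, 9.0, 1.5, 9.0, 9.0, 9.0],
    [9.0, 9.0, 1.2, 9.0, 9.0, 8.0],
]
grid = occupancy_1d(depth_map, num_cells=3)
print(grid)
```

A representation like this reduces a full depth image to a handful of values, which is one way a navigation policy could stay small enough for onboard inference; the strip count and threshold here are arbitrary choices for illustration.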


