Drone-to-drone detection using visual feeds has crucial applications, such as avoiding collisions with other drones or airborne objects, countering drone attacks, and coordinating flight with other drones. However, existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, which makes them less suitable for deployment on edge devices for real-time drone flight. In this work, we propose TransVisDrone, a simple yet effective framework that provides an end-to-end solution with higher computational efficiency. We utilize the CSPDarkNet-53 network to learn object-related spatial features and the VideoSwin model to learn the spatio-temporal dependencies of drone motion, which improves drone detection in challenging scenarios. Our method obtains state-of-the-art performance on three challenging real-world datasets (Average Precision@0.5IOU): NPS 0.95, FLDrones 0.75, and AOT 0.80. Apart from its superior performance, it achieves higher throughput than prior work. We also demonstrate its deployment capability on edge-computing devices and its usefulness in applications such as drone-collision (encounter) detection. Code: \url{https://github.com/tusharsangam/TransVisDrone}.
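To make the described feature flow concrete, below is a minimal PyTorch sketch of the two-stage design the abstract outlines: per-frame spatial features from a convolutional backbone, followed by attention over space-time tokens. The tiny conv stack stands in for CSPDarkNet-53 and the vanilla `TransformerEncoder` stands in for VideoSwin's shifted-window spatio-temporal attention; the module name, shapes, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch: CNN backbone per frame -> temporal attention over a clip.
# Stand-ins only; not the TransVisDrone implementation.
import torch
import torch.nn as nn


class SpatioTemporalDroneDetector(nn.Module):  # hypothetical name
    def __init__(self, dim: int = 64):
        super().__init__()
        # Stand-in for CSPDarkNet-53: downsamples each frame to a feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Stand-in for VideoSwin: self-attention over all space-time tokens.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        # Toy per-token head: objectness + box offsets (x, y, w, h).
        self.head = nn.Linear(dim, 5)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 3, H, W) -- a short window of consecutive frames.
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.flatten(0, 1))          # (B*T, D, H', W')
        d, hp, wp = feats.shape[1:]
        tokens = feats.flatten(2).transpose(1, 2)          # (B*T, H'*W', D)
        tokens = tokens.reshape(b, t * hp * wp, d)         # space-time tokens
        tokens = self.temporal(tokens)                     # temporal mixing
        return self.head(tokens).reshape(b, t, hp, wp, 5)  # per-token outputs


if __name__ == "__main__":
    model = SpatioTemporalDroneDetector()
    out = model(torch.randn(2, 5, 3, 64, 64))  # 2 clips of 5 frames each
    print(out.shape)  # torch.Size([2, 5, 16, 16, 5])
```

The design choice this sketch illustrates is why temporal modeling helps: a tiny drone may be indistinguishable from a bird or sensor noise in a single frame, but attending across frames lets the detector exploit its motion pattern.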