In this paper, we propose a novel unified multi-task learning framework for real-time drone supervision for crowd counting (MFCC), which uses an image fusion network to fuse visible and thermal infrared images, and a crowd counting network to estimate the density map. The goal of our framework is to fuse these two modalities, captured by drones in real time, exploiting their complementary information to accurately count dense crowds and then automatically guide the drone's flight to supervise them. To this end, we propose, for the first time, a unified multi-task learning framework for crowd counting, and re-design the unified training loss functions to align the image fusion network with the crowd counting network. We also design an Assisted Learning Module (ALM) that fuses density map features into the image fusion encoder so that the encoder learns counting-relevant features. To improve accuracy, we propose an Extensive Context Extraction Module (ECEM), built on a densely connected architecture, to encode multi-receptive-field contextual information, and apply a Multi-domain Attention Block (MAB) to focus on head regions in the drone view. Finally, we use the predicted density map to automatically guide the drones in supervising the dense crowd. Experimental results on the DroneRGBT dataset show that, compared with existing methods, our approach achieves comparable results on objective evaluations with an easier training process.
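The unified training objective described above can be sketched as a weighted sum of a fusion loss and a density-map counting loss. The following is a minimal illustrative sketch, not the paper's actual formulation: the pixel-level fusion target (element-wise maximum of the two modalities), the MSE terms, and the balancing weight `lam` are all assumptions for illustration.

```python
import numpy as np

def fusion_loss(fused, visible, thermal):
    # Illustrative pixel-level loss: pull the fused image toward the
    # element-wise maximum of the two source modalities (an assumption,
    # not the loss used in the paper).
    target = np.maximum(visible, thermal)
    return float(np.mean((fused - target) ** 2))

def counting_loss(pred_density, gt_density):
    # Standard density-map regression loss (MSE between predicted and
    # ground-truth density maps).
    return float(np.mean((pred_density - gt_density) ** 2))

def unified_loss(fused, visible, thermal, pred_density, gt_density, lam=0.1):
    # Weighted sum aligning the fusion and counting tasks; lam is a
    # hypothetical balancing weight, not a value from the paper.
    return fusion_loss(fused, visible, thermal) + lam * counting_loss(
        pred_density, gt_density
    )
```

In a joint training loop, both networks would be updated with gradients from this single objective, which is what lets the fusion encoder pick up counting-relevant features.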