Crowd density estimation has received increasing attention in recent years. The main challenge for this task is obtaining high-quality manual annotations for a large amount of training data. To avoid relying on such annotations, previous works apply unsupervised domain adaptation (UDA) techniques that transfer knowledge learned from easily accessible synthetic data to real-world datasets. However, current state-of-the-art methods either rely on external data to train an auxiliary task or apply an expensive coarse-to-fine estimation. In this work, we develop a new adversarial-learning-based method that is simple, easy to implement, and efficient to apply. To reduce the domain gap between synthetic and real data, we design a bi-level alignment framework (BLA) consisting of (1) task-driven data alignment and (2) fine-grained feature alignment. In contrast to previous domain augmentation methods, we introduce AutoML to search for an optimal transform on the source data that serves the downstream task well. In addition, we perform fine-grained alignment for foreground and background separately to alleviate the alignment difficulty. We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin. The code is publicly available at https://github.com/Yankeegsj/BLA.
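To make the idea of aligning foreground and background separately more concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-pixel scalar features, a binary foreground mask (for example, one derived from a density map), and two independent domain discriminators that output a probability of "source domain". All names here (`split_by_mask`, `domain_loss`, `fine_grained_adv_loss`, `disc_fg`, `disc_bg`) are hypothetical and chosen only for illustration; the released repository should be consulted for the actual design.

```python
import math


def split_by_mask(features, mask):
    """Split per-pixel features into foreground and background lists."""
    fg = [f for f, m in zip(features, mask) if m]
    bg = [f for f, m in zip(features, mask) if not m]
    return fg, bg


def bce(prob, label):
    """Binary cross-entropy for one probability, clamped for stability."""
    eps = 1e-7
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def domain_loss(probs, domain_label):
    """Mean BCE against the domain label; 0.0 when the region is empty."""
    if not probs:
        return 0.0
    return sum(bce(p, domain_label) for p in probs) / len(probs)


def fine_grained_adv_loss(features, mask, domain_label, disc_fg, disc_bg):
    """Score foreground and background with separate discriminators.

    Assumption: each discriminator maps one feature value to a probability
    in (0, 1). Returns the two region-wise losses as a tuple.
    """
    fg, bg = split_by_mask(features, mask)
    loss_fg = domain_loss([disc_fg(f) for f in fg], domain_label)
    loss_bg = domain_loss([disc_bg(f) for f in bg], domain_label)
    return loss_fg, loss_bg


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Toy usage: four "pixels", two of which are marked as foreground.
features = [0.2, 1.5, -0.3, 2.0]
mask = [0, 1, 0, 1]
loss_fg, loss_bg = fine_grained_adv_loss(
    features, mask, 1,
    disc_fg=lambda f: sigmoid(f - 1.0),  # hypothetical foreground discriminator
    disc_bg=lambda f: sigmoid(f),        # hypothetical background discriminator
)
print(loss_fg, loss_bg)
```

Keeping the two losses separate means each discriminator only has to compare like with like (crowd regions with crowd regions, scene clutter with scene clutter), which is one plausible reading of why the separation would ease alignment.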