Domain shift across crowd data severely hinders crowd counting models from generalizing to unseen scenarios. Although domain-adaptive crowd counting approaches close this gap to a certain extent, they still depend on target-domain data to adapt (e.g., fine-tune) their models to the specific domain. In this paper, we aim to train a model on a single source domain that generalizes well to any unseen domain. This falls into the realm of domain generalization, which remains unexplored in crowd counting. We first introduce a dynamic sub-domain division scheme that divides the source domain into multiple sub-domains, so that a meta-learning framework for domain generalization can be initiated. The sub-domain division is dynamically refined during meta-learning. Next, to disentangle domain-invariant information from domain-specific information in image features, we design domain-invariant and domain-specific crowd memory modules that re-encode image features. Two types of losses, i.e., a feature reconstruction loss and an orthogonal loss, are devised to enable this disentanglement. Extensive experiments on several standard crowd counting benchmarks, i.e., SHA, SHB, QNRF, and NWPU, show the strong generalizability of our method.
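To make the disentanglement idea more concrete, the following is a minimal sketch of how a feature could be re-encoded through two memory banks and scored with a reconstruction loss and an orthogonal loss. The abstract does not specify the formulation, so everything here is an assumption made for illustration: the attention-style memory read, the mean-squared-error reconstruction term, the squared-cosine orthogonality term, and all names (`memory_read`, `reconstruction_loss`, `orthogonal_loss`, the toy memories) are hypothetical and should not be read as the authors' implementation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(feature, memory):
    """Assumed read: re-encode `feature` as an attention-weighted sum of memory slots."""
    weights = softmax([dot(feature, slot) for slot in memory])
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(feature))]


def reconstruction_loss(feature, f_inv, f_spec):
    """Assumed form: MSE between the feature and the sum of both re-encodings."""
    return sum((x - (a + b)) ** 2
               for x, a, b in zip(feature, f_inv, f_spec)) / len(feature)


def orthogonal_loss(f_inv, f_spec, eps=1e-12):
    """Assumed form: squared cosine similarity, zero for orthogonal re-encodings."""
    norm = math.sqrt(dot(f_inv, f_inv)) * math.sqrt(dot(f_spec, f_spec))
    return (dot(f_inv, f_spec) / (norm + eps)) ** 2


# Toy 4-dimensional feature and two hypothetical 2-slot memories.
feature = [1.0, 0.5, -0.5, 2.0]
invariant_memory = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
specific_memory = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

f_inv = memory_read(feature, invariant_memory)
f_spec = memory_read(feature, specific_memory)
loss_rec = reconstruction_loss(feature, f_inv, f_spec)
loss_orth = orthogonal_loss(f_inv, f_spec)
```

In this toy setup the two memories span disjoint coordinates, so the two re-encodings are orthogonal by construction; in a trained model, an orthogonality term of this kind would instead be minimized jointly with the reconstruction term and the counting loss.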