In this dissertation, we investigated and enhanced Deep Learning (DL) techniques for counting objects, such as pedestrians, cells, or vehicles, in still images or video frames. In particular, we tackled the challenge posed by the lack of data needed to train current DL-based solutions. Because labeling budgets are limited, data scarcity remains an open problem: it hinders the scalability of existing solutions based on supervised learning of neural networks, and it causes a significant drop in performance at inference time when these algorithms face new scenarios. We addressed this issue from several complementary sides: collecting automatically labeled datasets from virtual environments, proposing Domain Adaptation strategies that mitigate the domain gap between the training and test data distributions, and presenting a counting strategy for a weakly labeled data scenario, i.e., in the presence of non-negligible disagreement between multiple annotators. Moreover, we tackled the non-trivial engineering challenges that arise when adopting Convolutional Neural Network-based techniques in environments with limited power resources, introducing solutions for counting vehicles and pedestrians directly onboard embedded vision systems, i.e., devices with constrained computational capabilities that can capture images and process them.