Deep Learning Techniques for Visual Counting

From arXiv. A version with high-quality images can be found at https://etd.adm.unipi.it/theses/available/etd-04262022-163702/. arXiv admin note: text overlap with arXiv:1802.03601, arXiv:1707.01202, arXiv:1809.02165, arXiv:1901.06026, arXiv:1808.01244 by other authors.

In this thesis, I investigated and improved visual counting, the task of automatically estimating the number of objects in still images or video frames. Growing interest in this task has recently led the scientific community to propose several CNN-based solutions. These artificial neural networks learn effective representations directly from raw visual data and can be successfully employed to address the typical challenges of the task, such as varying illumination and object scale. Beyond these difficulties, I targeted other crucial limitations in the adoption of CNNs, proposing solutions that I experimentally evaluated in the context of counting, a task that turns out to be particularly affected by these shortcomings. In particular, I tackled the lack of data needed to train current CNN-based solutions. Because labeling budgets are limited, data scarcity remains an open problem, and it is particularly evident in tasks such as counting, where the objects to be labeled can number in the thousands per image. Specifically, I introduced synthetic datasets gathered from virtual environments, where the training labels are collected automatically. I proposed Domain Adaptation strategies that aim to mitigate the domain gap between the training and test data distributions. I presented a counting strategy that takes advantage of the redundant information in datasets labeled by multiple annotators. Moreover, I tackled the engineering challenges that arise when CNN techniques are adopted in environments with limited power resources. I introduced solutions for counting vehicles directly onboard embedded vision systems. Finally, I designed an embedded, modular Computer Vision-based system that can carry out several tasks to help monitor individual and collective human safety rules.
