Recent years have witnessed a substantial increase in the deep learning (DL) architectures proposed for visual recognition tasks like person re-identification, where individuals must be recognized over multiple distributed cameras. Although these architectures have greatly improved the state-of-the-art accuracy, the computational complexity of the CNNs commonly used for feature extraction remains an issue, hindering their deployment on platforms with limited resources, or in applications with real-time constraints. There is an obvious advantage to accelerating and compressing DL models without significantly decreasing their accuracy. However, the source (pruning) domain differs from operational (target) domains, and the domain shift between image data captured with different non-overlapping camera viewpoints leads to lower recognition accuracy. In this paper, we investigate the prunability of these architectures under different design scenarios. We first revisit pruning techniques that are suitable for reducing the computational complexity of deep CNNs applied to person re-identification. These techniques are then analysed according to their pruning criteria and strategy, and according to different scenarios for exploiting pruning methods when fine-tuning networks to target domains. Experimental results obtained using DL models with ResNet feature extractors, and multiple benchmark re-identification datasets, indicate that pruning can considerably reduce network complexity while maintaining a high level of accuracy. In scenarios where pruning is performed with large pre-training or fine-tuning datasets, the number of FLOPs required by ResNet architectures is reduced by half, while maintaining a comparable rank-1 accuracy (within 1% of the original model). Pruning while training larger CNNs can also provide significantly better performance than fine-tuning smaller ones.