We propose a network for Congested Scene Recognition, called CSRNet, a data-driven deep learning method that can understand highly congested scenes, perform accurate count estimation, and produce high-quality density maps. CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction, and a dilated CNN as the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations. CSRNet is easy to train because of its purely convolutional structure. To the best of our knowledge, CSRNet is the first implementation to use dilated CNNs for crowd counting. We demonstrate CSRNet on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and achieve state-of-the-art performance on them. On the ShanghaiTech Part_B dataset, CSRNet achieves 47.3% lower mean absolute error (MAE) than the previous state-of-the-art method. We also extend the target application to counting other objects, such as vehicles in the TRANCOS dataset. There, CSRNet improves the output quality, with 15.4% lower MAE than the previous state-of-the-art approach.
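To illustrate why dilated kernels enlarge the receptive field without pooling, the following is a minimal pure-Python sketch. It is not the CSRNet implementation: the helper names (`effective_kernel`, `receptive_field`, `dilated_conv2d`, `mae`), the three-layer stack, and the dilation rate of 2 are assumptions chosen only for illustration. It also shows how MAE, the metric quoted above, is computed over per-image counts.

```python
def effective_kernel(k, d):
    """Span in pixels covered by a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def receptive_field(layers):
    """Receptive field of a stack of conv layers.

    layers: list of (kernel_size, stride, dilation) tuples (illustrative).
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf


def dilated_conv2d(image, kernel, dilation=1):
    """'Valid' 2D cross-correlation with a dilated kernel, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - effective_kernel(kh, dilation) + 1
    out_w = len(image[0]) - effective_kernel(kw, dilation) + 1
    return [
        [
            sum(
                kernel[i][j] * image[y + i * dilation][x + j * dilation]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def mae(predicted_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    pairs = zip(predicted_counts, true_counts)
    return sum(abs(p - t) for p, t in pairs) / len(true_counts)


# Three stacked 3x3 layers, stride 1: plain vs. dilation rate 2 (assumed rate).
plain_rf = receptive_field([(3, 1, 1)] * 3)
dilated_rf = receptive_field([(3, 1, 2)] * 3)

# A 3x3 kernel with dilation 2 spans a 5x5 window but still has 9 weights.
ones_image = [[1] * 5 for _ in range(5)]
ones_kernel = [[1] * 3 for _ in range(3)]
dilated_out = dilated_conv2d(ones_image, ones_kernel, dilation=2)
```

With the same number of weights per layer, the dilated stack sees a wider input region than the plain one while keeping stride 1, so the spatial resolution of the feature map is not reduced the way it would be by pooling.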