This work addresses the problem of vehicle identification across non-overlapping cameras. As our main contribution, we introduce a novel dataset for vehicle identification, called Vehicle-Rear, that contains more than three hours of high-resolution videos, with accurate information about the make, model, color, and year of nearly 3,000 vehicles, in addition to the position and identification of their license plates. To explore our dataset, we design a two-stream CNN that simultaneously uses two of the most distinctive and persistent features available: the vehicle's appearance and its license plate. This design tackles a major source of error: false alarms caused by vehicles with similar designs or by nearly identical license plate identifiers. In the first network stream, shape similarities are identified by a Siamese CNN that uses a pair of low-resolution vehicle patches recorded by two different cameras. In the second stream, we use a CNN for OCR to extract textual information, confidence scores, and string similarities from a pair of high-resolution license plate patches. Then, features from both streams are merged by a sequence of fully connected layers for the final decision. In our experiments, we compared the two-stream network against several well-known CNN architectures using single or multiple vehicle features. The architectures, trained models, and dataset are publicly available at https://github.com/icarofua/vehicle-rear.
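To make the fusion idea concrete, the sketch below shows one plausible way to wire the two streams together in PyTorch. It is not the authors' exact model (the actual architectures are in the repository above); the input sizes (64x64 vehicle patches), the 8-dimensional plate feature vector (e.g., OCR confidence scores plus a string-similarity score), and all layer widths are illustrative assumptions. The key structural points it reproduces are a shared (Siamese) encoder for the two low-resolution vehicle patches and a stack of fully connected layers that fuses both streams into a same/different decision.

```python
# Hedged sketch of a two-stream verifier, NOT the authors' published model.
# Assumptions: 64x64 RGB vehicle patches, an 8-dim plate feature vector per pair.
import torch
import torch.nn as nn

class ShapeEncoder(nn.Module):
    """Shared CNN that embeds one low-resolution vehicle patch."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class TwoStreamVerifier(nn.Module):
    """Fuses a Siamese shape stream with plate-derived features to decide
    whether two detections from different cameras show the same vehicle."""
    def __init__(self, plate_feat_dim: int = 8, embed_dim: int = 128):
        super().__init__()
        self.encoder = ShapeEncoder(embed_dim)  # weights shared across the patch pair
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim + plate_feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 2),  # logits: same vehicle vs. different vehicle
        )

    def forward(self, patch_a, patch_b, plate_feats):
        ea, eb = self.encoder(patch_a), self.encoder(patch_b)
        fused = torch.cat([ea, eb, plate_feats], dim=1)
        return self.head(fused)

# Usage on dummy tensors: a batch of 4 patch pairs plus 8 plate features per pair.
model = TwoStreamVerifier()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64), torch.randn(4, 8))
print(logits.shape)  # torch.Size([4, 2])
```

In this sketch the plate stream is reduced to a precomputed feature vector; in the described system those features (OCR text, confidence scores, string similarities) come from a dedicated OCR CNN applied to high-resolution plate patches before fusion.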