Public datasets have played a key role in advancing the state of the art in License Plate Recognition (LPR). Although dataset bias has been recognized as a severe problem in the computer vision community, it has been largely overlooked in the LPR literature. LPR models are usually trained and evaluated separately on each dataset. In this scenario, they have often proven robust on the dataset they were trained on but have shown limited performance on unseen ones. This work therefore investigates the dataset bias problem in the LPR context. We performed experiments on eight datasets, four collected in Brazil and four in mainland China, and observed that each dataset has a unique, identifiable "signature": a lightweight classification model predicts the source dataset of a license plate (LP) image with more than 95% accuracy. In our discussion, we draw attention to the fact that most LPR models are probably exploiting such signatures to improve the results achieved on each dataset at the cost of generalization capability. These results emphasize the importance of evaluating LPR models in cross-dataset setups, as they provide a better indication of generalization (and hence real-world performance) than within-dataset ones.