Paper title
A First Look at Dataset Bias in License Plate Recognition
Paper authors
Paper abstract
Public datasets have played a key role in advancing the state of the art in License Plate Recognition (LPR). Although dataset bias has been recognized as a severe problem in the computer vision community, it has been largely overlooked in the LPR literature. LPR models are usually trained and evaluated separately on each dataset. In this scenario, they have often proven robust on the dataset they were trained on but have shown limited performance on unseen ones. Therefore, this work investigates the dataset bias problem in the LPR context. We performed experiments on eight datasets, four collected in Brazil and four in mainland China, and observed that each dataset has a unique, identifiable "signature": a lightweight classification model predicts the source dataset of a license plate (LP) image with more than 95% accuracy. In our discussion, we draw attention to the fact that most LPR models are probably exploiting such signatures to improve the results achieved on each dataset at the cost of generalization capability. These results emphasize the importance of evaluating LPR models in cross-dataset setups, as they provide a better indication of generalization (and hence real-world performance) than within-dataset ones.
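The abstract does not say how the cross-dataset setups are constructed. As a rough illustration only, one common way to build such splits is leave-one-dataset-out: train on all datasets but one and test on the held-out one. The sketch below assumes that protocol; the function name, the sample layout, and the dataset names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical sample layout: (dataset name, image identifier).
Sample = Tuple[str, str]


def leave_one_dataset_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_name, train, test), holding out one dataset per fold."""
    # Group the samples by their source dataset.
    by_dataset: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_dataset[sample[0]].append(sample)

    # Each dataset takes one turn as the unseen test set.
    for held_out in sorted(by_dataset):
        train = [
            s
            for name, group in by_dataset.items()
            if name != held_out
            for s in group
        ]
        test = list(by_dataset[held_out])
        yield held_out, train, test


# Toy data; "A", "B", "C" are placeholders, not the datasets used in the paper.
toy = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("C", "c1"), ("C", "c2")]
folds = list(leave_one_dataset_out(toy))
```

In a within-dataset setup, by contrast, the training and test images come from the same source, so a model can benefit from dataset-specific signatures without that advantage carrying over to unseen data.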