Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition

Laroca, Rayson; Estevam, Valter; Britto Jr., Alceu S.; Minetto, Rodrigo; Menotti, David

doi:10.1109/IJCNN54540.2023.10191584


arXiv:2304.04653 (cs)

[Submitted on 10 Apr 2023 (v1), last revised 4 Aug 2023 (this version, v2)]



Abstract: This work draws attention to the large fraction of near-duplicates in the training and test sets of datasets widely adopted in License Plate Recognition (LPR) research. These duplicates refer to images that, although different, show the same license plate. Our experiments, conducted on the two most popular datasets in the field, show a substantial decrease in recognition rate when six well-known models are trained and tested under fair splits, that is, in the absence of duplicates in the training and test sets. Moreover, in one of the datasets, the ranking of models changed considerably when they were trained and tested under duplicate-free splits. These findings suggest that such duplicates have significantly biased the evaluation and development of deep learning-based models for LPR. The list of near-duplicates we have found and proposals for fair splits are publicly available for further research at this https URL
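To make the notion of a fair split concrete, here is a minimal Python sketch. It is an illustration only, not the authors' procedure. It assumes each sample is a hypothetical `(image_id, plate_text)` pair and treats two images as near-duplicates whenever their annotated plate text matches. The function names `find_cross_split_duplicates` and `fair_split` are invented for this example.

```python
from collections import defaultdict


def find_cross_split_duplicates(train, test):
    """Return the plate texts that occur in both the training and test sets.

    Assumption: each sample is an (image_id, plate_text) pair, and matching
    plate text is used as the near-duplicate criterion.
    """
    train_plates = {plate for _, plate in train}
    return sorted({plate for _, plate in test if plate in train_plates})


def fair_split(samples, test_fraction=0.2):
    """Split samples so that all images of a given plate land in one split.

    Plates are visited in sorted order so the result is deterministic; whole
    plate groups are added to the test set until it reaches the target size.
    """
    groups = defaultdict(list)
    for image_id, plate in samples:
        groups[plate].append(image_id)

    target = test_fraction * len(samples)
    train, test = [], []
    for plate in sorted(groups):
        bucket = test if len(test) < target else train
        bucket.extend((image_id, plate) for image_id in groups[plate])
    return train, test


if __name__ == "__main__":
    # Hypothetical toy data: two plates are photographed twice each.
    samples = [
        ("a.jpg", "ABC1234"),
        ("b.jpg", "ABC1234"),
        ("c.jpg", "XYZ9876"),
        ("d.jpg", "DEF5555"),
        ("e.jpg", "DEF5555"),
    ]
    train, test = fair_split(samples, test_fraction=0.4)
    print("train:", train)
    print("test:", test)
    print("shared plates:", find_cross_split_duplicates(train, test))
```

Because plates are assigned as whole groups, the test set can overshoot the requested fraction when a plate has many images. A real protocol would likely also need to handle annotation errors and visually similar images of different plates, which this sketch ignores.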

Comments:	Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.04653 [cs.CV]
	(or arXiv:2304.04653v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.04653
Related DOI:	https://doi.org/10.1109/IJCNN54540.2023.10191584

Submission history

From: Rayson Laroca [view email]
[v1] Mon, 10 Apr 2023 15:24:29 UTC (2,023 KB)
[v2] Fri, 4 Aug 2023 13:36:29 UTC (2,024 KB)
