SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

Tran, Bao Hieu; Le-Cong, Thanh; Nguyen, Huu Manh; Le, Duc Anh; Nguyen, Thanh Hung; Nguyen, Phi Le

doi:10.1109/ICMLA51294.2020.00223

arXiv:2201.00132 (cs)

[Submitted on 1 Jan 2022]

Authors:Bao Hieu Tran, Thanh Le-Cong, Huu Manh Nguyen, Duc Anh Le, Thanh Hung Nguyen, Phi Le Nguyen

Abstract: In recent decades, scene text recognition has gained worldwide attention from both the academic community and end users due to its importance in a wide range of applications. Despite achievements in optical character recognition, scene text recognition remains challenging due to inherent problems such as distortions and irregular layouts. Most existing approaches rely on recurrent or convolutional neural networks. However, recurrent neural networks (RNNs) usually suffer from slow training due to sequential computation and encounter problems such as vanishing gradients or bottlenecks, while convolutional neural networks (CNNs) face a trade-off between complexity and performance. In this paper, we introduce SAFL, a self-attention-based neural network model with focal loss for scene text recognition, to overcome the limitations of the existing approaches. Using focal loss instead of negative log-likelihood helps the model focus more on low-frequency samples during training. Moreover, to deal with distorted and irregular text, we use a Spatial Transformer Network (STN) to rectify the text before passing it to the recognition network. We perform experiments to evaluate the proposed model on seven benchmarks. The numerical results show that our model achieves the best performance.
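
To illustrate why focal loss shifts attention toward poorly predicted (often low-frequency) samples, here is a minimal sketch in plain Python. It assumes the standard focal loss form, -(1 - p)^gamma * log(p), where p is the probability the model assigns to the correct character. The function names, the default gamma = 2.0, and the summation over decoding steps are illustrative assumptions and are not taken from the abstract.

```python
import math


def negative_log_likelihood(p_true: float) -> float:
    """Standard NLL for the probability assigned to the correct class."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss: NLL scaled by (1 - p)^gamma.

    gamma = 2.0 is an assumed default; gamma = 0 recovers plain NLL.
    """
    return ((1.0 - p_true) ** gamma) * negative_log_likelihood(p_true)


def sequence_focal_loss(step_probs, targets, gamma: float = 2.0) -> float:
    """Sum the focal loss over decoding steps (hypothetical helper).

    step_probs: one probability distribution (list of floats) per step.
    targets:    the index of the correct character at each step.
    """
    return sum(focal_loss(probs[t], gamma) for probs, t in zip(step_probs, targets))


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(f"p={p}: NLL={negative_log_likelihood(p):.4f}, "
              f"focal={focal_loss(p):.4f}")
```

With this weighting, a character predicted at p = 0.9 has its loss scaled by 0.01, while one predicted at p = 0.1 is scaled by 0.81, so hard examples dominate the gradient far more than under negative log-likelihood.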

Comments:	Accepted to ICMLA 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2201.00132 [cs.CV]
	(or arXiv:2201.00132v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2201.00132
Journal reference:	2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)
Related DOI:	https://doi.org/10.1109/ICMLA51294.2020.00223

Submission history

From: Thanh Le-Cong
[v1] Sat, 1 Jan 2022 06:51:03 UTC (1,901 KB)
