DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning

Gu, Wenhao; Gu, Li; Wang, Ziqiang; Suen, Ching Yee; Wang, Yang

arXiv:2501.12898 (cs)

[Submitted on 22 Jan 2025]

Abstract: Despite significant recent advances in Handwritten Document Recognition (HDR), efficiently and accurately recognizing text against complex backgrounds, diverse handwriting styles, and varying document layouts remains a practical challenge. Moreover, this issue is seldom addressed in academic research, particularly in scenarios where little annotated data is available. In this paper, we introduce the DocTTT framework to address these challenges. The key innovation of our approach is its use of test-time training to adapt the model to each specific input during testing. We propose a novel Meta-Auxiliary learning approach that combines meta-learning with a self-supervised Masked Autoencoder (MAE). During testing, we adapt the visual representation parameters using a self-supervised MAE loss. During training, we learn the model parameters within a meta-learning framework, so that they can adapt effectively to each new input. Experimental results show that our proposed method significantly outperforms existing state-of-the-art approaches on benchmark datasets.
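
The two phases described in the abstract can be illustrated with a deliberately tiny sketch. This is not the DocTTT implementation: the scalar "encoder" weight `w`, "decoder" weight `d`, task head `h`, the fixed mask, the learning rates, and every function name are assumptions made for illustration. The inner step adapts `w` on a masked-reconstruction loss that needs no label, and the outer loop updates all parameters on the supervised loss measured after that step. The outer gradient is estimated with central finite differences only to keep the sketch dependency-free; the abstract does not specify how the meta-gradient is computed.

```python
# Toy sketch of meta-auxiliary learning with test-time training.
# Assumption: scalar parameters and a fixed mask stand in for a real MAE.

def mean(values):
    return sum(values) / len(values)

def split(x, mask):
    """Return (visible, masked) entries of x; mask[i] is True where x[i] is hidden."""
    visible = [v for v, m in zip(x, mask) if not m]
    masked = [v for v, m in zip(x, mask) if m]
    return visible, masked

def aux_loss(w, d, x, mask):
    """Self-supervised loss: reconstruct masked entries from the visible mean."""
    visible, masked = split(x, mask)
    recon = d * w * mean(visible)
    return mean([(recon - t) ** 2 for t in masked])

def aux_grad_w(w, d, x, mask):
    """Analytic gradient of aux_loss with respect to the encoder weight w."""
    visible, masked = split(x, mask)
    v = mean(visible)
    return 2.0 * d * v * (d * w * v - mean(masked))

def adapt(w, d, x, mask, inner_lr):
    """One test-time training step on the auxiliary loss (no label needed)."""
    return w - inner_lr * aux_grad_w(w, d, x, mask)

def predict(params, x, mask, inner_lr):
    """Adapt the encoder to this input, then apply the task head."""
    w, d, h = params
    return h * adapt(w, d, x, mask, inner_lr) * mean(x)

def meta_objective(params, sample, mask, inner_lr):
    """Supervised loss measured after the inner adaptation step."""
    x, y = sample
    return (predict(params, x, mask, inner_lr) - y) ** 2

def meta_train(params, data, mask, inner_lr=0.05, outer_lr=0.01,
               epochs=200, eps=1e-5):
    """Outer loop: descend the post-adaptation loss (finite-difference gradient)."""
    params = list(params)
    for _ in range(epochs):
        for sample in data:
            grads = []
            for i in range(len(params)):
                plus = params[:]
                plus[i] += eps
                minus = params[:]
                minus[i] -= eps
                diff = (meta_objective(plus, sample, mask, inner_lr)
                        - meta_objective(minus, sample, mask, inner_lr))
                grads.append(diff / (2 * eps))
            params = [p - outer_lr * g for p, g in zip(params, grads)]
    return params

if __name__ == "__main__":
    mask = [False, True, False, True]       # hypothetical fixed mask
    data = [([1.0, 2.0, 3.0, 4.0], 5.0)]    # hypothetical (input, label) pair
    trained = meta_train([0.5, 1.0, 1.0], data, mask)
    print(predict(trained, data[0][0], mask, inner_lr=0.05))
```

In this sketch each test input is adapted starting from the same meta-learned parameters, so the adaptation of one input does not carry over to the next.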

Comments:	WACV2025, camera ready with updated reference
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.12898 [cs.CV]
	(or arXiv:2501.12898v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.12898

Submission history

From: Li Gu
[v1] Wed, 22 Jan 2025 14:18:47 UTC (7,236 KB)
