Test-Time Adaptation for Visual Document Understanding

Ebrahimi, Sayna; Arik, Sercan O.; Pfister, Tomas


arXiv:2206.07240 (cs)

[Submitted on 15 Jun 2022 (v1), last revised 23 Aug 2023 (this version, v2)]

Abstract: For visual document understanding (VDU), self-supervised pretraining has been shown to generate transferable representations, yet effective adaptation of such representations to distribution shifts at test time remains unexplored. We propose DocTTA, a novel test-time adaptation method for documents that performs source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo labeling, to adapt models learned on a source domain to an unlabeled target domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. DocTTA shows significant improvements on these compared to the source model performance, up to 1.89% (F1 score), 3.43% (F1 score), and 17.68% (ANLS score), respectively. Our benchmark datasets are available at this https URL.
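
The ANLS score reported above for document visual question answering is the Average Normalized Levenshtein Similarity. As a rough illustration of how such a score is typically computed, here is a minimal sketch. It is not the authors' evaluation code: the function names, the lowercasing and whitespace stripping, and the 0.5 threshold are assumptions based on the commonly used definition of the metric.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average Normalized Levenshtein Similarity (illustrative sketch).

    predictions: list of predicted answer strings, one per question.
    references:  list of lists of accepted answer strings, one list per question.
    threshold:   normalized distances at or above this value score 0
                 (0.5 is assumed here, following the usual convention).
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        best = 0.0
        for ans in answers:
            # Assumed normalization: case-insensitive, surrounding whitespace ignored.
            p, a = pred.strip().lower(), ans.strip().lower()
            longest = max(len(p), len(a))
            distance = levenshtein(p, a) / longest if longest else 0.0
            similarity = 1.0 - distance if distance < threshold else 0.0
            best = max(best, similarity)
        total += best
    return total / len(predictions)
```

Under this definition, an exact match scores 1, a near miss scores in proportion to how few characters differ, and an answer that differs in half or more of its characters scores 0. The per-question best score over all accepted answers is then averaged across questions.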

Comments:	Accepted at TMLR 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2206.07240 [cs.CV]
	(or arXiv:2206.07240v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2206.07240

Submission history

From: Sayna Ebrahimi
[v1] Wed, 15 Jun 2022 01:57:12 UTC (2,715 KB)
[v2] Wed, 23 Aug 2023 22:54:40 UTC (9,959 KB)
