AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation

Yamazaki, Kashu; Hanyu, Taisei; Tran, Minh; de Luis, Adrian; McCann, Roy; Liao, Haitao; Rainwater, Chase; Adkins, Meredith; Cothren, Jackson; Le, Ngan

arXiv:2306.06842 (cs)

[Submitted on 12 Jun 2023 (v1), last revised 1 Oct 2023 (this version, v2)]

Abstract: Aerial image segmentation is semantic segmentation of images captured from a top-down perspective. It poses several challenges: a strong imbalance in the foreground-background distribution, complex backgrounds, intra-class heterogeneity, inter-class homogeneity, and tiny objects. To address these problems, we build on the strengths of Transformers and propose AerialFormer, which combines a Transformer in the contracting path with lightweight Multi-Dilated Convolutional Neural Networks (MD-CNNs) in the expanding path. AerialFormer has a hierarchical structure in which the Transformer encoder outputs multi-scale features and the MD-CNN decoder aggregates information across those scales. It therefore takes both local and global context into account to produce powerful representations and high-resolution segmentation. We benchmark AerialFormer on three common datasets: iSAID, LoveDA, and Potsdam. Comprehensive experiments and extensive ablation studies show that AerialFormer outperforms previous state-of-the-art methods. Our source code will be made publicly available upon acceptance.
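
The abstract does not spell out how the MD-CNN decoder blocks are built, so the following is only a minimal sketch of the general idea behind multi-dilated convolution, not the authors' implementation. It assumes a 1-D signal instead of a 2-D feature map, a single shared 3-tap kernel, and dilation rates of 1, 2, and 3; the function names (`dilated_conv1d`, `receptive_field`, `multi_dilated_conv1d`) are hypothetical and chosen for illustration. The point it shows is that the same small kernel covers a wider context as the dilation grows, so running several rates in parallel mixes local and longer-range information.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-size 1-D cross-correlation with zero padding and a given dilation."""
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # kernel taps sit `dilation` apart
            if 0 <= j < n:                 # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                         dilations: Sequence[int] = (1, 2, 3)) -> List[List[float]]:
    """Apply the kernel at several dilation rates (assumed rates, one channel each)."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


if __name__ == "__main__":
    # A unit impulse makes the sampling pattern of each dilation rate visible.
    impulse = [0.0] * 4 + [1.0] + [0.0] * 4
    for d, channel in zip((1, 2, 3), multi_dilated_conv1d(impulse, [1.0, 1.0, 1.0])):
        print(f"dilation={d} receptive_field={receptive_field(3, d)} output={channel}")
```

In a real decoder the per-rate outputs would typically be 2-D feature maps that are concatenated or summed along the channel dimension; how AerialFormer combines them is not stated in the abstract.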

Comments:	under review
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2306.06842 [cs.CV]
	(or arXiv:2306.06842v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.06842

Submission history

From: Kashu Yamazaki
[v1] Mon, 12 Jun 2023 03:28:18 UTC (47,729 KB)
[v2] Sun, 1 Oct 2023 17:04:35 UTC (41,673 KB)
