Facial Affect Recognition based on Transformer Encoder and Audiovisual Fusion for the ABAW5 Challenge

Zhang, Ziyang; An, Liuwei; Cui, Zishun; xu, Ao; Dong, Tengteng; Jiang, Yueqi; Shi, Jingyi; Liu, Xin; Sun, Xiao; Wang, Meng

Computer Science > Computer Vision and Pattern Recognition

arXiv:2303.09158 (cs)

[Submitted on 16 Mar 2023 (v1), last revised 20 Mar 2023 (this version, v2)]

Title:Facial Affect Recognition based on Transformer Encoder and Audiovisual Fusion for the ABAW5 Challenge

Authors:Ziyang Zhang, Liuwei An, Zishun Cui, Ao xu, Tengteng Dong, Yueqi Jiang, Jingyi Shi, Xin Liu, Xiao Sun, Meng Wang

View PDF

Abstract:In this paper, we present our solutions for the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW), which includes four sub-challenges of Valence-Arousal (VA) Estimation, Expression (Expr) Classification, Action Unit (AU) Detection and Emotional Reaction Intensity (ERI) Estimation. The 5th ABAW competition focuses on facial affect recognition utilizing different modalities and datasets. In our work, we extract powerful audio and visual features using a large number of sota models. These features are fused by Transformer Encoder and TEMMA. Besides, to avoid the possible impact of large dimensional differences between various features, we design an Affine Module to align different features to the same dimension. Extensive experiments demonstrate that the superiority of the proposed method. For the VA Estimation sub-challenge, our method obtains the mean Concordance Correlation Coefficient (CCC) of 0.6066. For the Expression Classification sub-challenge, the average F1 Score is 0.4055. For the AU Detection sub-challenge, the average F1 Score is 0.5296. For the Emotional Reaction Intensity Estimation sub-challenge, the average pearson's correlations coefficient on the validation set is 0.3968. All of the results of four sub-challenges outperform the baseline with a large margin.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2303.09158 [cs.CV]
	(or arXiv:2303.09158v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.09158

Submission history

From: Ziyang Zhang [view email]
[v1] Thu, 16 Mar 2023 08:47:36 UTC (176 KB)
[v2] Mon, 20 Mar 2023 12:17:53 UTC (180 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Facial Affect Recognition based on Transformer Encoder and Audiovisual Fusion for the ABAW5 Challenge

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Facial Affect Recognition based on Transformer Encoder and Audiovisual Fusion for the ABAW5 Challenge

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators