Transformer-Driven Modeling of Variable Frequency Features for Classifying Student Engagement in Online Learning

Mandia, Sandeep; Singh, Kuldeep; Mitharwal, Rajendra; Mushtaq, Faisel; Janu, Dimpal

arXiv:2502.10813 (cs)

[Submitted on 15 Feb 2025]

Title: Transformer-Driven Modeling of Variable Frequency Features for Classifying Student Engagement in Online Learning

Authors: Sandeep Mandia, Kuldeep Singh, Rajendra Mitharwal, Faisel Mushtaq, Dimpal Janu

Abstract: The COVID-19 pandemic and widespread internet availability have recently boosted online learning. However, monitoring engagement in online learning is difficult for teachers. In this context, timely automatic classification of student engagement can help teachers make adaptive adjustments to meet students' needs. This paper proposes EngageFormer, a transformer-based architecture with sequence pooling that uses the video modality for engagement classification. The proposed architecture computes three views from the input video and processes them in parallel using transformer encoders; a global encoder then processes the representation from each encoder, and finally a multilayer perceptron (MLP) predicts the engagement level. A learning-centered affective state dataset is curated from existing open-source databases. The proposed method achieved accuracies of 63.9%, 56.73%, 99.16%, 65.67%, and 74.89% on the Dataset for Affective States in E-Environments (DAiSEE), the Bahcesehir University Multimodal Affective Database-1 (BAUM-1), the Yawning Detection Dataset (YawDD), the University of Texas at Arlington Real-Life Drowsiness Dataset (UTA-RLDD), and the curated learning-centered affective state dataset, respectively. The results on the BAUM-1, DAiSEE, and YawDD datasets demonstrate state-of-the-art performance in classifying affective states on these datasets. Additionally, the results on the UTA-RLDD dataset, which involves two-class classification, serve as a baseline for future work to compare against and improve upon.
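
The data flow described in the abstract (three views, one encoder per view, a global stage, then a classifier) can be illustrated with a minimal pure-Python sketch. Several things here are assumptions, not details from the paper: the three views are taken by subsampling frames at hypothetical strides of 1, 2, and 4 (the abstract does not say how the views are computed), `encode` is an identity placeholder where a transformer encoder would go, the pooling vector `w` stands in for learned parameters, and the final MLP is omitted. Sequence pooling is shown in its common attention-weighted form, where each token is weighted by a softmax over learned scores.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_pool(tokens, w):
    # Attention-weighted pooling: score each token by its dot product with w,
    # normalise the scores with softmax, and return the weighted sum of tokens.
    # `w` is a hypothetical stand-in for a learned projection.
    scores = [sum(x * wi for x, wi in zip(tok, w)) for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    return [sum(a * tok[d] for a, tok in zip(weights, tokens)) for d in range(dim)]

def encode(tokens):
    # Placeholder for a transformer encoder (assumption: identity mapping).
    return tokens

def fuse_views(frames, w, strides=(1, 2, 4)):
    # Assumed view construction: subsample the frame features at three strides.
    per_view = []
    for stride in strides:
        view = frames[::stride]
        per_view.append(sequence_pool(encode(view), w))
    # Global stage: treat the three pooled vectors as a short sequence.
    return sequence_pool(encode(per_view), w)

# Toy per-frame features: 8 frames, 2 dimensions each.
frames = [[float(i), 1.0] for i in range(8)]
fused = fuse_views(frames, w=[0.0, 0.0])
print(fused)
```

With `w` set to zeros every token receives the same weight, so the pooling reduces to a plain average; a trained model would instead learn weights that emphasise the more informative frames and views.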

Comments:	22 pages, 5 figures, and 6 tables
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.10813 [cs.CV]
	(or arXiv:2502.10813v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.10813

Submission history

From: Sandeep Mandia
[v1] Sat, 15 Feb 2025 14:37:09 UTC (2,396 KB)
