Skeleton-Based Intake Gesture Detection With Spatial-Temporal Graph Convolutional Networks

Wang, Chunzhuo; Xue, Zhewen; Kumar, T. Sunil; Camps, Guido; Hallez, Hans; Vanrumste, Bart

arXiv:2504.10635 (cs)

[Submitted on 14 Apr 2025]

Abstract: Overweight and obesity have emerged as widespread societal challenges and are frequently linked to unhealthy eating patterns. A promising approach to improving dietary monitoring in everyday life is the automated detection of food intake gestures. This study introduces a skeleton-based approach that combines a dilated spatial-temporal graph convolutional network (ST-GCN) with a bidirectional long short-term memory (BiLSTM) network, referred to as ST-GCN-BiLSTM, to detect intake gestures. The skeleton-based method offers key benefits, including robustness to environmental variation, reduced data dependency, and enhanced privacy preservation. Two datasets were used for model validation. On the OREBA dataset, which consists of laboratory-recorded videos, the model achieved segmental F1-scores of 86.18% and 74.84% for eating and drinking gestures, respectively. Additionally, the model trained on OREBA was evaluated on a self-collected dataset of smartphone recordings made under more flexible experimental conditions, yielding F1-scores of 85.40% and 67.80% for eating and drinking gestures, respectively. These results confirm the feasibility of using skeleton data for intake gesture detection and highlight the robustness of the proposed approach in cross-dataset validation.
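
The results above are reported as segmental F1-scores, which count whole gestures rather than individual frames. The sketch below shows one common way such a score can be computed from frame-wise class labels. It assumes a point-in-segment matching rule (the first detection inside a ground-truth gesture is a true positive, extra or out-of-gesture detections are false positives, missed gestures are false negatives) and assumes each predicted segment is reduced to a single event at its midpoint. The function names, the label encoding (0 = other, 1 = eating, 2 = drinking), and the toy data are hypothetical; the paper's exact evaluation protocol may differ.

```python
from typing import List, Tuple

# (start_frame, end_frame), both inclusive
Segment = Tuple[int, int]


def frames_to_segments(labels: List[int], target: int) -> List[Segment]:
    """Group consecutive frames carrying the `target` label into segments."""
    segments: List[Segment] = []
    start = None
    for i, label in enumerate(labels):
        if label == target and start is None:
            start = i
        elif label != target and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:  # segment runs to the last frame
        segments.append((start, len(labels) - 1))
    return segments


def segmental_f1(gt: List[Segment], detections: List[int]) -> float:
    """Segmental F1 under an ASSUMED point-in-segment matching rule:
    first detection inside a ground-truth segment -> true positive;
    extra detections in an already-matched segment, or detections outside
    every segment -> false positives; unmatched segments -> false negatives.
    """
    matched = [False] * len(gt)
    tp = fp = 0
    for t in sorted(detections):
        hit = next((i for i, (s, e) in enumerate(gt) if s <= t <= e), None)
        if hit is None or matched[hit]:
            fp += 1
        else:
            matched[hit] = True
            tp += 1
    fn = matched.count(False)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def segmental_f1_from_frames(gt_labels: List[int], pred_labels: List[int],
                             target: int) -> float:
    """Hypothetical helper: reduce each predicted segment to one detection
    event (its midpoint) and score the events against ground-truth segments."""
    gt = frames_to_segments(gt_labels, target)
    events = [(s + e) // 2 for s, e in frames_to_segments(pred_labels, target)]
    return segmental_f1(gt, events)


if __name__ == "__main__":
    # Toy per-frame labels: 0 = other, 1 = eating, 2 = drinking (illustrative only)
    gt_labels = [0, 1, 1, 1, 0, 0, 2, 2, 0, 1, 1, 0]
    pred_labels = [0, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 1]
    print(segmental_f1_from_frames(gt_labels, pred_labels, target=1))  # 0.5
    print(segmental_f1_from_frames(gt_labels, pred_labels, target=2))  # 1.0
```

In the toy example, one eating gesture is detected, one is missed, and one spurious eating detection appears, so precision and recall are both 0.5. Scoring gestures this way, rather than frames, avoids rewarding a model for labelling long stretches of frames correctly while splitting or merging gestures.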

Comments:	The manuscript has been accepted at the 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE EMBC 2025)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.10635 [cs.CV]
	(or arXiv:2504.10635v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.10635

Submission history

From: Chunzhuo Wang
[v1] Mon, 14 Apr 2025 18:35:32 UTC (1,751 KB)