Stitch Contrast and Segment_Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos

Tian, Haitao; Payeur, Pierre

Abstract:Existing skeleton-based human action classification models rely on well-trimmed action-specific skeleton videos for both training and testing, precluding their scalability to real-world applications where untrimmed videos exhibiting concatenated actions are predominant. To overcome this limitation, recently introduced skeleton action segmentation models involve un-trimmed skeleton videos into end-to-end training. The model is optimized to provide frame-wise predictions for any length of testing videos, simultaneously realizing action localization and classification. Yet, achieving such an improvement im-poses frame-wise annotated skeleton videos, which remains time-consuming in practice. This paper features a novel framework for skeleton-based action segmentation trained on short trimmed skeleton videos, but that can run on longer un-trimmed videos. The approach is implemented in three steps: Stitch, Contrast, and Segment. First, Stitch proposes a tem-poral skeleton stitching scheme that treats trimmed skeleton videos as elementary human motions that compose a semantic space and can be sampled to generate multi-action stitched se-quences. Contrast learns contrastive representations from stitched sequences with a novel discrimination pretext task that enables a skeleton encoder to learn meaningful action-temporal contexts to improve action segmentation. Finally, Segment relates the proposed method to action segmentation by learning a segmentation layer while handling particular da-ta availability. Experiments involve a trimmed source dataset and an untrimmed target dataset in an adaptation formulation for real-world skeleton-based human action segmentation to evaluate the effectiveness of the proposed method.

Comments:	Accepted as AAAI 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2412.14988 [cs.CV]
	(or arXiv:2412.14988v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.14988

Computer Science > Computer Vision and Pattern Recognition

Title:Stitch Contrast and Segment_Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators