Measuring Pre-training Data Quality without Labels for Time Series Foundation Models

Wen, Songkang; Feofanov, Vasilii; Zhang, Jianfeng

arXiv:2412.06368 (cs)

[Submitted on 9 Dec 2024]

Abstract: Recently, there has been growing interest in time series foundation models that generalize across different downstream tasks. A key to strong foundation models is a diverse pre-training dataset, which is particularly challenging to collect for time series classification. In this work, we explore the performance of a contrastive-learning-based foundation model as a function of the data used for pre-training. We introduce contrastive accuracy, a new measure to evaluate the quality of the representation space learned by the foundation model. Our experiments reveal a positive correlation between the proposed measure and the accuracy of the model on a collection of downstream tasks. This suggests that contrastive accuracy can serve as a criterion to search for time series datasets that can enhance pre-training and thereby improve the foundation model's generalization.
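
The abstract names contrastive accuracy but does not define it here. As a rough illustration only, the sketch below assumes one plausible label-free reading: given embeddings of two augmented views of the same batch, count how often a sample's own second view is its nearest neighbour by cosine similarity. The function name, the use of cosine similarity, and the toy data are all assumptions for illustration, not the authors' definition or code.

```python
import numpy as np

def contrastive_accuracy(z_a, z_b):
    """Hypothetical metric: fraction of rows in z_a whose most similar row
    in z_b (cosine similarity) is the row with the same index."""
    # Normalise rows so the dot product equals cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    sim = z_a @ z_b.T                      # (n, n) similarity matrix
    predicted = sim.argmax(axis=1)         # nearest second view per sample
    return float(np.mean(predicted == np.arange(len(z_a))))

# Toy embeddings standing in for encoder outputs (assumed shapes: 8 samples, 16 dims).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))

# Second view is a slightly perturbed copy: positives should be matched.
aligned = contrastive_accuracy(z, z + 0.01 * rng.normal(size=z.shape))

# Second view is shifted by one row: no sample is paired with its own view.
shuffled = contrastive_accuracy(z, np.roll(z, 1, axis=0))
```

Under this reading, no class labels are needed: the score depends only on whether the representation space keeps two views of the same series closer together than views of different series.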

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2412.06368 [cs.LG]
	(or arXiv:2412.06368v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.06368
Journal reference:	NeurIPS 2024 Workshop on Time Series in the Age of Large Models

Submission history

From: Vasilii Feofanov
[v1] Mon, 9 Dec 2024 10:38:30 UTC (377 KB)
