Dictionary-Learning-Based Data Pruning for System Identification

Wang, Tingna; Zhang, Sikai; Sun, Limin


arXiv:2502.11484 (cs)

[Submitted on 17 Feb 2025]


Authors: Tingna Wang (1 and 3), Sikai Zhang (2), Limin Sun (1, 3 and 4) ((1) Department of Bridge Engineering, Tongji University, Shanghai, China, (2) Baosight Software, (3) Shanghai Qi Zhi Institute, Shanghai, China, (4) State Key Laboratory of Disaster Reduction in Civil Engineering, Tongji University, Shanghai, China)


Abstract: System identification normally involves augmenting time series data by time shifting and nonlinearisation (via a polynomial basis), which introduces redundancy both feature-wise and sample-wise. Much research focuses on reducing feature-wise redundancy, while less attention is paid to sample-wise redundancy. This paper proposes a novel data pruning method, called (mini-batch) FastCan, to reduce sample-wise redundancy based on dictionary learning. The time series data are represented by a set of representative samples, called atoms, obtained via dictionary learning. Useful samples are then selected based on their correlation with the atoms. The method is tested on one simulated dataset and two benchmark datasets. Performance is evaluated by the R-squared between the coefficients of models trained on the full datasets and those of models trained on the pruned datasets. The proposed method is found to significantly outperform random pruning.
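The pipeline outlined in the abstract (augment, learn atoms, select samples by correlation with the atoms, compare coefficients) can be illustrated with a minimal NumPy sketch. This is not the authors' FastCan implementation. Spherical k-means stands in for dictionary learning, the round-robin selection rule is an assumption, and `coef_r2` is one plausible reading of "R-squared between the coefficients". All function names and the toy system are hypothetical.

```python
import numpy as np

def lagged_poly_features(u, y, n_lags=2):
    """Time-shift u and y, then append all degree-2 products (polynomial basis)."""
    n = len(y)
    cols = [y[n_lags - k:n - k] for k in range(1, n_lags + 1)]
    cols += [u[n_lags - k:n - k] for k in range(1, n_lags + 1)]
    lin = np.column_stack(cols)
    i, j = np.triu_indices(lin.shape[1])
    return np.hstack([lin, lin[:, i] * lin[:, j]]), y[n_lags:]

def unit_rows(A):
    """Scale each row to unit length (guarding against zero rows)."""
    return A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)

def learn_atoms(X, n_atoms, n_iter=20, seed=0):
    """Assumption: spherical k-means as a simple stand-in for dictionary learning."""
    rng = np.random.default_rng(seed)
    Xn = unit_rows(X)
    atoms = Xn[rng.choice(len(Xn), n_atoms, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(Xn @ atoms.T, axis=1)  # nearest atom by cosine
        for k in range(n_atoms):
            members = Xn[labels == k]
            if len(members):
                atoms[k] = unit_rows(members.sum(axis=0, keepdims=True))[0]
    return atoms

def prune(X, atoms, n_keep):
    """Keep the samples most correlated with each atom, visiting atoms in turn."""
    sim = np.abs(unit_rows(X) @ atoms.T)   # (n_samples, n_atoms)
    order = np.argsort(-sim, axis=0)       # per-atom ranking of samples
    keep, seen = [], set()
    for rank_row in order:                 # r-th best sample of every atom
        for idx in rank_row:
            if idx not in seen:
                seen.add(idx)
                keep.append(idx)
                if len(keep) == n_keep:
                    return np.array(keep)
    return np.array(keep)

def coef_r2(theta_full, theta_pruned):
    """Assumed metric: R-squared of pruned-data coefficients against full-data ones."""
    ss_res = np.sum((theta_full - theta_pruned) ** 2)
    ss_tot = np.sum((theta_full - theta_full.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy nonlinear system (hypothetical, for illustration only).
rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = (0.5 * y[t - 1] + 0.3 * u[t - 1] + 0.1 * u[t - 1] * y[t - 1]
            + 0.01 * rng.standard_normal())

X, target = lagged_poly_features(u, y)
atoms = learn_atoms(X, n_atoms=10)
keep = prune(X, atoms, n_keep=60)
theta_full = np.linalg.lstsq(X, target, rcond=None)[0]
theta_pruned = np.linalg.lstsq(X[keep], target[keep], rcond=None)[0]
score = coef_r2(theta_full, theta_pruned)
```

Repeating the last three lines with `keep` drawn uniformly at random gives a random-pruning baseline of the kind the abstract compares against. The sketch makes no claim about which variant scores higher on this toy system.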

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as: arXiv:2502.11484 [cs.LG] (or arXiv:2502.11484v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2502.11484

Submission history

From: Tingna Wang
[v1] Mon, 17 Feb 2025 06:38:43 UTC (860 KB)
