Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective

Xie, Laixin; Ouyang, Yang; Chen, Longfei; Wu, Ziming; Li, Quan

doi:10.1109/TVCG.2023.3285210

arXiv:2309.09744 (cs)

[Submitted on 18 Sep 2023]

Abstract: Missing data can pose a challenge for machine learning (ML) modeling. Current approaches to this problem fall into two categories, feature imputation and label prediction, and focus primarily on handling missing data to enhance ML performance. Because these approaches rely on the observed data to estimate the missing values, they encounter three main shortcomings in imputation: different missing data mechanisms require different imputation methods, the methods depend heavily on assumptions about the data distribution, and imputation can introduce bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, in which the ML model learns the similarity between an incomplete sample and its complete counterpart and the dissimilarity between that sample and other samples. Our proposed approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and makes predictions for downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study contributes a practical solution for ML modeling in the presence of missing data that achieves high predictive accuracy and model interpretability.
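
The contrastive objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss, so this example assumes a standard InfoNCE-style formulation with cosine similarity; the function names, the temperature value, and the toy embedding vectors are all hypothetical and are not taken from the paper. It treats the embedding of an incomplete sample as the anchor, the embedding of its complete counterpart as the positive, and the embeddings of other samples as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor (an assumed loss, not the paper's).

    The loss is small when the anchor is close to the positive and far
    from every negative, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical embeddings, made up for illustration only.
incomplete = [0.9, 0.1, 0.0]                    # sample with missing values
complete = [1.0, 0.2, 0.1]                      # its complete counterpart
others = [[-0.8, 0.5, 0.3], [0.0, -1.0, 0.4]]   # unrelated samples

# Correct pairing: the complete counterpart is the positive.
aligned = info_nce(incomplete, complete, others)

# Wrong pairing: an unrelated sample is treated as the positive.
misaligned = info_nce(incomplete, others[0], [complete, others[1]])

print(aligned < misaligned)
```

Under these assumptions, minimizing the loss pulls an incomplete sample toward its complete counterpart and pushes it away from other samples, without estimating any missing value. Which pairs count as positive or negative is the choice that the abstract says users can refine through interactive sampling in CIVis.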

Comments:	18 pages, 11 figures. This paper is accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG)
Subjects:	Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
ACM classes:	I.1.2; H.1.2; H.4.2
Cite as:	arXiv:2309.09744 [cs.LG]
	(or arXiv:2309.09744v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2309.09744
Related DOI:	https://doi.org/10.1109/TVCG.2023.3285210

Submission history

From: Laixin Xie
[v1] Mon, 18 Sep 2023 13:16:24 UTC (24,042 KB)
