A Survey on Data-Centric AI: Tabular Learning from Reinforcement Learning and Generative AI Perspective

Ying, Wangyang; Wei, Cong; Gong, Nanxu; Wang, Xinyuan; Bai, Haoyue; Malarkkan, Arun Vignesh; Dong, Sixun; Wang, Dongjie; Zhang, Denghui; Fu, Yanjie

arXiv:2502.08828 (cs)

[Submitted on 12 Feb 2025 (v1), last revised 16 Feb 2025 (this version, v2)]

Abstract: Tabular data is one of the most widely used data formats across various domains such as bioinformatics, healthcare, and marketing. As artificial intelligence moves towards a data-centric perspective, improving data quality is essential for enhancing model performance in tabular data-driven applications. This survey focuses on data-driven tabular data optimization, specifically exploring reinforcement learning (RL) and generative approaches for feature selection and feature generation as fundamental techniques for refining data spaces. Feature selection aims to identify and retain the most informative attributes, while feature generation constructs new features to better capture complex data patterns. We systematically review existing generative methods for tabular data engineering, analyzing their latest advancements, real-world applications, and respective strengths and limitations. This survey emphasizes how RL-based and generative techniques contribute to the automation and intelligence of feature engineering. Finally, we summarize the existing challenges and discuss future research directions, aiming to provide insights that drive continued innovation in this field.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.08828 [cs.LG]
	(or arXiv:2502.08828v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.08828

Submission history

From: Wangyang Ying
[v1] Wed, 12 Feb 2025 22:34:50 UTC (3,292 KB)
[v2] Sun, 16 Feb 2025 16:41:47 UTC (3,292 KB)
