TabRep: a Simple and Effective Continuous Representation for Training Tabular Diffusion Models

Si, Jacob; Ou, Zijing; Qu, Mike; Xiang, Zhengrui; Li, Yingzhen

arXiv:2504.04798 (cs)

[Submitted on 7 Apr 2025 (v1), last revised 9 Apr 2025 (this version, v3)]

Abstract: Diffusion models have been the predominant generative model for tabular data generation. However, they face the conundrum of modeling under a separate versus a unified data representation. The former encounters the challenge of jointly modeling all multi-modal distributions of tabular data in one model. While the latter alleviates this by learning a single representation for all features, it currently leverages sparse suboptimal encoding heuristics and necessitates additional computation costs. In this work, we address the latter by presenting TabRep, a tabular diffusion architecture trained with a unified continuous representation. To motivate the design of our representation, we provide geometric insights into how the data manifold affects diffusion models. The key attributes of our representation are composed of its density, flexibility to provide ample separability for nominal features, and ability to preserve intrinsic relationships. Ultimately, TabRep provides a simple yet effective approach for training tabular diffusion models under a continuous data manifold. Our results showcase that TabRep achieves superior performance across a broad suite of evaluations. It is the first to synthesize tabular data that exceeds the downstream quality of the original datasets while preserving privacy and remaining computationally efficient.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2504.04798 [cs.LG]
	(or arXiv:2504.04798v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2504.04798

Submission history

From: Jacob Si
[v1] Mon, 7 Apr 2025 07:44:27 UTC (2,304 KB)
[v2] Tue, 8 Apr 2025 15:10:24 UTC (2,305 KB)
[v3] Wed, 9 Apr 2025 15:38:00 UTC (2,312 KB)
