Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis

Zbeeb, Mohammad; Ghorayeb, Mohammad; Salman, Mariam

doi:10.33140/JDAEDM


arXiv:2411.01929 (cs)

[Submitted on 4 Nov 2024 (v1), last revised 6 Nov 2024 (this version, v2)]

Abstract: Artificial Intelligence (AI) research often aims to develop models that can generalize reliably across complex datasets, yet this remains challenging in fields where data is scarce, intricate, or inaccessible. This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize one of the most demanding structured datasets: Malicious Network Traffic. Our approach uniquely transforms numerical data into text, reframing data generation as a language modeling task, which not only enhances data regularization but also significantly improves generalization and the quality of the synthetic data. Extensive statistical analyses demonstrate that our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data. Additionally, we conduct a comprehensive study on synthetic data applications, effectiveness, and evaluation strategies, offering valuable insights into its role across various domains. Our code and pre-trained models are openly accessible on GitHub, enabling further exploration and application of our methodology.

Index Terms: Data synthesis, machine learning, traffic generation, privacy-preserving data, generative models.
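
The abstract's central idea, turning numerical records into text so that a language model can generate them, can be illustrated with a minimal sketch. The abstract does not specify the paper's actual encoding, so everything below is an assumption made for illustration: the feature names in `FEATURES`, the `name=value` sentence format, and the helper names `row_to_text` and `text_to_row` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical feature names; the abstract does not list the real columns.
FEATURES: List[str] = ["duration", "src_bytes", "dst_bytes"]


def row_to_text(row: Dict[str, float], decimals: int = 2) -> str:
    """Serialize one numeric record as a space-separated 'name=value' sentence."""
    return " ".join(f"{name}={row[name]:.{decimals}f}" for name in FEATURES)


def text_to_row(text: str) -> Dict[str, float]:
    """Parse a sentence back into a numeric record.

    Raises ValueError on unknown, malformed, or missing fields, so that
    invalid generated sentences can be rejected.
    """
    row: Dict[str, float] = {}
    for token in text.split():
        name, _, value = token.partition("=")
        if name not in FEATURES:
            raise ValueError(f"unknown feature: {name!r}")
        row[name] = float(value)  # float() raises ValueError on bad numbers
    missing = [name for name in FEATURES if name not in row]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return row


# Round trip on one made-up record (illustrative values, not real traffic).
record = {"duration": 1.5, "src_bytes": 491.0, "dst_bytes": 0.0}
sentence = row_to_text(record)
print(sentence)                         # duration=1.50 src_bytes=491.00 dst_bytes=0.00
print(text_to_row(sentence) == record)  # True
```

Under this assumed scheme, sentences like the one printed would form the training corpus for a sequence model, and `text_to_row` would map sampled sentences back to numeric rows. Fixing the number of decimals bounds the vocabulary the model has to learn, at the cost of rounding the original values.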

Comments:	25 pages, 7 figures, 3 tables, 1 algorithm. code @ this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Report number:	2998-8713
Cite as:	arXiv:2411.01929 [cs.LG]
	(or arXiv:2411.01929v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.01929
Journal reference:	J Data Analytic Eng Decision Making, 1(2), 01-14 (2024)
Related DOI:	https://doi.org/10.33140/JDAEDM

Submission history

From: Mohammad Zbeeb
[v1] Mon, 4 Nov 2024 09:51:10 UTC (812 KB)
[v2] Wed, 6 Nov 2024 16:50:44 UTC (812 KB)
