Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Dokl. RAN. Math. Inf. Proc. Upr., 2025, Volume 527, Pages 388–399 (Mi danma696)

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

Diffusion models for synthetic tabular data generation

E. D. Telesheva, M. I. Hushchyn

National Research University Higher School of Economics, Moscow

Abstract: Generating high-quality synthetic data is crucial for many data science tasks. A generated dataset can reduce the cost of augmenting existing data with additional instances, for example in physics, or help protect privacy, for instance in banking. However, generating a tabular dataset is challenging because the data contains both numerical and categorical features. In this paper, we investigate modern approaches to tabular data generation and evaluate whether several modifications of the state-of-the-art model affect the quality of the synthesized datasets. The modifications include using Gaussian diffusion models for both numerical and categorical features and adding Gaussian noise for regularization during training. Comprehensive experiments and evaluation of tabular data generation quality metrics on five publicly available datasets show that the proposed modified model retains synthesized data quality similar to that of the original model while requiring less time to generate synthetic samples.

Keywords: artificial intelligence, generative models, diffusion models, tabular data.

UDC: 004.8

Received: 20.08.2025
Accepted: 22.09.2025

DOI: 10.7868/S2686954325070343


