Does the Synthesis Model Influence a Subsequent Prediction of the Same Type
Abstract
"Disseminating synthetic data enables easy access to data that retains statistical similarities to the original data if access to sensitive data is restricted. However, the model employed when generating the synthetic data may influence the structure of the data, potentially affecting subsequent predictive analysis. This paper empirically investigates whether the choice of synthesis model impacts the performance of predictive models trained on synthetic data. We use various synthesis models to generate synthetic data and subsequently analyze the generated data using predictive models of the same type. We empirically evaluate whether the choice of the synthesis model influences the performance of the predictive models. For example, CART prediction models might perform systematically better on synthetic data generated using CART models than they perform on the original data. We evaluate this hypothesis based on extensive simulations." (Author's abstract, IAB-Doku)
Cite article
Fössing, E. & Drechsler, J. (2025): Does the Synthesis Model Influence a Subsequent Prediction of the Same Type. In: UNECE (Ed.) (2025): Expert Meeting on Statistical Data Confidentiality. 15-17 October 2025, pp. 1-10.
