Challenges in Measuring Utility for Fully Synthetic Data
Abstract
"Evaluating the utility of the generated data is a pivotal step in any synthetic data project. Most projects start by exploring various synthesis approaches trying to identify the most suitable synthesis strategy for the data at hand. Utility evaluations are also always necessary to decide whether the data are of sufficient quality to be released. Various utility measures have been proposed for this purpose in the literature. However, as I will show in this paper, some of these measures can be misleading when considered in isolation while others seem to be inappropriate to assess whether the synthetic data are suitable to be released. This illustrates that a detailed validity assessment looking at various dimensions of utility will always be inevitable to find the optimal synthesis strategy." (Author's abstract, IAB-Doku, © Springer)
Cite article
Drechsler, J. (2022): Challenges in Measuring Utility for Fully Synthetic Data. In: J. Domingo-Ferrer & M. Laurent (Eds.) (2022): Privacy in Statistical Databases 2022, pp. 220-233. DOI:10.1007/978-3-031-13945-1_16