
Publication

The impact of synthetic data generation on data utility with application to the 1991 UK samples of anonymised records

Abstract

"Synthetic data generation has been proposed as a flexible alternative to more traditional statistical disclosure control (SDC) methods for minimising disclosure risk. However, a barrier to the use of synthetic data is the uncertainty about the reliability and validity of the results that are derived from these data. Surprisingly, there has been a relative dearth of research on how to measure the utility of synthetic data. Utility measures developed to date have been either information theoretic abstractions or somewhat arbitrary collations of statistics, and replication of previously published results has been rare. In this paper, we adopt a methodology previously used by Purdam and Elliot (2007), in which they replicated published analyses using disclosure-controlled versions of the same microdata used in said analyses and then evaluated the impact of disclosure control on the analytic outcomes. We utilise the same studies as Purdam and Elliot, based on the 1991 UK Samples of Anonymised Records, to facilitate comparisons of synthetic data utility between different utility metrics." (Author's abstract, IAB-Doku)
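The replication approach described in the abstract compares an analytic outcome (for example a regression coefficient) estimated on the original microdata with the same outcome estimated on the synthetic version. The sketch below illustrates one way such a comparison can be scored: a confidence-interval overlap measure that is commonly used in the synthetic-data literature. It is an illustrative assumption, not necessarily the metric or implementation used in the paper. The function names `wald_ci` and `ci_overlap` and the example numbers are hypothetical.

```python
def wald_ci(estimate, se, z=1.96):
    """Hypothetical helper: symmetric (Wald) confidence interval from an estimate and its standard error."""
    return (estimate - z * se, estimate + z * se)


def ci_overlap(orig_ci, synth_ci):
    """Average share of each confidence interval that is covered by their intersection.

    Returns 1.0 when the two intervals are identical. A value <= 0 means the
    intervals do not overlap, with more negative values indicating a larger gap.
    """
    lo_o, hi_o = orig_ci
    lo_s, hi_s = synth_ci
    # Length of the intersection (negative if the intervals are disjoint)
    inter = min(hi_o, hi_s) - max(lo_o, lo_s)
    return 0.5 * (inter / (hi_o - lo_o) + inter / (hi_s - lo_s))


# Hypothetical example: the same coefficient estimated on original and synthetic data
original = wald_ci(0.50, 0.05)
synthetic = wald_ci(0.55, 0.05)
print(round(ci_overlap(original, synthetic), 3))
```

A score close to 1 suggests that an analyst would reach similar inferential conclusions from the synthetic data as from the original data. Scoring each replicated estimate in this way is one option for comparing synthetic data utility across the published analyses.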

Cite article

Taub, J., Elliot, M. & Sakshaug, J. (2020): The impact of synthetic data generation on data utility with application to the 1991 UK samples of anonymised records. In: Transactions on Data Privacy, Vol. 13, No. 1, pp. 1-23.


Open Access