Disclosure control in business data
Abstract
"Generating synthetic datasets based on the ideas of multiple imputation is an innovative method for statistical disclosure control. The basic idea is to replace the values for some confidential variables X with several draws from the posterior predictive distribution of X given some non-confidential variables Y. Since the synthetic values are based on models for the joint distribution of the data, many dependencies between the variables are preserved in the released data. Furthermore, the method can be applied to discrete and continuous variables, and constraints like non-negativity can be incorporated directly at the modelling stage. Especially for business surveys, where usual disclosure control methods like swapping or micro-aggregation would have to be applied on a very high level because of the skewness of the data, the approach yields very promising results. The German Institute for Employment Research (IAB) is developing synthetic datasets for one of its establishment surveys, the IAB Establishment Panel. An actual release of a scientific use file based on synthetic datasets for the last wave of the Panel is planned for 2009. In this paper we discuss the challenges of implementing this approach for a large survey and give preliminary results on the applicability of these ideas for real-world datasets." (Author's abstract, IAB-Doku)
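The replacement step described in the abstract can be illustrated with a minimal sketch. It is not the procedure used for the IAB Establishment Panel: it assumes a single strictly positive confidential variable X, a normal linear model for log(X) given Y with a noninformative prior, and made-up example data. The function name `synthesize` and the variables `employees` and `turnover` are hypothetical. Modelling on the log scale is one simple way to build a non-negativity constraint into the model.

```python
import numpy as np

def synthesize(x_conf, y_public, m=5, seed=0):
    """Return m synthetic copies of x_conf, drawn from the posterior
    predictive distribution of an assumed log-linear model given y_public."""
    rng = np.random.default_rng(seed)
    n = len(x_conf)
    design = np.column_stack([np.ones(n), y_public])  # intercept + predictors
    k = design.shape[1]
    log_x = np.log(x_conf)  # assumption: x_conf is strictly positive

    # Least-squares fit of log(X) on Y
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ log_x
    resid = log_x - design @ beta_hat
    s2 = resid @ resid / (n - k)

    copies = []
    for _ in range(m):
        # Step 1: draw model parameters from their posterior distribution
        sigma2 = (n - k) * s2 / rng.chisquare(n - k)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        # Step 2: draw new values of log(X) given the drawn parameters
        log_syn = design @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
        copies.append(np.exp(log_syn))  # back-transform: values stay positive
    return copies

# Made-up, skewed example data (not from any real survey)
demo_rng = np.random.default_rng(1)
employees = demo_rng.lognormal(3.0, 1.0, size=200)             # treated as Y
turnover = employees * demo_rng.lognormal(2.0, 0.3, size=200)  # treated as X
synthetic = synthesize(turnover, np.log(employees), m=5)
```

Each of the m copies replaces the confidential column in one released dataset. Drawing the parameters anew for every copy, rather than reusing the point estimates, is what makes the values draws from the posterior predictive distribution rather than from a single fitted model.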
Cite article
Drechsler, J. (2009): Disclosure control in business data. Experiences with multiply imputed synthetic datasets for the German IAB Establishment Survey. In: European Commission (ed.) (2009): Proceedings of the Eurostat Conference on New Techniques and Technologies for Statistics (NTTS), 2009, Brussels, pp. 1-10.