Disclosure control in business data

Description

"Generating synthetic datasets based on the ideas of multiple imputation is an innovative method for statistical disclosure control. The basic idea is to replace the values for some confidential variables X with several draws from the posterior predictive distribution of X given some non confidential variables Y. Since the synthetic values are based on models for the joint distribution of the data, many dependencies between the variables are preserved in the released data. Furthermore, the method can be applied to discrete and continuous variables and constraints like non negativity can be incorporated directly at the modelling stage. Especially for business surveys, where usual disclosure control methods like swapping or micro-aggregation would have to be applied on a very high level because of the skewness of the data, the approach yields very promising results. The German Institute for Employment Research (IAB) is developing synthetic datasets for one of its establishment surveys, the IAB Establishment Panel. An actual release of a scientific use file based on synthetic datasets for the last wave of the Panel is planned for 2009. In this paper we discuss the challenges of implementing this approach for a large survey and give preliminary results on the applicability of these ideas for real world datasets." (Author's abstract, IAB-Doku) ((en))

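The core step named in the abstract, replacing confidential values X with several draws from the posterior predictive distribution of X given non-confidential Y, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function name `synthesize`, the simulated data, and the choice of a normal linear model for log(X) with the standard noninformative prior. The log transform is one simple way to build a non-negativity constraint into the modelling stage; the models used for the IAB Establishment Panel are not described in this record.

```python
import numpy as np

def synthesize(y, x, m=5, seed=0):
    """Return m synthetic copies of confidential x given non-confidential y.

    Assumes log(x) | y follows a normal linear model with a noninformative
    prior (an illustrative choice, not the model from the paper).
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    design = np.column_stack([np.ones(n), y])   # intercept plus predictor
    k = design.shape[1]
    log_x = np.log(x)                           # log scale keeps draws positive
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ log_x       # least-squares estimate
    resid = log_x - design @ beta_hat
    s2 = resid @ resid / (n - k)                # residual variance estimate

    copies = []
    for _ in range(m):
        # Draw the parameters from their posterior, then draw new data
        # given those parameters: one posterior predictive draw per copy.
        sigma2 = (n - k) * s2 / rng.chisquare(n - k)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        log_syn = design @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
        copies.append(np.exp(log_syn))
    return copies

# Simulated, skewed "business-like" data (purely hypothetical values).
data_rng = np.random.default_rng(1)
y = data_rng.normal(size=200)
x = np.exp(1.0 + 0.5 * y + data_rng.normal(scale=0.3, size=200))

synthetic = synthesize(y, x, m=5)
```

Each of the `m` arrays in `synthetic` would stand in for the confidential variable in one released dataset. Drawing the parameters anew for every copy, rather than plugging in the point estimates, is what lets the released copies reflect parameter uncertainty in the spirit of multiple imputation.
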
Citation

Drechsler, Jörg (2009): Disclosure control in business data. Experiences with multiply imputed synthetic datasets for the German IAB Establishment Survey. In: European Commission (ed.) (2009): Proceedings of the Eurostat Conference on New Techniques and Technologies for Statistics (NTTS), 2009, Brussels, pp. 1-10.

Availability

Free access

Further information

Additional information is available here.