Publication

Missing data in the record linkage of process and survey data : An empirical comparison of selected missing data techniques

Abstract

"To compare different missing data techniques, in this paper I use a survey where participants were among other things asked permission for combining the survey with administrative data (record linkage). For those who refuse their permission I set their survey answers to missing, creating pseudo-missing data due to an empirical relevant but unknown mechanism (compared to the statistical simulation of a missing data process). OLS Regression is performed using Complete Case Analysis (CCA), Multiple Imputation (MI) and two versions of Heckman's Sample Selection Model (SSM) to correct for the pseudo-missing data. Their results are compared to a regression based on the complete data set (Benchmark), that gives us the 'true' regression parameters. Results: All missing data techniques under analysis show only small deviations from the benchmark. If only one independent variable contains missing values, MI performs best. If the dependent variable has missing information, CCA and the Two-Step SSM perform better than MI. If missing data is a problem in many or all independent variables, all techniques except for the Maximum likelihood SSM perform equally well." (Author's abstract, IAB-Doku) ((en))

Cite article

Krug, G. (2009): Fehlende Daten beim Record Linkage von Prozess- und Befragungsdaten. Ein empirischer Vergleich ausgewählter Missing Data Techniken. (IAB-Discussion Paper 07/2009), Nürnberg, 29 p.
