Beyond proximity: Clustering of Organic Neighborhoods Using a Two-Staged Unsupervised Learning Approach
Project duration: 01.07.2020 to 30.09.2024
Brief description
Living among people of high socioeconomic status (SES) positively affects individuals' labor market outcomes, health, and even intergenerational mobility. However, estimates of such neighborhood effects are often small and likely confounded by measurement error, even when studies apply causal methods. The main sources of this measurement error are inflexible and inappropriate approaches to estimating neighborhoods. In this paper, I present a flexible, data-driven approach for estimating overlapping and arbitrarily shaped neighborhoods. The approach uses a two-stage clustering design: the first stage identifies homogeneous groups within a city (using an automated KMeans algorithm), and the second stage clusters these homogeneous groups by proximity (using the HDBSCAN algorithm). Unlike previous neighborhood approaches, the proposed two-stage approach produces overlapping neighborhoods of different sizes, shapes, and densities, and allows them to change over time. Preliminary results for all German capital cities show that the size and shape of neighborhoods vary considerably across cities, highlighting the importance of flexible neighborhood estimation techniques. Overall, low-SES neighborhoods are more segregated than higher-SES neighborhoods. Future analyses will provide micro-level evidence for neighborhood effects on the labor market outcomes of adolescents.
Objective
Develop a definition of neighborhoods that is not based solely on geographic proximity, and identify neighborhood effects on individuals' labor market success.