Learning About Internal Migration from Half a Billion Records – Applying Localised Classification Trees to Large-scale Census Data

Author: Stephany, F., Abel, G. J., & Muttarak, R.
Published in: SocArXiv
Year: 2019
Type: Academic articles
DOI: 10.31235/

Understanding who migrates is crucial in explaining societal changes and forecasting future population composition and size. However, there is no empirical consensus on the demographic and socioeconomic factors driving migration decisions. Exploiting micro census data from the Integrated Public Use Microdata Series International (IPUMSI) database across 65 countries over the period 1960 to 2012, covering 477,296,432 individual records, this study aims to establish common demographic drivers of migration. Given an exceptionally large number of observations, a parametric approach would simply yield biased estimates of the standard errors of the variables of interest. We apply a machine learning technique using decision tree models to establish common demographic patterns driving migration in our data. The decision trees are applied to each country-year sample individually in order to control for local optima. The resulting feature selections are compared across countries and years. We find that globally, age, education, household size, and urbanisation are important drivers of internal migration. Age and education are particularly important predictors in Europe and Northern America, whilst in South and Central America and Africa, urbanisation and household size are more relevant. The applied method of localised decision trees could be a helpful tool for analysing large-scale census data in other social science domains.
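The idea of fitting one tree per country-year sample and then comparing which features each tree selects can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary features, uses a depth-one tree (a single Gini-based split) in place of a full decision tree, and runs on a handful of made-up records. The names `best_split_feature`, `localised_selection`, `young`, `urban` and `migrated` are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_feature(rows, features, target):
    """Return the binary feature whose split reduces Gini impurity the most.

    Returns None if no feature reduces impurity at all.
    """
    parent = gini([r[target] for r in rows])
    best, best_gain = None, 0.0
    for f in features:
        left = [r[target] for r in rows if r[f] == 0]
        right = [r[target] for r in rows if r[f] == 1]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        gain = parent - weighted
        if gain > best_gain:
            best, best_gain = f, gain
    return best


def localised_selection(records, features, target="migrated"):
    """Fit one depth-one tree per (country, year) sample and report its split feature."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["country"], r["year"])].append(r)
    return {key: best_split_feature(rows, features, target)
            for key, rows in groups.items()}


# Made-up toy records (illustrative only, not IPUMSI data).
records = [
    {"country": "X", "year": 2000, "young": 1, "urban": 0, "migrated": 1},
    {"country": "X", "year": 2000, "young": 1, "urban": 1, "migrated": 1},
    {"country": "X", "year": 2000, "young": 0, "urban": 0, "migrated": 0},
    {"country": "X", "year": 2000, "young": 0, "urban": 1, "migrated": 0},
    {"country": "Y", "year": 2010, "young": 0, "urban": 1, "migrated": 1},
    {"country": "Y", "year": 2010, "young": 1, "urban": 1, "migrated": 1},
    {"country": "Y", "year": 2010, "young": 0, "urban": 0, "migrated": 0},
    {"country": "Y", "year": 2010, "young": 1, "urban": 0, "migrated": 0},
]

selected = localised_selection(records, ["young", "urban"])
# Tally how often each feature is chosen across country-year samples.
feature_counts = Counter(selected.values())
```

Fitting each sample separately keeps one very large country-year sample from dominating the splits, and the tally in `feature_counts` is a simple stand-in for comparing feature selections across countries and years.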


  • Peer Reviewed