Thursday, June 11, 2026
No menu items!
HomeNatureDeep learning four decades of human migration

Deep learning four decades of human migration

Quantifying global migration

Current methods for estimating global migration rely on relatively straightforward techniques compared with the advanced computational approaches adopted in recent years for predicting and explaining human migration and mobility. Estimates of global migrant population stocks, by country of birth and country of residence, are derived from official statistics on foreign-born or foreign populations, with simple interpolation across census years and imputation when data are missing or inconsistent, using regional averages, demographic assumptions or alignment with changes in the population totals22.

The availability of migration flow data is much more limited than that on migrant stocks. Countries that publish migration statistics tend to have well-developed statistical infrastructure for monitoring population movements located in rich, developed settings. The scale of migration flows occurring between developing countries and to and from some of the world’s most populous nations is often unknown. Consequently, to estimate origin-destination migration flows at the global level, indirect methods have been developed based on changes in global migrant stock estimates. These methods were reviewed in a previous work9, alongside a systematic comparison. They identified six methods, grouping them into three classes.

The first class comprises two stock-differencing approaches, which treat changes in bilateral migrant stocks between census rounds as flows. Negative differences are either set to zero or interpreted as return migration. The second class is a migration-rate approach, which derives transition rates directly from a single stock table by dividing each off-diagonal stock count by the global foreign-born population. These rates are then scaled by an approximation of the total number of global flows, calculated as the sum of absolute net migration flows.

The third class includes three demographic accounting methods, which reconcile changes in birth-place-destination stocks, total population, births and deaths with estimated origin-destination flows. In this framework, adjusted stock tables at the beginning and end of the period are used to define outflow and inflow margins. These margins are then arranged into a three-way array of origin, destination and birthplace flows. Missing values in this array are imputed so that the reconstructed flow table matches the stock-based margin totals. To achieve this, an iterative proportional fitting algorithm, adapted from a past work73, is applied to adjust the cell values until the row, column and diagonal constraints are satisfied. Variants of this framework differ in whether inconsistencies in inflow and outflow margins are absorbed into an open demographic system by introducing a residual category12 or resolved in a closed demographic system by scaling adjusted stock tables to enforce consistency13. A further extension combines two imputation strategies within the closed demographic accounting system by weighting alternative treatments of the diagonal cells in the array that represent non-migrants14. The first imputation sets the diagonals to their maximum feasible values, whereas the second applies an independent log-linear fit that relaxes this constraint. The final flow estimates are obtained as a weighted average of the two imputations, with weights calibrated against harmonized European migration flow data. Although each method has trade-offs, the weighted demographic accounting approach produced estimates that were most consistent with reported flow statistics in countries with official data9.

All applications of these indirect methods are constrained by the temporal spacing of the available migrant stock data, typically five-year intervals, and by errors or inconsistencies in the underlying stock statistics. As a result, the estimated flows inherit the limitations of the stock data, including inaccuracies in imputations by UN DESA or other agencies, which can affect both the precision and comparability of global migration flow estimates. Moreover, these methods make use of very limited covariate information—only allowing information via a single variable for the seed values of the iterative proportional fitting procedure, which has minimal impact—further restricting their ability to capture corridor-specific dynamics or explanatory factors.

More recently, direct estimates of global migration flows have been produced using large-scale online data sources21 (discussed above). The estimates represent a substantial advance over indirect methods, as they are based on observed movements, provide higher temporal resolution, and avoid relying solely on changes in migrant stock data. However, the data cover only a limited number of years, omit several important countries, and will not be updated, restricting their long-term utility.

Recurrent neural network approach to quantifying global migration

Demographic account for global migration estimation

UN DESA provides estimates of global migrant stock Sbi(t), that is, the number of people born in country b living in country i at time t (ref. 22). These data are given at five-year intervals from 1990 to 2020, as well as a recent estimate for 2024. The stocks evolve deterministically according to the equation

$$\partial _tS_bi(t)\,=\,\mathop\underbrace\delta _biB_i(t)\limits_\rmbirths\,-\,\mathop\underbrace\gamma _i(t)S_bi(t)\limits_\rmdeaths\,+\,\mathop\underbrace\sum _j(T_bji\,-\,T_bij)(t)\limits_\rmmigration,$$

(2)

where Bi and γi are, respectively, the total number of births and the mortality rate of the country of residence, and δbi is the Kronecker matrix

$$\delta _bi=\left\{\beginarrayll1, & \rmif\,b=i,\\ 0, & \rme\rml\rms\rme.\endarray\right.$$

(3)

The first term in equation (2) simply means that all births in a country increase the native-born stock Sii; the second models population decrease due to deaths; the third models the change in stocks due to migration. Tbij is the flow of migrants born in b from country i to j, and is the quantity we wish to infer. The total flow of people from i to j, regardless of their place of birth, is of course then given by

$$F_ij=\sum _bT_bij,$$

(4)

while a country’s net migration (arrivals minus departures) is

$$\bfM_i=\sum _jF_ji\,-\,F_ij.$$

(5)

Target data

Aside from the stock data, there are numerous datasets of partial observations of the flow F and the net migration \(\bfM\) to which we could constrain our estimate \(\hatT\), although, as mentioned in the introduction, these do not always use consistent definitions of a migration event. The UN DESA World Population Prospects dataset53 and the US Census Bureau International Database70 both provide estimates of annual net migration for all countries from 1950 to 2024. These figures are mainly calculated as the residual between the total change in population and natural growth (births minus deaths), and for most countries, they are not derived from immigration statistics; we thus do not include them as target variables. Instead, we use net annual migration statistics from around 30 countries and territories in Europe, North America, Oceania and East Asia, sourced from national statistical bureaus (Supplementary Information).

Observations of total origin-destination flows F are taken mainly from five sources, which all use a one-year definition of migration flows:

  • Harmonized intra-European flows: the QuantMig database10,11 provides probabilistic estimates of migration flows between 30 countries in Europe from 2009 to 2019, and is based on publicly available Eurostat data. These have been harmonized to use a common definition of migration, and also provide uncertainty estimates, which we use to weight the target data points in the loss function used to train the neural network (see below).

  • National immigration statistics from Sweden, New Zealand and Finland74,75,76: all three countries report total annual in- and out-flows by origin and destination.

  • Facebook data21: estimates of annual bilateral migration flows between 181 countries from 2019 to 2022 from an analysis of online social media data. We only include annual flows of at least 25 people, as noise was added by the authors to prevent data disclosure, which distorts small values.

Target values are prioritized in this order, meaning that if two datasets both contain values for the same origin–destination pair, we use the source furthest up in the list.

Input covariates

Each value Tbij is a flow through a network multi-edge connecting the birth country b, the origin i and destination j. We train a deep neural network to learn a mapping \(\boldsymbol\chi _bij(t)\mapsto \hatT_bij(t)\), where χ is a vector of economic, social and geographic covariates pertaining to the three connected countries (Extended Data Fig. 1a). In the following we give a summary of the covariates used (refer to Extended Data Table 1 for an overview, and the Supplementary Information for further details).

Demographic covariates. For each country b, i, j, we include the total population and life expectancies; for the origin and destination countries, we also include birth and death rates, all taken from the UN WPP dataset53.

Economic covariates. For each country b, i, j, we include annual real GDP per capita (in constant 2015 US dollars) and annual GDP growth rate (in per cent). Data are taken from the World Bank, UN Conference on Trade and Development, the International Monetary Fund (IMF)77,78,79,80, as well as national statistical bureaus. Missing values are calculated by deflating nominal to real GDP using the World Bank deflator; where the deflator is not given, we estimate the deflator from neighbouring or similar nations. Gaps are also filled by calculating the GDP growth rate from the Maddison Project dataset81,82, which provides GDP in constant 2011 purchasing power parity. The growth rates are then used to extrapolate GDP back or forward in time. We also input bilateral trade flows between origin and destination, given in real 2015 US dollars. These are mostly taken from the harmonized BACI dataset83,84, and missing values are again extrapolated using the growth rates from the UN Comtrade and IMF Direction of Trade datasets85.

Geographic, cultural, and political covariates. Religious and linguistic proximity—measuring the extent of overlap in religious affiliation and common spoken languages—are taken from the Correlates of War World Religion Data, CIA World Factbook, and USITC Domestic and International Common Language Database datasets86,87,88. Religious similarity measures the overlap in the number of adherents of major religions; for two countries with religious makeup αi and αj, the similarity score is simply the dot product αiαj. The linguistic similarity score is given by the ‘common spoken language’ index; missing entries are filled with the average of linguistic proximity and common native language88. We also include the population-weighted geodesic distance85, as well as a number of binary covariates: EU membership of the three indexed nations; binary variables colonybj and colonyij, which are 1 if the first indexed country was ever a colony of the second89; as well as the two binary variables δbi and δbj, with δ the Kronecker delta (equation (3)), which indicate whether an individual is a native of the origin or the destination.

Conflict deaths, refugee and migrant stock. To model short-term, disaster-driven migration, we include data on wars and other shocks that trigger large population movements. We include the total number of deaths related to organized violence in the origin and destination countries, given by the Uppsala Conflict Data programme’s Georeferenced Event Dataset90,91. We include the total number of refugees, asylum seekers, and others in need of international protection, as given by the UNHCR’s refugee statistics72, and also include the annual change in the figures. Finally, we also input the total migrant stock both in the origin and the destination, Sbi(t) and Sbj(t), for each year. As the UN only provides these data points at most every five years, intermediate values are taken from the neural network prediction \(\hatS\) itself. Where initial values are missing, we extrapolate the stocks back to 1990 using a weighted average of similar countries; the weights are calculated by considering correlations across time and space.

Training

We apply the training methodology broadly outlined in a past work23. We wish to not only incorporate information on the current state of the world at time t, contained in χbij(t), but also information on the past. To do this, we use a recurrent neural network uθ, which takes as input the covariates (including stocks) as well as a Z-dimensional hidden or latent state zbij(t). This latent variable represents a ‘memory’ of past changes and their effects on the present flow estimate. The neural network outputs a (log-scaled) estimate of the flow Tbij, as well as the updated latent state zbij(t + 1), which is then input to the neural network to predict the next point in time:

$$u_\theta (\boldsymbol\chi _bij,z_bij)=(\log T_bij,z_bij(t+1))\in \mathbbR^1+Z.$$

(6)

Note that the estimates \(\widehatT_bij\) and all their derived quantities will be real-valued, despite integer target data. This gives a recursive training procedure, where each output is fed back into the neural network to inform the next estimate (Extended Data Fig. 1a). The latent state zbij is initialized at zero and can take any value in \(\mathbbR^Z\).

The neural network consists of a set of trainable parameters θ that are optimized using the gradient of the loss function, J, which is designed to ensure that predicted and observed stocks, net migration values and flows Fij agree, and is structurally an L2-loss of all of the different values yk. We make two important modifications to this basic loss function: first, we scale the data to make the errors \(\widehaty-y\) more normal, ensuring the loss function is not dominated by the largest values (this will be addressed below); and second, we weight each term in the loss function by its uncertainty to bias the loss towards values in which we have greater confidence (Extended Data Fig. 1c):

$$J_\theta \approx \sum _kw_k(\widehaty_k-y_k)^2,$$

(7)

with the index k ranging over all of the target values in a single batch. The weights wk are constructed from the relative uncertainty on each point, clamped to the interval [0.5, 2], and normalized such that the mean weight is 1. The QuantMig dataset provides standard errors on the estimates which we use to populate the weights for the flow targets; for all other flow targets, we set the weight to the average weight of the QuantMig weights or 1. For the stocks, we apply the demographic accounting scheme presented in past works9,12: given the stock tables for two successive years S(t1), S(t2), we add births and deaths, and constrain the resulting tables to match their midpoint marginals using iterative proportional fitting. Subsequently subtracting births and deaths again gives two demographically balanced stock estimates for each year, from which we can estimate the uncertainty on each value Sbi. For the net migration targets, we set the weights to 1 (refer to the Supplementary Information for details).

Scaling the input and target data

Much of the input and target data are heavily skewed Poisson distributions, with long tails caused by a small number of strong outliers; to improve learning, it is common practice to transform data to make it more normal. To this end we use a symmetrized Yeo–Johnson transform:

$$\psi _\lambda (x)=\rms\rmg\rmn(x)\times \left\{\beginarrayl\frac(^\lambda \,-\,1\lambda \,\rmi\rmf\,\lambda \ne 0,\\ \log (| x| +1)\,\rme\rml\rms\rme\endarray\right.$$

(8)

with \(\rms\rmg\rmn(x)\) the sign function. Compared with the standard transformation92, the symmetrization ensures that negative values are transformed more evenly. The parameter λ can be chosen to move the distribution of ψλ(x) closer towards a normal distribution (Extended Data Fig. 1b); for λ = 1, ψ is simply the identity. The transformed input data are further normalized to have zero mean and unit variance. Note that the transformation equation (8) is invertible, with inverse \(\psi _\lambda ^-1\); we can thus always reverse any transformation to obtain the original data. We rescale all non-binary covariates except the religious and linguistic similarity indices to be approximately normal (refer to the Supplementary Information for the λ values used for each).

To improve prediction accuracy on edges with smaller flows, we also transform the target data using the above function ψ; the loss function then reads

$$\beginarraycJ_\theta \,=\,\langle w_bi^s(\psi _\lambda _1(\Delta \hatS_bi)-\psi _\lambda _1(\Delta S_bi))^2\rangle \\ \,+\,\langle w_i^m(\psi _\lambda _2(\hat\bfM_i)-\psi _\lambda _2(\bfM_i))^2\rangle \\ \,+\,\langle w_ij^f(\psi _\lambda _3(\hatF_ij)-\psi _\lambda _3(F_ij))^2\rangle .\endarray$$

(9)

Here, denotes the average over all target values. Observe that we are not matching stock values directly, but rather the difference in stocks over five-year intervals. This is to avoid conditioning the stock value on (possibly erroneous) initial values, and ensure independence of the stock estimates. An optimal initial stock value can be estimated after training using a least squares approach, to fit the time series \(\widehatS_bi(t)\) to the data (see below).

Model selection and validation

To select the architecture of the neural network, that is, the number of neurons and layers, the activation functions, and the latent dimension Z, we use hyperparameter tuning on synthetic data (Supplementary Information). We use a deep network with 7 layers, 60 neurons per layer, and the hyperbolic tangent as the activation function on each layer except the last, where we use the CeLU function93

$$\sigma (x)=\max (0,x)+\min (0,\alpha (\exp (x/\alpha )\,-\,1))$$

with α = −12. The latent dimension Z was set to 100. Also by using a hyperparameter sweep on synthetic data, the scaling parameters for the target data λi were all set to 0.7.

Uncertainty quantification

Uncertainty on the estimates stems from two sources: first, the degree to which the inference problem is ill-posed, meaning the number of possible solutions that fit the data; and second, the uncertainty on the input covariates themselves23. The uncertainty arising from the (potential) ill-posedness of the problem can be estimated by training an ensemble of neural networks, thereby generating a distribution on the parameters θ (ref. 24). This is computationally costly, as a family of neural networks need to be trained in parallel. Meanwhile, in theory the uncertainty on the input data can be ‘pushed through’ the trained neural network, as in a previous work94. Given a prior distribution π0 on the input and neglecting the uncertainty on θ, the posterior is simply

$$p(T)=(u_\theta _\rm\#\pi _)(\boldsymbol\chi ),$$

(10)

where # indicates a pushforward. For our estimates, we do not know the uncertainty on the inputs except for the initial stock estimate. We train an ensemble of 15 neural networks in parallel to solve the inference problem, and for each draw 100 samples of the initial stocks to estimate overall uncertainty. This gives n = 1,500 samples of the flow table \(\hatT\). In this Article and the accompanying datasets95, we provide the average value \(\langle \hatT\rangle \) and one s.d.

Our uncertainty quantification is designed for the global setting, where estimates cover migrant flows and stocks simultaneously across all countries, often with limited metadata. Past uncertainty quantification work in migration estimation has focused primarily on flows, most notably the Integrated Modelling of European Migration (IMEM) model96, which implemented a measurement model with a Bayesian hierarchical framework to explicitly reconcile definitional mismatches, timing criteria, and measurement error for different European countries’ flow data, drawing on expert opinions97. IMEM was a direct predecessor to the QuantMig project11 and has since been extended to incorporate further data sources98,99,100. Further adaptations have also been made to the IMEM model to estimate European bilateral stocks with uncertainty, independent of flow data101, again relying on richer metadata on input migration measures than are available globally. By contrast, our approach explicitly links changes in migrant stock data to their estimated flows. More direct measures of measurement uncertainty for the flow data are not explicitly included as we include only relatively high-quality flow measures with a similar definition, unlike previous models of European migration flows, where data quality varied considerably between countries. Note that, as we use the QuantMig estimates and their uncertainties as target data, these explicit measures are fed into our model.

Calculating the elasticity

The elasticity equation (1) is further given by

$$\nu _k=\left|\frac\partial u_\theta \partial x_k\right||x_k|,$$

with xk the untransformed kth covariate. As the neural network takes ψλ-transformed covariates as input, we can apply the chain rule to χk = ψλ(xk) to obtain

$$\nu _k=|\frac\partial u_\theta \partial \chi _k\frac\partial \psi _\lambda \partial x_k||x_k|.$$

Calibrating the initial stock value

To generate a time series of migrant stocks \(\widehatS_bi(t)\), an initial condition Sbi(t0 = 1990) is required to solve the stock evolution equation (2) forward in time. This initial value may be taken from existing sources such as UN DESA where available; however, rather than conditioning on a potentially noisy or inconsistent baseline, we instead calculate a constant offset that best aligns modelled and observed values while accounting for demographic dynamics. Let Sbi(t) be the observed migrant stock, \(\widehatS_bi(t)\) the corresponding model prediction generated from an arbitrary initial value Sbi(t0), \(w_bi^s(t)\) the weight on each observation, and γi(t) the mortality rate of country i. Define the survival fraction \(\widetilde\gamma _i(t)\) as the proportion of individuals alive in 1990 who are still alive in year t:

$$\widetilde\gamma _i(t)=\prod _t_ < \tau \le t(1\,-\,\gamma _i(\tau ))\ge 0$$

and \(\widetilde\gamma (t_)=1\). The optimal offset \(\rho _bi\in \mathbbR\) of the initial stock value \(\widehatS_bi(t_)\) is then computed by minimizing the weighted squared error between observed and predicted stocks:

$$\rho _bi=\frac\sum _t\widetilde\gamma _iw_bi^s(S_bi(t)\,-\,\widehatS_bi(t))\sum _tw_bi\widetilde\gamma _i^2\in \mathbbR,$$

(11)

and the baseline-shifted stock then given by \(\widehatS_bi(t)+\widetilde\gamma _i(t)\rho _bi\). ρbi is further constrained to ensure that all resulting stocks remain non-negative.

Comparison and validation

We further validate our approach on a dataset of unseen bilateral origin- and birth-destination flows. As the datasets do not use a temporally consistent definition of migration (or include many different measures of migration), we do not calculate prediction errors directly, but rather use a series of correlation metrics similar to those presented in a previous work9. Unlike in Fig. 6a, we calculate correlations between the entire dataset Y:

  • Count: Pearson correlation coefficient on Y.

  • Log count: Pearson correlation coefficient on \(\log (Y+1)\).

  • Proportion: Pearson correlation on yij/∑iyij if the observation yij was reported by the destination country, else yij/∑jyij.

  • Migration rate: Pearson correlation on yij/Pi, with Pi the total population of the origin.

  • Emigration rate: Pearson correlation on ∑jyij/Pi.

  • Immigration rate: Pearson correlation on ∑iyij/Pj.

  • Net count: Pearson correlation on the net count ∑i(yijyji).

Note that in Extended Data Fig. 4 we show correlations on both the total origin-destination flow (∑bTbij) as well as the total birth-destination flow (∑iTbij). We use the following validation datasets:

  • DEMIG C2C7: bilateral flow data for 34 reporting (mostly European) countries from 1990–2011; this dataset contains both origin- and birth-destination flows.

  • DEMIG TOTAL102: total immigration and emigration flows, as well as net counts, from 1990–2011.

  • Eurostat67: bilateral origin-destination flows, mostly within Europe, from 1998–2019.

  • IPUMS International103: immigration totals from census data covering the period 1990–2016.

  • UN DESA IMFSC8: bilateral origin-destination and birth-destination flows, reported by 45 (mainly European) countries from 1990–2013.

  • UN CEPAL IMILA69: bilateral birth-destination flow data to and from Latin American countries from 1990–2013. Excludes return migration of native-born emigrants.

  • OECD68: birth-destination flows for OECD countries, 1995–2013.

  • WPP53: UN WPP net migration estimates for all countries, 2024 revision, 1990–2020.

Extended Data Fig. 4 shows the metrics for the various stock differencing methods outlined in the introduction, as well as our neural estimates. As all methods except our own produce five-year flows, we aggregate our results up to the five-year level. The closed demographic accounting methods have been adjusted to match the same demographic residuals used to calculate the UN WPP net migration figures, hence their correlation with that dataset is 1. Extended Data Figs. 5 and 6 show the statistical significance of the correlation score differences between our method and each of the ones described above.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

RELATED ARTICLES

Most Popular

Recent Comments