Friday, March 6, 2026

Advancing operational global aerosol forecasting with machine learning

Datasets details

MERRA-2 reanalysis

MERRA-2, developed by NASA’s Global Modeling and Assimilation Office (GMAO), is a comprehensive atmospheric reanalysis dataset that spans global atmospheric and climate conditions from 1980 to the present (refs. 30, 35). By assimilating satellite, ground-based and additional observational data into the Goddard Earth Observing System, version 5 (GEOS-5) Earth system model, MERRA-2 provides high-precision meteorological parameters and multilayer atmospheric profiles. With its extensive temporal coverage, high spatial resolution (approximately 50 km) and robust consistency, MERRA-2 has become an indispensable tool for climate change research, air-quality monitoring and environmental policymaking. A defining innovation of MERRA-2 is its aerosol dataset, which integrates joint meteorological and aerosol data assimilation. To our knowledge, this marks the first time aerosol radiative effects have been incorporated directly into the atmospheric model (refs. 30, 35), enhancing the fidelity of aerosol–meteorology interactions. MERRA-2 provides high-resolution data across multiple aerosol components (dust, sulfate, BC, OC and SS), with parameters such as spatial distribution, optical depth, concentration and radiative properties.

We used three subsets of the MERRA-2 time-averaged products—aerosol variables (tavg1_2d_aer_Nx), surface atmospheric variables (tavg1_2d_flx_Nx) and upper-air atmospheric variables (tavg3_3d_asm_Nv)—to train, test and evaluate AI-GAMFS, covering 44 years of data from 1980 to 2023. The dataset has spatial resolution of 0.5° × 0.625° (361 × 576 latitude–longitude grid points). For each subset, only data from timestamps corresponding to the 3-hourly overlapping periods (01:30, 04:30, 07:30, …, 22:30 UTC) were used. We focused on forecasting 12 aerosol variables: AOD, TSAOD, SUAOD, DUAOD, BCAOD, OCAOD, SSAOD, SUSMC, DUSMC, BCSMC, OCSMC and SSSMC. All aerosol optical variables are available at the wavelength of 550 nm. In addition, we forecasted 6 surface atmospheric variables and 4 upper-air atmospheric variables at 9 model levels (72, 68, 63, 60, 56, 53, 51, 48 and 45, corresponding to pressure levels of 985 hPa, 925 hPa, 850 hPa, 800 hPa, 700 hPa, 600 hPa, 525 hPa, 413 hPa and 288 hPa, respectively). Specifically, the six surface atmospheric variables are: surface specific humidity (QLML), surface air temperature (TLML), surface eastwards wind (ULML), surface northwards wind (VLML), sea-level pressure (SLP) and total precipitation (PRECTOT). The four upper-air atmospheric variables are: specific humidity (QV), air temperature (T), eastwards wind (U) and northwards wind (V). In total, we forecast and evaluated 54 variables. Detailed information on the MERRA-2 variables used in this study is provided in Supplementary Table 1.
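The variable count and grid size quoted above can be checked with a short sketch. The variable names and model levels come from the text; the channel naming scheme (`QV_L72` and so on), the function names and the grid origin (latitude from −90°, longitude from −180°) are assumptions made for illustration.

```python
# Variable names and model levels are taken from the text above.
AEROSOL_VARS = ["AOD", "TSAOD", "SUAOD", "DUAOD", "BCAOD", "OCAOD", "SSAOD",
                "SUSMC", "DUSMC", "BCSMC", "OCSMC", "SSSMC"]
SURFACE_VARS = ["QLML", "TLML", "ULML", "VLML", "SLP", "PRECTOT"]
UPPER_AIR_VARS = ["QV", "T", "U", "V"]
MODEL_LEVELS = [72, 68, 63, 60, 56, 53, 51, 48, 45]


def channel_names():
    """List one channel per 2D field; the 'VAR_Llevel' naming is hypothetical."""
    names = AEROSOL_VARS + SURFACE_VARS
    names += [f"{var}_L{lev}" for var in UPPER_AIR_VARS for lev in MODEL_LEVELS]
    return names


def grid_axes(dlat=0.5, dlon=0.625):
    """Build latitude/longitude axes for a regular global grid.

    Assumes latitude includes both poles and longitude is periodic,
    starting at -180 degrees.
    """
    n_lat = int(round(180 / dlat)) + 1
    n_lon = int(round(360 / dlon))
    lats = [-90 + i * dlat for i in range(n_lat)]
    lons = [-180 + j * dlon for j in range(n_lon)]
    return lats, lons


if __name__ == "__main__":
    lats, lons = grid_axes()
    # 12 aerosol + 6 surface + 4 x 9 upper-air fields
    print(len(channel_names()), len(lats), len(lons))
```

The total of 54 follows from 12 aerosol variables, 6 surface variables and 4 upper-air variables at each of 9 levels.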

GEOS-FP analyses and forecasts

GEOS-FP is a near-real-time analysis and forecasting system developed by the GMAO (ref. 10). This system provides global meteorological and aerosol analyses (that is, assimilation fields) and generates 5-day (or 10-day) global forecasts, initialized daily at 00:00 UTC (12:00 UTC). It has grid resolution of approximately 25 km (latitude 0.25°, longitude 0.3125°). GEOS-FP uses the same model configuration as MERRA-2 (ref. 30), including the simulation of dust, sulfate, BC, OC and SS via the Goddard Chemistry Aerosol Radiation and Transport model (refs. 36, 37). In addition, GEOS-FP incorporates the assimilation of satellite-based bias-corrected AOD data (ref. 38).

We used three subsets from the GEOS-FP time-averaged analysis and forecast products (see also Supplementary Table 1), which are consistent with the nomenclature of the MERRA-2 data, to conduct a 5-day comparison experiment between AI-GAMFS historical deterministic and operational forecasts. These subsets contain 54 target variables that fully align with the inputs and outputs of the AI-GAMFS forecasts. To evaluate the forecast performance of AI-GAMFS relative to other global and regional aerosol forecasting models, we used the historical GEOS-FP analyses and MERRA-2 reanalysis data from 22:30 UTC each day in 2023 to drive AI-GAMFS and generate daily 5-day forecasts for the entire year of 2023. In contrast, collecting historical GEOS-FP forecast data is more challenging because the GMAO archives only the most recent 2 weeks of forecast data. Consequently, we collected only the GEOS-FP analyses at 00:00 UTC and 5-day forecast data (initialized daily at 00:00 UTC) from July to August 2024 for the near-real-time operational comparison between AI-GAMFS and GEOS-FP. To drive AI-GAMFS and conduct the comparative analysis, we used bilinear interpolation to resample the GEOS-FP analysis and forecast data to match the spatial resolution of 0.5° × 0.625°.
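The resampling step can be illustrated with a minimal pure-Python sketch of bilinear interpolation between two regular latitude–longitude grids. The function name `bilinear_regrid`, the periodic treatment of longitude and the clamping of latitude at the poles are assumptions for illustration; the text does not describe how the authors handled grid edges or which software they used.

```python
import math


def bilinear_regrid(field, src_lats, src_lons, dst_lats, dst_lons):
    """Bilinearly interpolate field[i][j] from a regular source grid.

    Assumptions (not from the source text): both source axes are ascending
    and evenly spaced, longitude is periodic over 360 degrees, and latitude
    is clamped to the source range.
    """
    n_lat, n_lon = len(src_lats), len(src_lons)
    dlat = src_lats[1] - src_lats[0]
    dlon = src_lons[1] - src_lons[0]
    out = []
    for lat in dst_lats:
        # Fractional row index, clamped to the valid latitude range.
        fi = min(max((lat - src_lats[0]) / dlat, 0.0), n_lat - 1)
        i0 = min(int(math.floor(fi)), n_lat - 2)
        wi = fi - i0
        row = []
        for lon in dst_lons:
            # Fractional column index, wrapped around the globe.
            fj = ((lon - src_lons[0]) / dlon) % n_lon
            j0 = int(math.floor(fj)) % n_lon
            j1 = (j0 + 1) % n_lon
            wj = fj - math.floor(fj)
            south = (1 - wj) * field[i0][j0] + wj * field[i0][j1]
            north = (1 - wj) * field[i0 + 1][j0] + wj * field[i0 + 1][j1]
            row.append((1 - wi) * south + wi * north)
        out.append(row)
    return out
```

In practice a library routine would normally replace this loop; the sketch only shows the arithmetic applied at each target grid point.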

CAMS aerosol forecasts

CAMS, developed by the European Centre for Medium-Range Weather Forecasts, is one of the most advanced global aerosol forecasting systems (ref. 5). It provides twice-daily forecasts of global atmospheric composition, including 5-day forecasts of AOD and DUAOD. Using data assimilation techniques, CAMS integrates prior forecasts with current satellite observations to derive optimal initial conditions. It then applies a numerical atmospheric model based on physical and chemical principles to forecast the evolution of aerosols and other atmospheric constituents over the next 5 days (refs. 5, 39). The spatial resolution of the CAMS aerosol forecast product at a single level is 0.4° × 0.4°, with 1-h temporal resolution.

In this study, CAMS served as the baseline for global AOD and DUAOD forecasts based on a physical model, facilitating comprehensive comparison with AI-GAMFS. We used the 5-day global AOD and DUAOD forecasts for the entire year of 2023, initialized daily at 00:00 UTC. To align with AI-GAMFS for comparison or analysis, we resampled the CAMS forecast data to match the spatial resolution of 0.5° × 0.625° and the temporal resolution of 3 h, using time interpolation and bilinear interpolation.

Physics-based dust forecasts

In this study, we used the 2023 dust forecast products from five physics-based dust forecasting models developed by various institutions and deployed in the SDS-WAS Asian Regional Centre. These products include two global models: CAMS and FMI-SILAM (ref. 6), with FMI-SILAM having 1-h temporal resolution and spatial resolution of 0.2° × 0.2°. In addition, we analysed 3 regional models: CMA-CUACE/Dust (ref. 7), with 3-h temporal resolution and a spatial resolution of 0.5° × 0.5°; JMA-MASINGAR (ref. 8), with 1-h temporal resolution and a spatial resolution of 0.5° × 0.5°; and KMA-ADAM3 (ref. 9), with 3-h temporal resolution and a spatial resolution of 0.5° × 0.5° (Supplementary Table 1). The JMA-MASINGAR model provides forecasts with a 3-day lead time, and all other models provide forecasts with a 5-day lead time. Detailed descriptions of these models can be found in their respective technical documentation (refs. 5–9).

Owing to differences in initialization times and dust output variables across the models, we used DUAOD forecast outputs from CAMS, FMI-SILAM, CMA-CUACE/Dust, JMA-MASINGAR and KMA-ADAM3, and for DUSMC, we utilized outputs from all models except CAMS. Notably, except for JMA-MASINGAR, which initializes at 12:00 UTC, all other forecast products begin at 00:00 UTC. To facilitate comparison, we unified the spatial and temporal resolutions of all model outputs to match the AI-GAMFS spatial resolution of 0.5° × 0.625° and 3-h temporal resolution.

AERONET and CARSNET measurements

AERONET is a global aerosol observation network that provides high-quality ground-based measurements of aerosol optical properties (ref. 31). The network consists of numerous automated stations equipped with sun photometers to monitor AOD and other aerosol parameters in real time. AERONET data are widely regarded as the ‘gold standard’ in atmospheric aerosol observations, serving as high-precision references for climate studies, air-quality monitoring and satellite remote-sensing validation. In this study, we used instantaneous AOD and AODc data (Version 3.0, Level 2.0; ref. 40) from all available AERONET sites worldwide during 2023 and July–August 2024. Owing to the lack of a direct method to derive DUAOD, we used AODc as a proxy (ref. 41) to evaluate the DUAOD forecasts. Furthermore, AERONET does not provide AOD or AODc measurements at 550 nm. We therefore derived AODc at 550 nm from AODc at 500 nm using the Ångström exponent. For AOD, we used the following quadratic polynomial interpolation method (refs. 42, 43) to convert observations at 4 adjacent wavelengths (440 nm, 500 nm, 675 nm and 870 nm) into AOD values at 550 nm:

$$\ln(\tau_{\lambda})=a_{0}+a_{1}\ln(\lambda)+a_{2}[\ln(\lambda)]^{2}$$

(1)

where \(a_{0}\), \(a_{1}\) and \(a_{2}\) represent fitting coefficients, and \(\tau_{\lambda}\) denotes the AOD value at the respective wavelength \(\lambda\).
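A minimal sketch of both wavelength conversions follows. It fits the quadratic of equation (1) by least squares over the four AERONET wavelengths and evaluates it at 550 nm, and it applies the standard Ångström power law for the 500 nm to 550 nm conversion. The function names, the use of micrometres inside the fit (chosen only for numerical conditioning) and the hand-written 3 × 3 solver are assumptions for illustration, not the authors' implementation.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def aod_at_550(wavelengths_nm, aods, target_nm=550.0):
    """Fit ln(tau) = a0 + a1 ln(lambda) + a2 ln(lambda)^2 and evaluate at target_nm."""
    # Wavelengths are converted to micrometres so that ln(lambda) stays small;
    # this changes the coefficients but not the interpolated AOD.
    xs = [math.log(w / 1000.0) for w in wavelengths_nm]
    ys = [math.log(t) for t in aods]
    # Normal equations of the least-squares problem.
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    a0, a1, a2 = solve3(a, rhs)
    lx = math.log(target_nm / 1000.0)
    return math.exp(a0 + a1 * lx + a2 * lx * lx)


def angstrom_to_550(tau_500, alpha):
    """Scale AOD from 500 nm to 550 nm with the Angstrom power law."""
    return tau_500 * (550.0 / 500.0) ** (-alpha)
```

With four wavelengths and three coefficients the fit is overdetermined by one point, so the curve smooths the observations slightly rather than passing through each of them.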

To complement the sparse distribution of AERONET sites across China, we incorporated cloud-screened instantaneous AOD observations at 550 nm (Level 2.0) from CARSNET during 2023 and July–August 2024 into the regional evaluation. CARSNET, which is a ground-based aerosol monitoring network established by the China Meteorological Administration in 2002 (ref. 32), uses a systematic calibration protocol: field instruments are calibrated against CARSNET reference standards that are themselves regularly calibrated in coordination with the AERONET programme (refs. 32, 44). Consequently, CARSNET provides AOD data with accuracy comparable to that of AERONET, showing an estimated uncertainty of 0.01–0.02 (ref. 32). To ensure the accuracy of the evaluation, we averaged the AERONET or CARSNET instantaneous observations within a half-hour window before and after the forecast lead time, which served as the reference truth.
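The temporal matching of observations to forecasts can be sketched as follows. The function name `window_mean`, the inclusive window boundaries and the choice to return `None` when no observation falls in the window are assumptions for illustration.

```python
from datetime import datetime, timedelta


def window_mean(obs, valid_time, half_window=timedelta(minutes=30)):
    """Average instantaneous observations within +/- half_window of valid_time.

    obs is a list of (timestamp, value) pairs. Boundaries are treated as
    inclusive (an assumption); None is returned if the window is empty.
    """
    values = [v for t, v in obs if abs(t - valid_time) <= half_window]
    if not values:
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    t0 = datetime(2023, 1, 1, 4, 30)
    sample = [(t0 - timedelta(minutes=12), 0.21), (t0 + timedelta(minutes=25), 0.25)]
    print(window_mean(sample, t0))
```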

In situ aerosol component measurements

To evaluate the forecast accuracy of operational AI-GAMFS for surface aerosol components against GEOS-FP, we collected in situ measurements of aerosol chemical components over the USA and China during July–August 2024. For the USA, daily data on BCSMC, OCSMC and SUSMC were obtained from the IMPROVE network (ref. 33), with additional daily OCSMC and SUSMC data sourced from the EPA-CSN network (ref. 33). All datasets were screened using available data-quality flags. For China, we used quality-controlled data from the CAWNET network (ref. 34), including hourly BCSMC and daily OCSMC and SUSMC measurements.

AI-GAMFS details

As illustrated in Fig. 1a, the AI-GAMFS architecture consists of three primary modules: cube embedding, a vision transformer and cube unembedding. The base model of AI-GAMFS is an autoregressive model that uses the spatial feature tensor at the previous time step (\(X^{t-n}\)) as input to forecast the spatial feature tensor at the next time step (\(X^{t}\)). Here t − n and t represent the previous and upcoming time steps, respectively. The base model considers time steps of 3 h, 6 h, 9 h and 12 h. Using the output of the base model as input, AI-GAMFS can generate forecasts for different lead times. For detailed descriptions of the modelling process for each module and the sensitivity analysis of key hyperparameters, see Supplementary Note 1 and Supplementary Fig. 15.

Training strategy

We used the MERRA-2 reanalysis with 3-h temporal resolution to train the AI-GAMFS model. Data from 1980 to 2021 were used for training, data from 2022 served as the test set and data from 2023 were used for validation. All input variables, except for time features, were standardized before being processed by the embedding layer, and the output from the unembedding layer was unstandardized to generate the final forecasts. The model uses a rolling training approach, where pairs of samples from two consecutive time points (\(X^{t-n}\) and \(X^{t}\)) are fed iteratively into the model for training.

For the standardized samples, the mean absolute error was used as the loss function:

$$L_{1}=\frac{1}{C\times H\times W}\sum_{c=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\left|\hat{X}_{c,i,j}^{t}-X_{c,i,j}^{t}\right|$$

(2)

where C, H and W denote the number of variables, latitudinal grid points and longitudinal grid points, respectively; c, i and j are the indices for variables, latitude and longitude coordinates, respectively; and \(X_{c,i,j}^{t}\) and \(\hat{X}_{c,i,j}^{t}\) represent the ‘ground truth’ (that is, MERRA-2) and the forecasted value at the specified forecasting time, respectively.
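As a plain-Python illustration of equation (2), the sketch below computes the mean absolute error over a tensor stored as nested lists indexed `[c][i][j]`. The function name `l1_loss` and the nested-list layout are assumptions; the authors' PyTorch training code is not shown in the text.

```python
def l1_loss(pred, truth):
    """Mean absolute error over all variables and grid points.

    pred and truth are nested lists indexed [c][i][j] with identical shapes,
    holding standardized values.
    """
    total = 0.0
    count = 0
    for pred_var, truth_var in zip(pred, truth):
        for pred_row, truth_row in zip(pred_var, truth_var):
            for p, t in zip(pred_row, truth_row):
                total += abs(p - t)
                count += 1
    return total / count


if __name__ == "__main__":
    forecast = [[[1.0, 2.0], [3.0, 4.0]]]
    reference = [[[0.0, 2.0], [5.0, 4.0]]]
    print(l1_loss(forecast, reference))
```

Because every variable is standardized before the loss is computed, each of the C channels contributes on a comparable scale regardless of its physical units.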

The AI-GAMFS framework was implemented on the PyTorch platform. Each model, corresponding to a specified lead time and containing approximately 1.2 billion parameters, was trained on a server equipped with 8 L40 GPUs for 80 epochs (approximately 10 days). We used the Adam optimizer with \(\beta_{1}=0.9\) and \(\beta_{2}=0.999\), and an initial learning rate of \(3\times 10^{-4}\), which was decayed using a cosine annealing schedule to 0.001 of its initial value. Training was conducted in 32-bit floating-point precision with a dropout rate of 0.15 to mitigate overfitting.
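The learning-rate decay can be written out as a small sketch of a standard cosine annealing curve. The initial rate and the final ratio of 0.001 come from the text; the function name `cosine_lr`, the absence of warm-up and the choice to index the schedule by epoch are assumptions, because the text does not say whether the rate was updated per epoch or per iteration.

```python
import math


def cosine_lr(step, total_steps, lr_init=3e-4, final_ratio=1e-3):
    """Cosine annealing from lr_init down to lr_init * final_ratio.

    step runs from 0 to total_steps; values outside that range are clamped.
    """
    lr_min = lr_init * final_ratio
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 20, 40, 60, 80):
        print(epoch, cosine_lr(epoch, 80))
```

In PyTorch the same curve is available through `torch.optim.lr_scheduler.CosineAnnealingLR` with `eta_min` set to the final rate, although the text does not state which scheduler the authors used.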

To evaluate the temporal robustness of AI-GAMFS against potential long-term aerosol trends within MERRA-2, we conducted a stratified cross-validation experiment. The training set (1980–2021) was partitioned into 7 contiguous 6-year subsets. The model was iteratively trained on six out of seven subsets and validated on the remaining withheld subset (Supplementary Fig. 13). Results from all seven validations were compared with the model’s performance on the 2022 test set, which was trained on data for the full 1980–2021 period. The overall consistent performance observed across all validations demonstrates that AI-GAMFS captures underlying evolution patterns rather than temporal artefacts, ensuring its reliability for extrapolative forecasting (Supplementary Fig. 14).
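The partition into seven contiguous 6-year blocks can be sketched as below. The year range and fold count come from the text; the function name `contiguous_folds` and the return format are assumptions for illustration.

```python
def contiguous_folds(first_year=1980, last_year=2021, n_folds=7):
    """Split a range of years into contiguous, equally sized blocks.

    Returns a list of (train_years, held_out_years) pairs, one per fold.
    """
    years = list(range(first_year, last_year + 1))
    size, remainder = divmod(len(years), n_folds)
    if remainder:
        raise ValueError("year range does not divide evenly into folds")
    blocks = [years[k * size:(k + 1) * size] for k in range(n_folds)]
    splits = []
    for k, held_out in enumerate(blocks):
        train = [y for b, block in enumerate(blocks) if b != k for y in block]
        splits.append((train, held_out))
    return splits


if __name__ == "__main__":
    for train, held_out in contiguous_folds():
        print(held_out[0], held_out[-1], len(train))
```

Each fold trains on 36 years and validates on the 6 withheld years, so a fold whose withheld block lies at the start or end of the record tests the model on years outside its training period.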

Forecasting strategy

Similar to physics-based forecasting models, we observed that forecast errors in deep-learning models accumulate and amplify as the number of rolling iterations increases. Inspired by the temporal aggregation method from Pangu-Weather (ref. 22), we adopted a relay forecasting strategy that reduces the number of model iterations without compromising the forecast time resolution. Using the same modelling framework and configurations, we trained 4 AI-GAMFS models with lead times of 3 h, 6 h, 9 h and 12 h, referred to as the 3-h, 6-h, 9-h and 12-h models, respectively. For forecasts with specific lead times, we prioritize the 12-h model and combine it with shorter timescale models in a relay fashion (Fig. 1b and Extended Data Fig. 1). As an example, for a lead time of 54 h, the 12-h forecast model is first invoked 4 times, followed by a single invocation of the 6-h forecast model (Extended Data Fig. 1a). Although this strategy sacrifices some computational efficiency, it takes advantage of the high-speed capabilities of GPUs, enabling the model to produce a 5-day forecast in approximately 39 s on a single L40 GPU.
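One way to express the relay rule is a greedy decomposition of the lead time that uses the 12-h model as often as possible and then covers the remainder with a shorter model. The sketch below is an assumed reading of the strategy based on the 54-h example; the function name `relay_plan` is hypothetical, and the exact scheduling in Extended Data Fig. 1 may differ.

```python
def relay_plan(lead_h, steps=(12, 9, 6, 3)):
    """Return the sequence of model step sizes (hours) used to reach lead_h.

    Greedy, largest step first: an assumed interpretation of the relay
    strategy, not the authors' code.
    """
    if lead_h <= 0 or lead_h % 3:
        raise ValueError("lead time must be a positive multiple of 3 h")
    plan = []
    remaining = lead_h
    for step in steps:
        n_calls, remaining = divmod(remaining, step)
        plan.extend([step] * n_calls)
    return plan


if __name__ == "__main__":
    print(relay_plan(54))        # four 12-h steps followed by one 6-h step
    print(len(relay_plan(54)))   # 5 invocations, versus 18 with the 3-h model alone
```

Under this rule no lead time up to 120 h needs more than 10 chained invocations, whereas rolling the 3-h model alone would need 40.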

Evaluation experiment

To rigorously evaluate the forecasting capabilities of AI-GAMFS, we conducted a series of evaluation experiments, using MERRA-2 reanalysis and observational data as reference baselines.

AI-GAMFS relay forecast evaluation

We compared 4 AI-GAMFS model configurations on the 2022 test set, encompassing all 54 aerosol and meteorological variables. These configurations included: a 3-h single model, a 3-h and 6-h relay model, a 3-h, 6-h and 9-h relay model, and a 3-h, 6-h, 9-h and 12-h relay model. This evaluation provides insight into the optimal relay configurations for enhanced predictive performance.

AI-GAMFS versus regional dust forecasting models

We evaluated AI-GAMFS forecasts against five physics-based dust forecasting models across East Asia, using the 2023 validation dataset with MERRA-2 as the baseline. The models included in this comparison—CAMS, CMA-CUACE/Dust, FMI-SILAM, JMA-MASINGAR and KMA-ADAM3—are either specialized dust forecasting models or aerosol models with dust-specific outputs. The evaluation focused on two critical parameters: DUAOD and DUSMC, which allowed us to evaluate AI-GAMFS’s accuracy and reliability in forecasting dust storm events. We also used the full year of AODc observations for 2023 from AERONET at the Beijing-CAMS site for independent evaluation.

AI-GAMFS versus CAMS in global AOD and DUAOD forecasts

We conducted a spatial comparison of AI-GAMFS and CAMS in terms of their 5-day AOD and DUAOD forecasts on a global scale in 2023, using MERRA-2 as the baseline. In addition, the forecasts from AI-GAMFS and CAMS were further evaluated against both global AERONET observations and CARSNET observations from China throughout 2023.

Operational performance of AI-GAMFS versus GEOS-FP

AI-GAMFS is designed for real-time operational forecasting and utilizes GEOS-FP real-time analyses to generate global 5-day aerosol–meteorology forecasts. To evaluate its operational forecasting capabilities for various aerosol components and meteorological variables, we analysed GEOS-FP forecast outputs for July and August 2024. A detailed comparative assessment of AI-GAMFS and GEOS-FP was performed using MERRA-2 as the reference baseline, focusing on all 54 target aerosol and meteorological variables. In addition, the aerosol component forecasts from AI-GAMFS and GEOS-FP were evaluated further against AERONET, CARSNET, IMPROVE, EPA-CSN and CAWNET observations.

Evaluation metrics

For the site-scale evaluation, using independent observations as the reference baseline, we used two metrics: RMSE and Pearson’s R. For the spatial evaluation, using MERRA-2 as the reference baseline, we used two metrics: latitude-weighted RMSE and spatial R, defined as follows:

$$\mathrm{RMSE}(c,t)=\sqrt{\frac{\sum_{i=1}^{N_{\mathrm{lat}}}\sum_{j=1}^{N_{\mathrm{lon}}}w_{i}(\hat{X}_{c,i,j}^{t}-X_{c,i,j}^{t})^{2}}{N_{\mathrm{lat}}\times N_{\mathrm{lon}}}}$$

(3)

$$R(c,t)=\frac{\sum_{i=1}^{N_{\mathrm{lat}}}\sum_{j=1}^{N_{\mathrm{lon}}}(\hat{X}_{c,i,j}^{t}-\overline{\hat{X}}_{c,i,j}^{t})(X_{c,i,j}^{t}-\bar{X}_{c,i,j}^{t})}{\sqrt{\sum_{i=1}^{N_{\mathrm{lat}}}\sum_{j=1}^{N_{\mathrm{lon}}}(\hat{X}_{c,i,j}^{t}-\overline{\hat{X}}_{c,i,j}^{t})^{2}}\times \sqrt{\sum_{i=1}^{N_{\mathrm{lat}}}\sum_{j=1}^{N_{\mathrm{lon}}}(X_{c,i,j}^{t}-\bar{X}_{c,i,j}^{t})^{2}}}$$

(4)

where the latitude weight \(w_{i}\) is given by \(w_{i}=N_{\mathrm{lat}}\times \frac{\cos \phi_{i}}{\sum_{i=1}^{N_{\mathrm{lat}}}\cos \phi_{i}}\), c represents the specified variable, \(\phi_{i}\) refers to the latitudinal value, \(N_{\mathrm{lat}}\) and \(N_{\mathrm{lon}}\) are the total numbers of latitudinal and longitudinal grid points, and \(\overline{\hat{X}}_{c,i,j}^{t}\) and \(\bar{X}_{c,i,j}^{t}\) correspond to the spatial averages over all grid points of the forecast field and the ground-truth field, respectively.
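The two spatial metrics can be sketched in plain Python for a single variable and forecast time. Fields are nested lists indexed `[i][j]` (latitude, longitude); the function names are assumptions, and the correlation is computed without latitude weighting, as in equation (4).

```python
import math


def lat_weights(lats_deg):
    """Latitude weights w_i = N_lat * cos(phi_i) / sum(cos(phi)); they average to 1."""
    cosines = [math.cos(math.radians(phi)) for phi in lats_deg]
    total = sum(cosines)
    return [len(lats_deg) * c / total for c in cosines]


def weighted_rmse(pred, truth, lats_deg):
    """Latitude-weighted RMSE of one 2D field, following equation (3)."""
    w = lat_weights(lats_deg)
    n_lat, n_lon = len(pred), len(pred[0])
    sq_sum = sum(
        w[i] * (pred[i][j] - truth[i][j]) ** 2
        for i in range(n_lat)
        for j in range(n_lon)
    )
    return math.sqrt(sq_sum / (n_lat * n_lon))


def spatial_r(pred, truth):
    """Pearson correlation over all grid points, following equation (4)."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_t = sum((b - mean_t) ** 2 for b in t)
    return cov / (math.sqrt(var_p) * math.sqrt(var_t))
```

The cosine weighting offsets the shrinking area of grid cells towards the poles on a regular latitude–longitude grid, so polar rows do not dominate the global error.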
