Machine Learning: complement or replacement of Numerical Weather Prediction? 

Emanuele Silvio Gentile – e.gentile@pgr.reading.ac.uk

Figure 1 Replica of the first 1643 Torricelli barometer [1]

Humans have tried, for millennia, to predict the weather by finding physical relationships between observed weather events, a notable example being a fall in barometric pressure used as an indicator of upcoming precipitation. It should come as no surprise that one of the first weather-measuring instruments to be invented was the barometer, by Torricelli (see in Fig. 1 a replica of the first Torricelli barometer), nearly concurrently with a reliable thermometer. Only two hundred years later did the development of the electric telegraph allow a nearly instant exchange of weather data, leading to the creation of the first synoptic weather maps in the US, followed by Europe. Synoptic maps allowed amateur and professional meteorologists to look for patterns in weather data in a way that was unprecedentedly effective for the time, allowing the American meteorologists Redfield and Espy to resolve their dispute over which way the air flows in a hurricane (anticlockwise in the Northern Hemisphere).

Figure 2 High Resolution NWP – model concept [2]

By the beginning of the 20th century, many countries around the globe had started to exchange data daily (thanks to the recently laid telegraph cables), leading to the creation of global synoptic maps, with information on the upper atmosphere provided by radiosondes, aeroplanes and, from the 1930s, radars. By then, weather forecasters had developed a large set of empirical and statistical rules for computing the changes to daily synoptic weather maps, based on patterns between historical sets of synoptic maps and recorded meteorological events, but predicting events days in advance often remained challenging.

In 1954, a powerful tool became available for objectively computing changes on the synoptic map over time: the Numerical Weather Prediction (NWP) model. NWP models numerically solve the primitive equations, a set of nonlinear partial differential equations that approximate the global atmospheric flow, using as initial conditions a snapshot of the state of the atmosphere, termed the analysis, provided by a variety of weather observations. The 1960s, marked by the launch of the first satellites, enabled global NWP forecasts of 5-7 days to be performed. Thanks to the work of countless scientists over the past 40 years, global NWP models, running at a grid scale of about 10 km, can now simulate skilfully and reliably synoptic-scale and mesoscale weather patterns, such as high-pressure systems and midlatitude cyclones, with up to 10 days of lead time [3].
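To give a flavour of what "solving the equations numerically" means, the toy Python sketch below time-steps a 1-D linear advection equation with a first-order upwind finite-difference scheme, starting from an idealised initial condition. It is a drastic simplification of the primitive equations, intended only to illustrate the iterative structure of an NWP integration; the grid, time step and advection speed are arbitrary choices.

```python
import numpy as np

# Toy "NWP-style" integration: 1-D linear advection du/dt + c du/dx = 0,
# advanced from an initial condition with a first-order upwind scheme on
# a periodic domain. All numbers are illustrative.
nx, nt = 200, 300            # grid points, time steps
dx, dt, c = 1.0, 0.4, 2.0    # grid spacing, time step, advection speed (CFL = 0.8)

x = np.arange(nx) * dx
u = np.exp(-((x - 40.0) / 10.0) ** 2)   # "analysis": an idealised initial anomaly

for _ in range(nt):
    u = u - c * dt / dx * (u - np.roll(u, 1))   # upwind difference for c > 0

print("anomaly peak after integration:", round(u.max(), 3))
```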

The relatively recent adoption of limited-area convection-permitting models (Fig. 2) has made it possible to forecast even the local details of weather events. For example, convection-permitting forecasts of midlatitude cyclones can accurately predict small-scale multiple slantwise circulations, the 3-D structure of convection lines, and the peak cyclone surface wind speed [4].

However, physical processes below the convection-permitting resolution, such as wind gusts, which present a risk to lives and livelihoods, cannot be explicitly resolved; they can only be derived from prognostic fields such as wind speed and pressure. Alternative techniques, such as statistical modelling (e.g. the Malone model), have not yet come close to matching the power of numerical solvers of the physical equations in simulating the spatio-temporal dynamics of the atmosphere.

Figure 3 Error growth over time [5]

NWP models are not without flaws, as they are affected by numerical drawbacks: errors in the prognostic atmospheric fields build up over time, as shown in Fig. 3, eventually reaching a forecast error comparable to that of a persistence forecast (one in which the forecast is held constant at each time step) and of a climatology-based forecast (the mean of a historical series of observations or model outputs). Errors build up because NWP models iteratively solve the primitive equations approximating the atmospheric flow (either by finite differences or by spectral methods). Sources of these errors include model resolution that is too coarse (leading to an incorrect representation of topography), long integration time steps, and small-scale, sub-grid processes which are unresolved by the model physics approximations. Errors in the parametrisations of small-scale physical processes grow over time, leading to a significant deterioration of forecast quality after 48 h. High-fidelity parametrisations of unresolved physical processes are therefore critical for an accurate simulation of all types of weather events.
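As an illustration of the persistence and climatology baselines mentioned above, the short sketch below computes both from a synthetic observation series and compares their errors with those of a stand-in "NWP" forecast; the data and the choice of RMSE as the error statistic are purely illustrative.

```python
import numpy as np

# Synthetic "observations" and a stand-in forecast, purely for illustration.
rng = np.random.default_rng(0)
obs = 15 + 5 * np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 1, 500)
nwp = obs + rng.normal(0, 1.5, 500)              # pretend NWP forecast

persistence = obs[:-1]                           # forecast = previous observation
climatology = np.full(obs.size - 1, obs.mean())  # forecast = long-term mean

def rmse(forecast, truth):
    return np.sqrt(np.mean((forecast - truth) ** 2))

print("NWP RMSE:        ", rmse(nwp[1:], obs[1:]))
print("Persistence RMSE:", rmse(persistence, obs[1:]))
print("Climatology RMSE:", rmse(climatology, obs[1:]))
```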

Figure 4 Met Office HPC [7]

Another limitation of NWP is the difficulty of simulating the chaotic nature of weather, which means that errors in the model initial conditions and in the model physics approximations grow exponentially over time. These limitations, combined with the instability of the atmosphere at its lower and upper boundaries, make rapidly developing events such as flash floods particularly challenging to predict. A further weakness of NWP forecasts is that they rely on expensive High Performance Computing (HPC) facilities (Fig. 4), owned by a handful of industrialised nations, which run coarse-scale global models and high-resolution convection-permitting forecasts on domains covering areas of national interest. As a result, high-resolution prediction of weather hazards and climatological analysis remain off-limits for the vast majority of developing countries, with detrimental effects not just on the first-line response to weather hazards, but also on the development of economic activities such as agriculture, fishing, and renewable energy in a warming climate. In the last decade, the cloud computing revolution has led to a tremendous increase in the availability and shareability of weather data sets, which have transitioned from local storage and processing to network-based services managed by large cloud computing companies, such as Amazon, Microsoft or Google, through their distributed infrastructure.

Combined with the wide availability of their cloud computing facilities, access to weather data has become increasingly democratic and ubiquitous, and consistently less dependent on the HPC facilities owned by national agencies. This transformation is not without drawbacks should these tech giants decide to shut off the flow of data. During a row with the Australian government in February 2021, Facebook banned access to Australian news content. Although by accident, government-related agencies such as the Bureau of Meteorology were also banned, leaving citizens with restricted access to important weather information until the pages were restored. It is hoped that, with more companies providing distributed infrastructure, access to data vital for citizens' safety will become more resilient.

The ever-growing accessibility of weather data sets has stimulated the development and application of novel machine learning algorithms. As a result, weather scientists worldwide can crunch multi-dimensional weather data increasingly effectively, ultimately providing a powerful new paradigm for understanding and predicting the atmospheric flow based on finding relationships within the available large-scale weather datasets.

Machine learning (ML) finds meaningful representations of the patterns in the data through a series of nonlinear transformations of the input data. ML pattern recognition is commonly divided into two types: supervised and unsupervised learning.

Figure 5 Feed-forward neural network [6]

Supervised learning is concerned with predicting an output for a given input. It is based on learning the relationship between inputs and outputs from training data consisting of example input/output pairs, and is divided into classification or regression depending on whether the output variable to be predicted is discrete or continuous. Support Vector Machines (SVM) and Support Vector Regression (SVR), Artificial Neural Networks (ANN, with the feed-forward step shown in Fig. 5), and Convolutional Neural Networks (CNN) are examples of supervised learning.
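As a minimal illustration of supervised learning, the scikit-learn sketch below fits a Support Vector Regression model to example input/output pairs; the data, features and hyperparameters are invented purely for demonstration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Minimal supervised-learning sketch: learn a nonlinear input/output
# relationship from example pairs (synthetic data for illustration).
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 2))                            # two input features
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(0, 0.05, 500)   # continuous target -> regression

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```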

Unsupervised learning is the task of finding patterns within data without any ground truth or labelling, a common unsupervised task being clustering (grouping data points that are close to one another, relative to data points outside the cluster). A classic example of unsupervised learning is the K-means clustering algorithm (the similarly named K-Nearest Neighbour, KNN, algorithm is, by contrast, a supervised classifier) [6].
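For comparison, a minimal unsupervised-learning sketch: K-means groups unlabelled 2-D points into clusters without any ground truth (again, purely synthetic data).

```python
import numpy as np
from sklearn.cluster import KMeans

# Minimal unsupervised-learning sketch: K-means groups unlabelled points
# into clusters of mutually similar samples (synthetic 2-D data).
rng = np.random.default_rng(2)
data = np.vstack([rng.normal(loc, 0.5, size=(100, 2)) for loc in ((0, 0), (4, 4), (0, 5))])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=2).fit(data)
print("cluster sizes:", np.bincount(kmeans.labels_))
print("cluster centres:\n", kmeans.cluster_centers_)
```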

So far, ML algorithms have been applied to four key problems in weather prediction:  

  1. Correction of systematic error in NWP outputs, which involves post-processing data to remove biases [8]
  1. Assessment of the predictability of NWP outputs, evaluating the uncertainty and confidence scores of ensemble forecasting [9]
  1. Extreme detection, involving prediction of severe weather such as hail, gust or cyclones [10]
  1. NWP parametrizations, replacing empirical models for radiative transfer or boundary-layer turbulence with ML techniques [11]

The first key problem, correcting systematic error in NWP output, is the most popular area of application of ML methods in meteorology. In this field, wind speed and precipitation observations are often used to fit a linear regression to the NWP data, with the goal of enhancing its accuracy and resolving local details of the weather unresolved by the NWP forecast. Although attractive for its simplicity and robustness, linear regression presents two problems: (1) the least-squares methods used to solve it do not scale well with dataset size (the matrix inversion they require becomes increasingly expensive as datasets grow), and (2) many relationships between variables of interest are nonlinear. Classification-tree-based methods, by contrast, have proven very useful for modelling nonlinear weather phenomena, from thunderstorm and turbulence detection to extreme precipitation events and the representation of the circular nature of the wind. Compared to linear regression, tree-based methods such as random forests scale easily to large datasets with many input variables. ML methods such as ANNs and SVM/SVR preserve this scalability while providing a more generic and more powerful approach to modelling nonlinear processes. These improvements have come at the cost of making it difficult to interpret the underlying physical relationships that the model identifies, which matters because scientists need to couple these ML models with physical-equation-based NWP, where the variables are physically interdependent. Indeed, it has proven challenging to interpret the physical meaning of the weights and nonlinear activation functions through which an ANN encodes the data patterns and relationships it has found [12].
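A minimal sketch of the linear-regression post-processing idea, using synthetic stand-ins for forecast and observed wind speed (the variable names and relationships are invented for illustration, not taken from any of the cited studies):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sketch of MOS-style bias correction: fit a linear map from NWP-forecast
# wind speed (plus another forecast predictor) to observed station wind
# speed. All data below are synthetic, for illustration only.
rng = np.random.default_rng(3)
nwp_wind = rng.gamma(2.0, 3.0, 1000)              # forecast 10-m wind speed (m/s)
nwp_pressure = 1010 + rng.normal(0, 8, 1000)      # forecast MSLP (hPa)
obs_wind = 0.8 * nwp_wind + 0.02 * (1010 - nwp_pressure) + 1.5 + rng.normal(0, 1, 1000)

X = np.column_stack([nwp_wind, nwp_pressure])
reg = LinearRegression().fit(X, obs_wind)
corrected = reg.predict(X)

print("raw NWP bias:  ", np.mean(nwp_wind - obs_wind))
print("corrected bias:", np.mean(corrected - obs_wind))
```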

The second key problem, the interpretation of ensemble forecasts, is being addressed by unsupervised ML methods such as clustering, which estimates the likelihood of a forecast scenario by aggregating ensemble members by similarity. Examples include grouping daily weather phenomena into synoptic types, defining weather regimes from upper-air flow patterns, and grouping the members of forecast ensembles [13].
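A sketch of the ensemble-clustering idea, assuming each member is represented by a flattened 2-D forecast field: K-means groups the members into scenarios, and the relative cluster sizes give a rough likelihood for each scenario. The shapes and data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Group ensemble members by similarity of their forecast fields.
# Each member's 2-D field is flattened into a feature vector; cluster
# sizes then act as a rough likelihood for each forecast scenario.
rng = np.random.default_rng(4)
n_members, ny, nx = 50, 20, 30
members = rng.normal(size=(n_members, ny, nx))
members[25:] += 2.0              # pretend half the members favour a different scenario

features = members.reshape(n_members, -1)
labels = KMeans(n_clusters=2, n_init=10, random_state=4).fit_predict(features)
for k in range(2):
    print(f"scenario {k}: {np.mean(labels == k):.0%} of members")
```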

The third key problem concerns the prediction of weather extremes, i.e. weather phenomena that are a hazard to lives and economic activities; here, ML-based methods tend to underestimate the events of interest. The problem lies with imbalanced datasets, since extreme events represent only a very small fraction of the total events observed [14].

The fourth key problem to which ML is currently being applied is parametrisation. Completely new stochastic ML approaches have been developed, and their effectiveness, along with their simplicity compared to traditional empirical models, has highlighted promising future applications in (moist) convection [15].
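To illustrate the ML-parametrisation idea, the sketch below trains a small neural network to emulate a made-up mapping from a column "profile" to a sub-grid tendency. It is a toy stand-in for the schemes discussed in [11] and [15], not a reproduction of them; the profiles, target and network size are all invented.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy ML parametrisation: learn the mapping from a resolved column state
# (fake stacked temperature/humidity profiles) to a sub-grid tendency
# that would otherwise come from an empirical scheme. Entirely synthetic.
rng = np.random.default_rng(5)
n_samples, n_levels = 2000, 20
profiles = rng.normal(size=(n_samples, 2 * n_levels))            # stacked T and q profiles
tendency = np.tanh(profiles[:, :n_levels]).sum(axis=1) + rng.normal(0, 0.1, n_samples)

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=5)
emulator.fit(profiles, tendency)
print("emulator R^2 on training data:", emulator.score(profiles, tendency))
```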

Further applications of ML methods are currently limited by intrinsic problems that arise when ML methods meet the challenges posed by weather data sets. While dimensionality reduction by ML techniques has proven highly beneficial for image pattern recognition, in the context of weather data it leads to a marked simplification of the input, since it constrains the input space to individual grid cells in space or time [16]. The recent expansion of ANNs into deep learning has provided new methodologies that can address these issues, pushing the capability of ML models further within the weather forecasting domain: CNNs offer a way to extract complex patterns from large, structured datasets, an example being the CNN model developed by Yunjie Liu in 2016 [17] to classify atmospheric rivers in climate datasets (atmospheric rivers being an important physical process for the prediction of extreme rainfall events).
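The sketch below shows, in hedged form, what such a CNN classifier might look like in Keras: a few convolutional layers mapping a 2-D atmospheric field to a probability of "atmospheric river". The architecture, grid size and random data are illustrative assumptions, not the model of [17].

```python
import numpy as np
import tensorflow as tf

# Small CNN binary classifier in the spirit of [17]: label 2-D fields
# (e.g. integrated water vapour maps) as atmospheric river / no river.
# Architecture and data are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 96, 1)),          # lat x lon x channel
    tf.keras.layers.Conv2D(16, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),     # probability of "atmospheric river"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random stand-in data, just to show the expected shapes.
X = np.random.rand(32, 64, 96, 1).astype("float32")
y = np.random.randint(0, 2, size=(32,))
model.fit(X, y, epochs=1, verbose=0)
```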

Figure 6 Sample images of atmospheric rivers correctly classified (true positives) by the deep CNN model of [17]

At the same time, Recurrent Neural Networks (RNN), originally developed for natural language processing, are improving nowcasting techniques thanks to their excellent ability to work with the temporal dimension of sequences of data frames. CNNs and RNNs have now been combined, as illustrated in Fig. 7, providing the first machine-learning nowcasting method for precipitation, using sequences of radar frames as input [18].

Figure 7 Encoding-forecasting ConvLSTM network for precipitation nowcasting [18]
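A minimal ConvLSTM sketch in Keras, loosely in the spirit of the encoding-forecasting network of [18]: a short sequence of radar-like frames is mapped to a prediction of the next frame. The layer sizes and the random input are purely illustrative, not the published configuration.

```python
import numpy as np
import tensorflow as tf

# Toy ConvLSTM nowcaster: ingest 5 radar-like frames, predict frame 6.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5, 64, 64, 1)),   # time x lat x lon x channel
    tf.keras.layers.ConvLSTM2D(16, kernel_size=3, padding="same", return_sequences=True),
    tf.keras.layers.ConvLSTM2D(16, kernel_size=3, padding="same", return_sequences=False),
    tf.keras.layers.Conv2D(1, kernel_size=3, padding="same", activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

frames = np.random.rand(8, 6, 64, 64, 1).astype("float32")   # fake radar sequences
model.fit(frames[:, :5], frames[:, 5], epochs=1, verbose=0)  # predict frame 6 from frames 1-5
```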

While these results show promising applications of ML models to a variety of weather prediction tasks that extend beyond the remit of traditional NWP, such as ensemble clustering, bias correction, analysis of climate data sets and nowcasting, they also show that ML models are not ready to replace NWP in forecasting synoptic-scale and mesoscale weather patterns.

Indeed, NWP models have been developed and improved for over 60 years with the very purpose of simulating wind, pressure, temperature and the other relevant prognostic fields accurately and reliably, so it would be unreasonable to expect ML models to outperform NWP on such tasks.

It is also true that, as noted earlier, the amount of available data will only grow in the coming decades, so it will be both critical and strategic to develop ML models capable of extracting patterns and interpreting the relationships within such data sets, complementing NWP capabilities. But how long before an ML model is capable of replacing an NWP model by crunching the entire set of historical observations of the atmosphere, extracting the patterns and spatio-temporal relationships between the variables, and then producing weather forecasts?

Acknowledgement: I would like to thank my colleagues and friends Brian Lo, James Fallon, and Gabriel M. P. Perez, for reading and providing feedback on this article.

References

  1. https://collection.sciencemuseumgroup.org.uk/objects/co54518/replica-of-torricellis-first-barometer-1643-barometer-replica 
  1. https://www.semanticscholar.org/paper/High-resolution-numerical-weather-prediction-(NWP)-Allan-Bryan/a40e0ebd388b915bdd357f398baa813b55cef727/figure/6 
  1. Buizza, R., Houtekamer, P., Pellerin, G., Toth, Z., Zhu, Y. and Wei, M. (2005) A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon Weather Rev, 133, 1076 – 1097 
  1. Lean, H. and Clark, P. (2003) The effects of changing resolution on mesocale modelling of line convection and slantwise circulations in FASTEX IOP16. Q J R Meteorol Soc, 129, 2255–2278 
  1. http://www.chanthaburi.buu.ac.th/~wirote/met/tropical/textbook_2nd_edition/navmenu.php_tab_10_page_4.3.5.htm 
  1. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer 
  1. https://www.arup.com/projects/met-office-high-performance-computer 
  1. J. L. Aznarte and N. Siebert, “Dynamic Line Rating Using Numerical Weather Predictions and Machine Learning: A Case Study,” in IEEE Transactions on Power Delivery, vol. 32, no. 1, pp. 335-343, Feb. 2017, doi: 10.1109/TPWRD.2016.2543818. 
  1. Foley, Aoife M et al. (2012). “Current methods and advances in forecasting of wind power generation”. In: Renewable Energy 37.1, pp. 1–8. 
  1. McGovern, Amy et al. (2017). “Using artificial intelligence to improve real-time decision making for high-impact weather”. In: Bulletin of the American Meteorological Society 98.10, pp. 2073–2090 
  1. O’Gorman, Paul A and John G Dwyer (2018). “Using machine learning to parameterize moist convection: Potential for modeling of climate, climate change and extreme events”. In: arXiv preprint arXiv:1806.11037 
  1. Moghim, Sanaz and Rafael L Bras (2017). “Bias correction of climate modeled temperature and precipitation using artificial neural networks”. In: Journal of Hydrometeorology 18.7, pp. 1867–1884.  
  1. Camargo, S. J., Robertson, A. W., Gaffney, S. J., Smyth, P. and Ghil, M. (2007). “Cluster analysis of typhoon tracks. Part I: General properties”. In: Journal of Climate 20.14, pp. 3635–3653. 
  1. Ahijevych, David et al. (2009). “Application of spatial verification methods to idealized and NWP-gridded precipitation forecasts”. In: Weather and Forecasting 24.6, pp. 1485–1497. 
  1. Berner, Judith et al. (2017). “Stochastic parameterization: Toward a new view of weather and climate models”. In: Bulletin of the American Meteorological Society 98.3, pp. 565–588. 
  1. Fan, Wei and Albert Bifet (2013). “Mining big data: current status, and forecast to the future”. In: ACM sIGKDD Explorations Newsletter 14.2, pp. 1–5 
  1. Liu, Yunjie et al. (2016). “Application of deep convolutional neural networks for detecting extreme weather in climate datasets”. In: arXiv preprint arXiv:1605.01156. 
  1. Shi, X. et al. (2015). “Convolutional LSTM network: A machine learning approach for precipitation nowcasting”. In: Advances in Neural Information Processing Systems, pp. 802–810. 

Support vector machine for classification of space weather

Carl Haines – carl.haines@pgr.reading.ac.uk

In a recent blog post, I discussed the use of the analogue ensemble (AnEn), or “similar-day” approach to forecasting geomagnetic activity. In this post I will look at the use of support vector machines (SVMs), a machine learning approach, to the same problem and compare the performance of the SVM to the AnEn. An implementation of the SVM has been developed for this project in Python and is available at https://doi.org/10.5281/zenodo.4604485

Space weather encompasses a range of impacts on Earth caused by changes in the near-earth plasma due to variable activity at the Sun. These impacts include damage to satellites, interruptions to radio communications and damage to power grids. For this reason, it is useful to forecast the occurrence, intensity and duration of heightened geomagnetic activity, which we call geomagnetic storms.  

As in the previous post, the measure of geomagnetic activity used is the aaH index, which has been developed by Mike Lockwood in Reading. The aaH index gives a global measure of geomagnetic activity at a 3-hour resolution. In this index, the minimum value is zero and larger values represent more intense geomagnetic activity. 

The SVM is a commonly used classification algorithm which we implement here to classify whether a storm will occur. Given a sample of the input and the associated classification labels, the SVM will find a function that separates these input features by their class label. This is simple if the classes are linearly separable, as the function is a hyperplane. The samples lying closest to the hyperplane are called support vectors and the distance between these samples and the hyperplane is maximised. 

Figure 1 – A diagram explaining the kernel trick used by SVMs. This figure has been adapted from (Towards data science, n.d.) 

Typically, the samples are not linearly separable, so we exploit Cover’s theorem, which states that a classification problem that is not linearly separable is more likely to become linearly separable when cast non-linearly into a higher-dimensional space. We therefore use a kernel trick to throw the inputs into a higher-dimensional feature space, as depicted in Figure 1, to make them more separable. Further explanation is available in (Towards data science, n.d.). 
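The scikit-learn sketch below illustrates the kernel trick on a textbook example: two concentric classes that no straight line can separate in 2-D are separated almost perfectly by an RBF-kernel SVM. The data are synthetic, not aaH.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric classes: not linearly separable in 2-D, but easily
# separated once the RBF kernel implicitly lifts the data to a
# higher-dimensional feature space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))   # roughly chance level
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))      # close to 1
```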

Based on aaH values in the 24-hour training window, the SVM predicts whether the next 3 hours will be either a storm or not. By comparing this dichotomous hindcast with the observed aaH, the outcome will be one of True Positive (TP, where a storm is correctly predicted), True Negative (TN, where no storm is correctly predicted), False Positive (FP, where a storm is predicted but not observed), or False Negative (FN, where a storm is not predicted but is observed). This is shown in the form of a contingency table in Figure 2. 
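A short sketch of this dichotomous verification, using made-up storm/no-storm labels and scikit-learn's confusion matrix to count the TP, TN, FP and FN entries of the contingency table:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Compare predicted storm (1) / no-storm (0) labels with observed ones.
# The labels below are invented purely for illustration.
observed  = np.array([0, 0, 1, 0, 1, 1, 0, 0, 0, 1])
predicted = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 1])

tn, fp, fn, tp = confusion_matrix(observed, predicted, labels=[0, 1]).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")
```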

Figure 2 – Contingency table for the SVM classifying geomagnetic activity into the “no-storm” and “storm” classes. The results have been normalised across the true label.

For development of the SVM, the aaH data has been separated into independent training and test intervals, chosen to be alternate years. This interval is longer than the auto-correlation time of the data (choosing, e.g., alternate 3-hourly data points would not generate independent training and test data sets) but short enough that we assume there will be no significant aliasing with solar cycle variations. 

Training is an iterative process, whereby a cost function is minimised. The cost function is a combination of the relative proportions of TP, TN, FP and FN. Thus, during training, the SVM attempts to classify labelled data, i.e. data belonging to a known category (in this case “storm” or “no storm”), on the basis of the previous 24 hours of aaH. If the SVM makes an incorrect prediction, it is penalised through the cost function, which the SVM minimises. The cost parameter determines the degree to which the SVM is penalised for a misclassification in training, which allows for noise in the data. 

It is common that data with a class imbalance, that is, containing many more samples from one class than the other, cause the classifier to be biased towards the majority class. In this case, there are far more non-storm intervals than storm intervals. Following (McGranaghan, Mannucci, Wilson, Mattman, & Chadwick, 2018), we define the cost of mis-classifying each class separately. This is done through the weight ratio Wstorm : Wno storm. Increasing Wstorm increases the frequency at which the SVM predicts a storm and, correspondingly, reduces the frequency at which it predicts no storm. In this work we vary Wstorm and keep Wno storm fixed at 1. A user of the SVM method for forecasting may wish to tune the class weight ratio to achieve an appropriate balance between false alarms and hit rate, depending on their needs. 
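A sketch of this class weighting with scikit-learn; the 24-hour aaH windows and storm labels below are synthetic stand-ins, and w_storm plays the role of the tunable ratio Wstorm : Wno storm.

```python
import numpy as np
from sklearn.svm import SVC

# Penalise a misclassified storm more heavily than a misclassified quiet
# interval via the class-weight ratio. Data are synthetic stand-ins for
# 24 h windows of aaH (eight 3-hourly values) and storm/no-storm labels.
rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 8))
y = ((X.sum(axis=1) + rng.normal(0, 1, 1000)) > 3.0).astype(int)   # rare "storm" class

w_storm = 5.0                                       # increase to predict storms more often
svm = SVC(kernel="rbf", class_weight={1: w_storm, 0: 1.0}).fit(X, y)
print("fraction of storms predicted:", svm.predict(X).mean())
print("observed storm fraction:     ", y.mean())
```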

Different forecast applications will have different tolerances for false alarms and missed events. To accommodate this, and as a further comparison of the hindcasts, we use a Cost/Loss analysis. In short, C is the economic cost associated with taking mitigating action when an event is predicted (whether or not it actually occurs) and L is the economic loss suffered due to damage if no mitigating action is taken when needed. For a deterministic method, such as the SVM, each time a storm is predicted, a cost C is incurred. Each time a storm is not predicted but a storm occurs, a loss L is incurred. If no storm is predicted and no storm occurs, then no expense is incurred. By considering some time interval, the total expense can be computed by summing C and L. Further information, including the formula for computing the potential economic value (PEV), can be found in (Owens & Riley, 2017). 
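As a sketch, the function below implements the standard relative (potential) economic value formulation commonly used in this kind of analysis, expressed in terms of the hit rate H, false alarm rate F, storm base rate s and the Cost/Loss ratio C/L; see (Owens & Riley, 2017) for the full treatment. The numbers fed to it are illustrative only.

```python
import numpy as np

def potential_economic_value(cost_loss, hit_rate, false_alarm_rate, base_rate):
    """PEV relative to the better of 'always act' and 'never act'."""
    a = cost_loss
    e_climate = np.minimum(a, base_rate)          # best no-skill expense (always or never act)
    e_perfect = base_rate * a                     # expense with a perfect forecast
    e_forecast = (a * (hit_rate * base_rate + false_alarm_rate * (1 - base_rate))
                  + base_rate * (1 - hit_rate))   # cost of actions taken + losses from misses
    return (e_climate - e_forecast) / (e_climate - e_perfect)

cl = np.linspace(0.05, 0.95, 10)                  # range of Cost/Loss ratios
print(potential_economic_value(cl, hit_rate=0.7, false_alarm_rate=0.1, base_rate=0.15))
```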

A particular forecast application will have a C/L ratio in the domain (0,1). This is because a C/L of 0 would mean it is most cost effective to take constant mitigating action and a C/L of 1 or more means that mitigating action is never cost effective. In either case, no forecast would be helpful. The power of a Cost/Loss analysis is that it allows us to evaluate our methods for the entire range of potential forecast end users without specific knowledge of the forecast application requirements. End users can then easily interpret whether our methods fit their situation.  

Figure 3 – A cost loss analysis comparing the potential economic value of a range of SVMs to the AnEn. 

Figure 3 shows the potential economic value (PEV) of the SVM with a range of class weights (CW), the probabilistic AnEn, and 27-day recurrence. The shaded regions indicate which hindcast has the highest PEV for that Cost/Loss ratio. The probabilistic AnEn has the highest PEV over the majority of the Cost/Loss domain, although the SVM has a higher PEV at lower Cost/Loss ratios. This highlights that the ‘best’ hindcast depends on the context in which it is to be employed. 

In summary, we have implemented an SVM for the classification of geomagnetic storms and compared the performance to that of the AnEn which was discussed in a previous blog post. The SVM and AnEn generally perform similarly in a Cost/Loss analysis and the best method will depend on the requirements of the end user. The code for the SVM is available at  https://doi.org/10.5281/zenodo.4604485 and the AnEn at https://doi.org/10.5281/zenodo.4604487

References 

Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery

McGranaghan, R., Mannucci, A., Wilson, B., Mattman, C., & Chadwick, R. (2018). New Capabilities for Prediction of High-Latitude Ionospheric Scintillation: A Novel Approach With Machine Learning. Space Weather

Owens, M., & Riley, P. (2017). Probabilistic Solar Wind Forecasting Using Large Ensembles of Near-Sun Conditions With a Simple One-Dimensional “Upwind” Scheme. Space weather

Towards data science. (n.d.). Retrieved from https://towardsdatascience.com/the-kernel-trick-c98cdbcaeb3f