Some PhD projects are co-organised by an industrial CASE partner, which provides supervisory support and additional funding. As part of my CASE partnership with the UK Met Office, in January I had the opportunity to spend five weeks at the Exeter HQ, which proved to be a fruitful experience. As three of my four supervisors are based there, it was certainly a convenient set-up for seeking their expertise on certain aspects of my PhD project!
One part of my project aims to understand how certain neighbourhood-based verification methods can affect the assessed accuracy of surface air quality forecasts. Routine verification of a forecast model against observations is necessary to provide the most accurate forecast possible. Getting this right is crucial, as a good forecast may help keep the public aware of potential adverse health risks resulting from elevated pollutant concentrations.
The project deals with two sides of one coin: evaluating forecasts of regional surface pollutant concentrations, and evaluating those of meteorological fields such as wind speed, precipitation, relative humidity or temperature. All of the above have unique characteristics: they vary in resolution, spatial scale, homogeneity, randomness… The behaviour of the weather and pollutant variables is also tricky to compare because their numerous measurement sites almost never coincide, whereas the forecast covers the entire domain. This is the crux of this part of my PhD: how can we use these irregularly located measurements to our advantage in verifying the skill of the forecast in the most useful way? And, zooming out further, can we determine the extent to which the surface air pollution forecast depends on some of those weather variables? And can this knowledge (once acquired!) be used to further improve the pollution forecast?
While at the Met Office, I began researching methods which analyse forecast skill by evaluating a model “neighbourhood” of a particular size around a particular point observation. These methods are being developed as part of a toolkit for evaluating high-resolution forecasts. Such forecasts can be (and usually are) more accurate than a lower-resolution equivalent, but traditional metrics (e.g. root mean square error (RMSE) or mean error (ME)) often fail to demonstrate the improvement (Mittermaier, 2014). Traditional metrics can also fall victim to verification pitfalls such as the double-penalty problem. This is when an ‘event’ is missed at a particular time at one grid-point because it was actually forecast in the neighbouring grid-point one time-step out, so the RMSE penalises the forecast twice: once for missing the event where and when it was observed, and again for forecasting it where and when it was not. Not fair, if you ask me. So as numerical weather prediction (NWP) continues to increase in resolution, there is a need for robust verification methods which relax the spatial (or temporal) restriction on precise forecast-to-observation matching somewhat (Ebert, 2008).
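To make the double-penalty problem concrete, here is a minimal sketch in Python. The numbers are made up for illustration and the one-dimensional "field" is an assumption of mine, not real model output. It compares a forecast that captures the event but places it one grid-point away against a forecast that predicts no event at all.

```python
import math


def rmse(forecast, observed):
    """Root mean square error between two equally sized sequences."""
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only: a single pollution 'event' of magnitude 10.
observed = [0, 0, 0, 0, 10, 0, 0, 0]
displaced = [0, 0, 0, 0, 0, 10, 0, 0]  # right event, one grid-point out
flat = [0, 0, 0, 0, 0, 0, 0, 0]        # no event forecast at all

rmse_displaced = rmse(displaced, observed)
rmse_flat = rmse(flat, observed)

print(f"displaced forecast RMSE: {rmse_displaced:.3f}")
print(f"flat forecast RMSE:      {rmse_flat:.3f}")
```

The displaced forecast is charged for the miss and again for the false alarm, so it ends up with a larger RMSE than the forecast that never predicted the event, even though most people would call it the more useful of the two.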
One way forward is a ‘neighbourhood’ approach, which treats a deterministic forecast almost as an ensemble by considering each grid-point around an observation as an individual forecast and formulating a probabilistic score. Neighbourhoods are made up of a varying number of model grid-points, e.g. 3×3, 5×5 or even bigger. A score such as the ranked probability score (RPS) or Brier score is calculated using the cumulative probability distribution across the neighbourhood of the exceedance of a sensible pollutant concentration threshold. So, for example, we can ask what proportion of a 5×5 neighbourhood around an observation has correctly forecast an observed exceedance (i.e. a ‘hit’)? What if an exceedance forecast has been made, but the observed quantity didn’t reach that magnitude (i.e. a ‘false alarm’)? And how do these scores change when larger (or smaller) neighbourhoods are considered? And, if these spatial verification methods prove informative, how could they be implemented in operational air quality forecast verification? All these questions will hopefully have some answers in the near future and form a part of my PhD thesis!
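As a rough illustration of the neighbourhood idea, the sketch below computes a single-threshold Brier score in Python. It is not the HiRA implementation: the function names, the toy 5×5 field, the threshold of 50 and the choice to clip neighbourhoods at the domain edge are all assumptions of mine, and each observation is assumed to have already been matched to its nearest grid-point. The forecast probability is simply the fraction of neighbourhood grid-points exceeding the threshold, and the observed outcome is 1 for an exceedance and 0 otherwise.

```python
def neighbourhood_probability(field, row, col, half_width, threshold):
    """Fraction of grid-points in the neighbourhood that exceed the threshold.

    half_width=0 gives a 1x1 neighbourhood, 1 gives 3x3, 2 gives 5x5.
    Neighbourhoods are clipped at the domain edge (an assumption).
    """
    n_rows, n_cols = len(field), len(field[0])
    values = [
        field[r][c]
        for r in range(max(0, row - half_width), min(n_rows, row + half_width + 1))
        for c in range(max(0, col - half_width), min(n_cols, col + half_width + 1))
    ]
    return sum(v > threshold for v in values) / len(values)


def neighbourhood_brier_score(field, observations, half_width, threshold):
    """Mean of (forecast probability - observed outcome)^2 over all sites."""
    squared_errors = []
    for (row, col), observed_value in observations:
        p = neighbourhood_probability(field, row, col, half_width, threshold)
        o = 1.0 if observed_value > threshold else 0.0
        squared_errors.append((p - o) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Toy forecast: background of 20 everywhere, one exceedance at (2, 3).
field = [[20.0] * 5 for _ in range(5)]
field[2][3] = 60.0

# One observation site at grid-point (2, 2) that did record an exceedance.
observations = [((2, 2), 55.0)]
threshold = 50.0  # hypothetical concentration threshold

scores = {
    half_width: neighbourhood_brier_score(field, observations, half_width, threshold)
    for half_width in (0, 1, 2)
}
for half_width, score in scores.items():
    size = 2 * half_width + 1
    print(f"{size}x{size} neighbourhood: Brier score = {score:.4f}")
```

In this toy case the 1×1 neighbourhood scores the worst possible value because the forecast exceedance sits one grid-point away from the observation, while the 3×3 neighbourhood gives the forecast some credit for the near miss. The 5×5 neighbourhood dilutes that single exceedance across more grid-points, which hints at why the choice of neighbourhood size matters.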
Although these kinds of methods have been used for meteorological variables, they haven’t yet been widely researched in the context of regional air quality forecasts. The verification framework for this is called HiRA (High Resolution Assessment), which is part of the wider verification network Model Evaluation Tools (which, considering it is being developed as a means of uniformly assessing high-resolution meteorological forecasts, has the most unhelpful acronym: MET). It is quite an exciting opportunity to be involved in testing and evaluating this new set of verification tools for a surface pollution forecast at a regional scale, and I am very grateful for it. Also, having the opportunity to work at the Met Office and “pretend” to be a real research scientist for a while is awesome!