The Social Metwork

Forecast Verification Summer School

July 9, 2021July 9, 2021lilygreigLeave a comment

Lily Greig – l.greig@pgr.reading.ac.uk

A week-long summer school on forecast verification was held jointly at the end of June by the MPECDT (Mathematics of Planet Earth Centre for Doctoral Training) and JWGFVR (Joint Working Group on Forecast Verification Research). The school featured lectures from scientists and academics from many different countries around the world including Brazil, USA and Canada. They each specialised in different topics within forecast verification. Participants gained a large overview of the field and how the fields within it interact.

Structure of school

The virtual school consisted of lectures from individual members of the JWGFVR on their own subjects, along with drop-in sessions for asking questions and dedicated time to work on group projects. Four groups of 4-5 students were given an individual forecast verification challenge. The themes of the projects were precipitation forecasts, comparing high resolution global model and local area model wind speed forecasts, and ensemble seasonal forecasts. The latter was the topic of our project.

Content

The first lecture was given by Barbara Brown, who provided a broad summary of verification and gave examples of questions that verifiers may ask themselves as they attempt to assess the “goodness” of a forecast. The next day, a lecture by Barbara Casati covered continuous scores (verification of continuous variables e.g., temperature), such as linear bias, mean-squared error (MSE) and Pearson coefficient. She also outlined the deficits of different scores and how it is best to use a variety of them when assessing the quality of a forecast. Marion Mittermaier then spoke about categorical scores (yes/no events or multi category events such as precipitation type). She gave examples such as contingency tables which portray how well a model is able to predict a given event, based on hit rates (how often the model predicted an event when the event happened), and false alarm rates (how often the model predicted the event when it didn’t happen). Further lectures were given by Ian Joliffe on methods of determining the significance of your forecast scores, Nachiketa Acharya on probabilistic scores and ensembles, Caio Coelho on sub-seasonal to seasonal timescales, and then Raghavendra Ashrit, Eric Gilleland and Caren Marzban on severe weather, spatial verification and experimental design. The lectures have been made available online and you can find them here.

Forecast Verification

So, forecast verification is as it sounds: a part of assessing the ‘goodness’ of a forecast as opposed to its value. Verification is helpful for economic purposes (e.g. decision making), as well as administrative and scientific ones (e.g. identifying model flaws). The other aspect of measuring how well a forecast is performing is knowing the user’s needs, and therefore how to apply the forecast. It is important to consider the goal of your verification process beforehand, as it will outline your choice of metrics and your assessment of them. An example of how forecast goodness hinges on the user was given by Barbara in her talk: a precipitation forecast may have a spatial offset of where a rain patch falls, but if both observation and forecast fall along the flight path, this may be all the aviation traffic strategic planner needs to know. For a watershed manager on the ground, however, this would not be a helpful forecast. The lecturers also emphasised the importance of performing many different measures on a forecast and then understanding the significance of your measures in order to help you understand its overall goodness. Identifying standards of comparison for your forecast is also important, such as persistence or climatology. Then there are further challenges such as spatial verification, which requires methods of ‘matching’ the location of your observations with the model predictions on the model grid.

Figure 1: Problem statement for group presentation on 2m temperature ensemble seasonal forecasts, presented by Ryo Kurashina

Group Project

Our project was on verification of 2 metre temperature ensemble seasonal forecasts (see Figure 1). We were looking at seasonal forecast data with a 1-month lead time for the summer months for three different models and investigating ways of validating the forecasts, finally deciding which one was the better. We decided to focus on the models’ ability to predict hot and cold events as a simple metric for El Nino. We looked at scatter plots and rank histograms to investigate the biases in our data, Brier scores for assessing model accuracy (level of agreement between forecast and truth) and Receiver Operating Characteristic curves to look model skill (the relative accuracy of the forecast over some reference forecast). The ROC curve (see Fig. 2) refers to the curve formed by plotting hit rates against false alarm rates based on probability thresholds. The further above the diagonal line your curve lies, the better your forecast is at discriminating events compared to a random coin toss. The combination of these verification methods were used to assess which model we thought was best.

Of course, virtual summer schools are less than ideal compared to the real (in person) deal, but with Teams meetings, shared code and chat channel we made the most of it. It was fun to work with everyone, even (or especially?) if the topic was new for all of us.

Figure 2: Presenting our project during group project presentations on Friday

Conclusions

The summer school was incredibly smoothly run, very engaging to people both new and experienced in the topic and provided plenty of opportunity to ask questions to the enthusiastic lecturers. Would recommend to PhD students working with forecasts and wanting to assess them!

The effect of surface heat fluxes on the evolution of storms in the North Atlantic storm track

July 2, 2021July 2, 2021andreamarcheggianiLeave a comment

Andrea Marcheggiani – a.marcheggiani@pgr.reading.ac.uk

Diabatic processes are typically considered as a source of energy for weather systems and as a primary contributing factor to the maintenance of mid-latitude storm tracks (see Hoskins and Valdes 1990 for some classical reading, but also a more recent reviews, e.g. Chang et al. 2002). However, surface heat exchanges do not necessarily act as a fuel for the evolution of weather systems: the effects of surface heat fluxes and their coupling with lower-tropospheric flow can be detrimental to the potential energy available for systems to grow. Indeed, the magnitude and sign of their effects depend on the different time (e.g., synoptic, seasonal) and length (e.g., global, zonal, local) scales which these effects unfold at.

**Figure 1:** Composites for strong (a-c) and weak (d-f) values of the covariance between heat flux and temperature time anomalies.

Heat fluxes arise in response to thermal imbalances which they attempt to neutralise. In the atmosphere, the primary thermal imbalances that are observed correspond with the meridional temperature gradient caused by the equator—poles differential radiative heating from the Sun, and the temperature contrasts at the air—sea interface which essentially derives from the different heat capacities of the oceans and the atmosphere.

In the context of the energetic scheme of the atmosphere, which was first formulated by Lorenz (1955) and commonly known as Lorenz energy cycle, the meridional transport of heat (or dry static energy) is associated with conversion of zonal available potential energy to eddy available potential energy, while diabatic processes at the surface coincide with generation of eddy available potential energy.

**Figure 2:** Phase portrait of FT covariance and mean baroclinicity. Streamlines indicate average circulation in the phase space (line thickness proportional to phase speed). The black shaded dot in the top left corner indicates the size of the Gaussian kernel used in the smoothing process. Colour shading indicates the number of data points contributing to the kernel average

The sign of the contribution from surface heat exchanges to the evolution on weather systems is not univocal, as it depends on the specific framework which is used to evaluate their effects. Globally, these have been estimated to have a positive effect on the potential energy budget (Peixoto and Oort, 1992) while locally the picture is less clear, as heating where it is cold and cooling where it is warm would lead to a reduction in temperature variance, which is essentially available potential energy.

The first part of my PhD focussed on assessing the role of local air—sea heat exchanges on the evolution of synoptic systems. To that extent, we built a hybrid framework where the spatial covariance between time anomalies of sensible heat flux F and lower-tropospheric air temperature T is taken as a measure of the intensity of the air—sea thermal coupling. The time anomalies, denoted by a prime, are defined as departures from a 10-day running mean so that we can concentrate on synoptic variability (Athanasiadis and Ambaum, 2009). The spatial domain where we compute the spatial covariance extends from 30°N to 60°N and from 30°W to 79.5°W, which corresponds with the Gulf Stream extension region, and to focus on air—sea interaction, we excluded grid points covered by land or ice.

This leaves us with a time series for F’—T’ spatial covariance, which we also refer to as FT index.

The FT index is found to be always positive and characterised by frequent bursts of intense activity (or peaks). Composite analysis, shown in Figure 1 for mean sea level pressure (a,d), temperature at 850hPa (b,e) and surface sensible heat flux (c,f), indicates that peaks of the FT index (panels a—c) correspond with intense weather activity in the spatial domain considered (dashed box in Figure 1) while a more settled weather pattern is observed to be typical when the FT index is weak (panels d—f).

**Figure 3:** Phase portraits for spatial-mean T (a) and cold sector area fraction (b). Shading in (a) represents the difference between phase tendency and the mean value of T, as reported next to the colour bar. Arrows highlight the direction of the circulation, kernel-averaged using the Gaussian kernel shown in the top-left corner of each panel.

We examine the dynamical relationship between the FT index and the area-mean baroclinicity, which is a measure of available potential energy in the spatial domain. To do that, we construct a phase space of FT index and baroclinicity and study the average circulation traced by the time series for the two dynamical variables. The resulting phase portrait is shown in Figure 2. For technical details on phase space analysis refer to Novak et al. (2017), while for more examples of its use see Marcheggiani and Ambaum (2020) or Yano et al. (2020). We observe that, on average, baroclinicity is strongly depleted during events of strong F’—T’ covariance and it recovers primarily when covariance is weak. This points to the idea that events of strong thermal coupling between the surface and the lower troposphere are on average associated with a reduction in baroclinicity, thus acting as a sink of energy in the evolution of storms and, more generally, storm tracks.

Upon investigation of the driving mechanisms that lead to a strong F’—T’ spatial covariance, we find that increases in variances and correlation are equally important and that appears to be a more general feature of heat fluxes in the atmosphere, as more recent results appear to indicate (which is the focus of the second part of my PhD).

In the case of surface heat fluxes, cold sector dynamics play a fundamental role in driving the increase of correlation: when cold air is advected over the ocean surface, flux variance amplifies in response to the stark temperature contrasts at the air—sea interface as the ocean surface temperature field features a higher degree of spatial variability linked to the presence of both the Gulf Stream on the large scale and oceanic eddies on the mesoscale (up to 100 km).

The growing relative importance of the cold sector in the intensification phase of the F’—T’ spatial covariance can also be revealed by looking at the phase portraits for air temperature and cold sector area fraction, which is shown in Figure 3. These phase portraits tell us how these fields vary at different points in the phase space of surface heat flux and air temperature spatial standard deviations (which correspond to the horizontal and vertical axes, respectively). Lower temperatures and larger cold sector area fraction characterise the increase in covariance, while the opposite trend is observed in the decaying stage.

Surface heat fluxes eventually trigger an increase in temperature variance, which within the atmospheric boundary layer follows an almost adiabatic vertical profile which is characteristic of the mixed layer (Stull, 2012).

**Figure 4:** Diagram of the effect of the atmospheric boundary layer height on modulating surface heat flux—temperature correlation.

Stronger surface heat fluxes are associated with a deeper boundary layer reaching higher levels into the troposphere: this could explain the observed increase in correlation as the lower-tropospheric air temperatures become more strongly coupled with the surface, while a lower correlation with the surface ensues when the boundary layer is shallow and surface heat flux are weak. Figure 4 shows a simple diagram summarising the mechanisms described above.

In conclusion, we showed that surface heat fluxes locally can have a damping effect on the evolution of mid-latitude weather systems, as the covariation of surface heat flux and air temperature in the lower troposphere corresponds with a decrease in the available potential energy.

Results indicate that most of this thermodynamically active heat exchange is realised within the cold sector of weather systems, specifically as the atmospheric boundary layer deepens and exerts a deeper influence upon the tropospheric circulation.

References

Athanasiadis, P. J. and Ambaum, M. H. P.: Linear Contributions of Different Time Scales to Teleconnectivity, J. Climate, 22, 3720– 3728, 2009.
Chang, E. K., Lee, S., and Swanson, K. L.: Storm track dynamics, J. Climate, 15, 2163–2183, 2002.
Hoskins, B. J. and Valdes, P. J.: On the existence of storm-tracks, J. Atmos. Sci., 47, 1854–1864, 1990.
Lorenz, E. N.: Available potential energy and the maintenance of the general circulation, Tellus, 7, 157–167, 1955.
Marcheggiani, A. and Ambaum, M. H. P.: The role of heat-flux–temperature covariance in the evolution of weather systems, Weather and Climate Dynamics, 1, 701–713, 2020.
Novak, L., Ambaum, M. H. P., and Tailleux, R.: Marginal stability and predator–prey behaviour within storm tracks, Q. J. Roy. Meteorol. Soc., 143, 1421–1433, 2017.
Peixoto, J. P. and Oort, A. H.: Physics of climate, American Institute of Physics, New York, NY, USA, 1992.
Stull, R. B.: Mean boundary layer characteristics, In: An Introduction to Boundary Layer Meteorology, Springer, Dordrecht, Germany, 1–27, 1988.
Yano, J., Ambaum, M. H. P., Dacre, H., and Manzato, A.: A dynamical—system description of precipitation over the tropics and the midlatitudes, Tellus A: Dynamic Meteorology and Oceanography, 72, 1–17, 2020.

CMIP6 Data Hackathon

June 18, 2021June 18, 2021Chloe BrimicombeLeave a comment

Brian Lo – brian.lo@pgr.reading.ac.uk

Chloe Brimicombe – c.r.brimicombe@pgr.reading.ac.uk

What is it?

A hackathon, from the words hack (meaning exploratory programming, not the alternate meaning of breaching computer security) and marathon, is usually a sprint-like event where programmers collaborate intensively with the goal of creating functioning software by the end of the event. From 2 to 4 June 2021, more than a hundred early career climate scientists and enthusiasts (mostly PhDs and Postdocs) from UK universities took part in a climate hackathon organised jointly by Universities of Bristol, Exeter and Leeds, and the Met Office. The common goal was to quickly analyse certain aspects of Climate Model Intercomparison Project 6 (CMIP6) data to output cutting-edge research that could be worked into a published material and shown in this year’s COP26.

Before the event, attendees signed up to their preferred project from a choice of ten. Topics ranged from how climate change will affect migration of arctic terns to the effects of geoengineering by stratospheric sulfate injections and more… Senior academics from a range of disciplines and institutions led each project.

Group photo of participants at the CMIP6 Data Hackathon

How is this virtual hackathon different to a usual hackathon?

Like many other events this year, the hackathon took place virtually, using a combination of video conferencing (Zoom) for seminars and teamwork, and discussion forums (Slack).

Brian:

Compared to two 24-hour non-climate related hackathons I previously attended, this one was spread out for three days, so I managed not to disrupt my usual sleep schedules! The experience of pair programming with one or two other team members was as easy, since I shared one of my screens on Zoom breakout rooms throughout the event. What I really missed were the free meals, plenty of snacks and drinks usually on offer at normal hackathons to keep me energised while I programmed.

Chloe:

I’ve been to a climate campaign hackathon before, and I did a hackathon style event to end a group project during the computer science part of my undergraduate; we made the boardgame buccaneer in java. But this was set out completely differently. And, it was not as time intensive as those which was nice. I missed not being in a room with those you are on a project with and still missing out on free food – hopefully not for too much longer. But we made use of Zoom and Slack for communication. And Jasmin and the version control that git offers with individuals working on branches and then these were merged at the end of the hackathon.

What did we do?

Brian:

Project 2: How well do the CMIP6 models represent the tropical rainfall belt over Africa?

Using Gaussian parameters in Nikulin & Hewitson 2019 to describe the intensity, mean meridional position and width of the tropical rainfall belt (TRB), the team I was in investigated three aspects of CMIP6 models for capturing the Africa TRB, namely the model biases, projections and whether there was any useful forecast information in CMIP6 decadal hindcasts. These retrospective forecasts were generated under the Decadal Climate Prediction Project (DCPP), with an aim of investigating the skill of CMIP models in predicting climate variations from a year to a decade ahead. Our larger group of around ten split ourselves amongst these three key aspects. I focused on aspect of CMIP6 decadal hindcasts, where I compared different decadal models at different model lead times with three observation sources.

Chloe:

Project 10: Human heat stress in a warming world

Our team leader Chris had calculated the universal thermal climate index (UTCI) – a heat stress index for a bunch of the CMIP6 climate models. He was looking into bias correction against the ERA5 HEAT reanalysis dataset whilst we split into smaller groups. We looked at a range of different things from how the intensity of heat stress changed to how the UTCI compared to mortality. I ended up coding with one of my (5) PhD supervisors Claudia Di Napoli and we made amongst other things the gif below.

Having a great final day at the #cmip6hackathon.

See the Annual Means of the UTCI for RCP4.5 (medium emissions) projection from 2020 to 2099. Notice Brazil having a particularly tough time with heat stress 🥵🌿.

Joint effort from me and @DrClaudiaDN for this plot.#HeatHealth pic.twitter.com/KipiaDy8HG
— Dr Chloe Brimicombe (@ChloBrim) June 4, 2021

https://twitter.com/ChloBrim/status/1400780543193649153

Annual means of the UTCI for RCP4.5 (medium emissions) projection from 2020 to 2099.

Would we recommend meteorology/climate-related hackathon?

Brian:

Yes! The three days was a nice break from my own radar research work. The event was nevertheless good training for thinking quickly and creatively to approach research questions other than those in my own PhD project. The experience also sharpened my coding and data exploration skills, while also getting the chance to quickly learn advanced methods for certain software packages (such as xarray and iris). I was amazed at the amount of scientific output achieved in only three short days!

Chloe:

Yes, but also make sure if it’s online you block out the time and dedicate all your focus to the hackathon. Don’t be like me. The hackathon taught me more about python handling of netcdfs, but I am not yet a python plotting convert, there are some things R is just nicer for. And I still love researching heat stress and heatwaves, so that’s good!

We hope that the CMIP hackathon runs again next year to give more people the opportunity to get involved.

How to write a PhD thesis during a global pandemic

June 4, 2021June 4, 2021KajaLeave a comment

Kaja Milczewska – k.m.milczewska@pgr.reading.ac.uk

Completing a PhD is a momentous task at the best of times, let alone in combination with a year-long global pandemic. Every PhD researcher is different, and as such, everyone has had different circumstantial struggles throughout Covid-19. The lack of human interaction that comes with working in a vibrant academic environment such as the Meteorology Department can make working from home a real struggle. Sometimes it is difficult to find the motivation to get anything useful done; whereas at other times you could squeeze five hours’ worth of work into one. Trying to stay organised is key to getting it done, therefore the following are some of the things that helped me get to the end of my PhD thesis – and it has not been easy. If you are still out there writing and finishing up experiments: read on! Maybe the result is that you might feel a little less alone. The PhD experience can be truly isolating at the best of times, so literally being instructed to isolate from the world is not ideal. The points are numbered for convenience of structuring this post, rather than any order of importance.

Communicate with your supervisor(s)

It is tempting to “disappear off the radar” when things are not going well. You could wake up in the morning of the day of your regular weekly meeting, filled with dread that you have not prepared anything for it. Your brain recoils into the depths of your skull as your body recoils back under the safety of the duvet. What are your options? Some of them might be: take a deep gulp and force yourself out of bed with the prospect of coffee before the meeting (where you steer the conversation onto the things you did manage to do); or to postpone the meeting because you need to finish XYZ and thus a later meeting may be more productive; or ignore the meeting altogether. The first one is probably the best option, but it requires mental strength where there might be none. The second one is OK, but you still need to do the work. The last one is a big no. Don’t do it.

Anxiety will make you believe that ignoring the world and all responsibilities is the most comfortable option in the moment, but the consequences of acting on it could be worse. Supervisors value honesty, and they know well that it is not always possible to complete all the scheduled tasks. Of course, if this happens every week then you might need to introspectively address the reasons for this, and – again, talking with your supervisor is usually a useful thing to do. You might not want them to know your entire life story, but it is helpful for everybody involved if they are aware that you struggle with anxiety / depression / ADHD / *insert any condition here*, which could affect your capacity to complete even the simplest, daily tasks. Being on the same page and having matching expectations is key to any student – supervisor partnership.

Reward yourself for the things you have already accomplished.

Whether that’s mid-week, mid-to-do-list, weekend — whenever. List all the things you have done regularly (either work- or life-related) and recognise that you are trying to survive a pandemic. And trying to complete the monstrous task of writing a PhD thesis. Those are big asks, and the only way to get through them is to break them down into smaller chunks. Putting down “Write thesis” on your to-do list is more likely to intimidate than motivate you. How about breaking it down further: “Re-create plot 4.21”, or “Consolidate supervisor comments on pages 21 – 25” — these are achievable things in a specified length of time. It also means you could tick them off more easily, hopefully resulting in feeling accomplished. Each time this happens, reward yourself in whatever way makes you feel nice. Even just giving yourself a literal pat on the shoulder could feel great – try it!

Compile supervisor feedback / comments into a spreadsheet

An Excel spreadsheet – or any other suitable system – will enable you to keep track of what still needs addressing and what has been completed. The beauty of using a colour-coded spreadsheet for feedback comments is that once the required corrections are completed, you have concrete evidence of how much you have already achieved – something to consult if you start feeling inadequate at any point (see previous section!). I found this a much easier system than writing it down in my workbook, although of course this does work for some people, too. Anytime you receive feedback on your work – written or otherwise – note them down. I used brief reminders, such as “See supervisor’s comment on page X” but it was useful to have them all compiled together. Also, I found it useful to classify the comments into ‘writing-type’ corrections and ‘more work required’ corrections. The first one is self-explanatory: these were typos, wrong terminologies, mistakes in equations and minor structural changes. The ‘more work required’ was anything that required me to find citations / literature, major structural changes, issues with my scientific arguments or anything else that required more thought. This meant that if my motivation was lacking, I could turn to the “writing-type” comments and work on them without needing too much brain power. It also meant that I could prioritise the major comments first, which made working to a deadline a little bit easier.

Break down how long specific things will take

This is most useful when you are a few weeks away from submission date. With only 5 weeks left, my colour-coded charts were full of outstanding comments; neither my ‘Conclusions’ chapter nor my Abstract had been written; plots needed re-plotting and I still did not know the title of my thesis. Naturally, I was panicking. I knew that the only way I could get through this was to set a schedule — and stick to it. At the time, there were 5 major things to do: complete a final version of each of my 5 thesis chapters. A natural split was to allow each chapter only one week for completion. If I was near to running over my self-prescribed deadline, I would prioritise only the major corrections. If still not done by the end of the allowed week: that’s it! Move on. This can be difficult for any perfectionists out there, but by this point the PhD has definitely taught me that “done” is better than perfect. I also found that some chapters took less time to finish than others, so I had time to return to the things I left not quite finished. Trust yourself, and give it your best. By all means, push through the hardest bit to the end, but remember that there (probably) does not exist a single PhD thesis without any mistakes.

5. Follow useful Twitter threads

There exist two groups of people: those who turn off or deactivate all social media when they need to focus on a deadline, and those who get even more absorbed by its ability to divert your attention away from the discomfort of the dreaded task at hand. Some might call it “productive procrastination”. I actually found that social media helped me a little – but only when my state of mind was such that I could resist the urge to fall down a scrolling rabbit hole. If you are on Twitter, you might find hashtags like #phdchat and accounts such as @AcademicChatter , @phdforum @phdvoice useful.

6. Join a virtual “writing room”

On the back of the last tip, I have found a virtual writing room helpful for focus. The idea is that you join an organised Zoom meeting full of other PhDs, all of whom are writing at the same time. All microphones are muted, but the chat is active so it is nice to say ‘hello!’ to someone else writing at the same time, anywhere else in the world. The meetings have scheduled breaks, with the organiser announcing when they occur. I found that because I actively chose to be up and start writing at the very early hour of 6am by attending the virtual writing room, I was not going to allow myself to procrastinate. The commitment to being up so early and being in a room full of people also doing the same thing (but virtually, obviously) meant that those were the times that I was probably the most focused. These kinds of rooms are often hosted by @PhDForum on Twitter; there could also be others. An alternative idea could be to set up a “writing meeting” with your group of peers and agree to keep chatter to a minimum (although this is not something I tried myself).

7. Don’t look at the news

Or at least, minimise your exposure to them. It is generally a good thing to stay on top of current events, but the final stages of writing a PhD thesis are probably unlike any other time in your life. You need the space and energy to think deeply about your own work right now. Unfortunately, I learnt this the hard way and found that there were days where I could do very little work because my brain was preoccupied with awful events happening around the world. It made me feel pathetic, routinely resulting in staying up late to try and finish whatever I failed to finish during the day. This only deteriorated my wellbeing further with shortened sleep and a constant sense of playing “catch-up”. If this sounds like you, then try switching off your news notifications on your phone or computer, or limit yourself to only checking the news homepage once a day at a designated time.

8. Be honest when asked about how you are feeling

Many of us tend to downplay or dismiss our emotions. It can be appealing to keep your feelings to yourself, saving yourself the energy involved in explaining the situation to whomever asked. You might also think that you are saving someone else the hassle of worrying about you. The trouble is that if we continuously paper over the cracks in our mental wellbeing within the handful of conversations we are having (which are especially limited during the pandemic), we could stop acknowledging how we truly feel. This does not necessarily mean spilling all the beans to whomever asked the innocent question, “How are you?”. But the catharsis from opening up to someone and acknowledging that things are not quite right could really offload some weight off your shoulders. If the person on the other end is your PhD supervisor, it can also be helpful for them to know that you are having a terrible time and are therefore unable to complete tasks to your best ability. Submission anxiety can be crippling for some people in the final few weeks, and your supervisor just won’t be able to (and shouldn’t) blindly assume how your mental health is being affected by it, because everyone experiences things differently. This goes back to bullet no.1.

Hopefully it goes without saying that the above are simply some things that helped me through to the end of the thesis, but everybody is different. I am no counsellor or wellbeing guru; just a recently-finished PhD! Hopefully the above points might offer a little bit of light for anyone else struggling through the storm of that final write-up. Keep your chin up and, as Dory says: just keep swimming. Good luck!

Better Data… with MetaData!

May 21, 2021May 21, 2021James Fallon1 Comment

James Fallon – j.fallon@pgr.reading.ac.uk

As researchers, we familiarise ourselves with many different datasets. Depending on who put together the dataset, the variable names and definitions that we are already familiar from one dataset may be different in another. These differences can range from subtle annoyances to large structural differences, and it’s not always immediately obvious how best to handle them.

One dataset might be on an hourly time-index, and the other daily. The grid points which tell us the geographic location of data points may be spaced at different intervals, or use entirely different co-ordinate systems!

However most modern datasets come with hidden help in the form of metadata – this information should tell us how the data is to be used, and with the right choice of python modules we can use the metadata to automatically work with different datasets whilst avoiding conversion headaches.

First attempt…

Starting my PhD, my favourite (naïve, inefficient, bug prone,… ) method of reading data with python was with use of the built-in function open() or numpy functions like genfromtxt(). These are quick to set up, and can be good enough. But as soon as we are using data with more than one field, complex coordinates and calendar indexes, or more than one dataset, this line of programming becomes unwieldy and disorderly!

>>> header = np.genfromtxt(fname, delimiter=',', dtype='str', max_rows=1)
>>> print(header)
['Year' 'Month' 'Day' 'Electricity_Demand']
>>> data = np.genfromtxt(fnam, delimiter=',', skip_header=1)
>>> print(data)
array([[2.010e+03, 1.000e+00, 1.000e+00, 0.000e+00],
       [2.010e+03, 1.000e+00, 2.000e+00, 0.000e+00],
       [2.010e+03, 1.000e+00, 3.000e+00, 0.000e+00],
       ...,
       [2.015e+03, 1.200e+01, 2.900e+01, 5.850e+00],
       [2.015e+03, 1.200e+01, 3.000e+01, 6.090e+00],
       [2.015e+03, 1.200e+01, 3.100e+01, 6.040e+00]])

The above code reads in year, month, day data in the first 3 columns, and Electricity_Demand in the last column.

You might be familiar with such a workflow – perhaps you have refined it down to a fine art!

In many cases this is sufficient for what we need, but making use of already available metadata can make the data more readable, and easier to operate on when it comes to complicated collocation and statistics.

Enter pandas!

Pandas

In the previous example, we read in our data to numpy arrays. Numpy arrays are very useful, because they store data more efficiently than a regular python list, they are easier to index, and have many built in operations from simple addition to niche linear algebra techniques.

We stored column labels in an array called header, but this means our metadata has to be handled separately from our data. The dates are stored in three different columns alongside the data – but what if we want to perform an operation on just the data (for example add 5 to every value). It is technically possible but awkward and dangerous – if the column index changes in future our code might break! We are probably better splitting the dates into another separate array, but that means more work to record the column headers, and an increasing number of python variables to keep track of.

Using pandas, we can store all of this information in a single object, and using relevant datatypes:

>>> data = pd.read_csv(fname, parse_dates=[['Year', 'Month', 'Day']], index_col=0)
>>> data
Electricity_Demand
Year_Month_Day      
2010-01-01      0.00
2010-01-02      0.00
2010-01-03      0.00
2010-01-04      0.00
2010-01-05      0.00
...              ...
2015-12-27      5.70
2015-12-28      5.65
2015-12-29      5.85
2015-12-30      6.09
2015-12-31      6.04

[2191 rows x 1 columns]

This may not immediately appear a whole lot different to what we had earlier, but notice the dates are now saved in datetime format, whilst being tied to the data Electricity_Demand. If we want to index the data, we can simultaneously index the time-index without any further code (and possible mistakes leading to errors).

Pandas also makes it really simple to perform some complicated operations. In this example, I am only dealing with one field (Electricity_Demand), but this works with 10, 100, 1000 or more columns!

Flip columns with data.T
Calculate quantiles with data.quantile
Cut to between dates, eg. data.loc['2010-02-03':'2011-01-05']
Calculate 7-day rolling mean: data.rolling(7).mean()

We can insert new columns, remove old ones, change the index, perform complex slices, and all the metadata stays stuck to our data!

Whilst pandas does have many maths functions built in, if need-be we can also export directly to numpy using numpy.array(data['Electricity_Demand']) or data.to_numpy().

Pandas can also simplify plotting – particularly convenient when you just want to quickly visualise data without writing import matplotlib.pyplot as plt and other boilerplate code. In this example, I plot my data alongside its 7-day rolling mean:

ax = data.loc['2010'].plot(label='Demand', ylabel='Demand (GW)')
data.loc['2010'].rolling(7).mean().plot(ax=ax, label='Demand rolling mean')
ax.legend()

Now I can visualise the anomalous values at the start of the dataset, a consistent annual trend, a diurnal cycle, and fairly consistent behaviour week to week.

Big datasets

Pandas can read from and write to many different data formats – CSV, HTML, EXCEL, … but some filetypes like netCDF4 that meteorologists like working with aren’t built in.

xarray is an extremely versatile tool that can read in many formats including netCDF, GRIB. As well as having built in functions to export to pandas, xarray is completely capable of handling metadata on its own, and many researchers work directly with objects such as xarray DataArray objects.

There are more xarray features than stars in the universe^{[citation needed]}, but some that I find invaluable include:

open_mfdataset – automatically merge multiple files (eg. for different dates or locations)
assign_coords – replace one co-ordinate system with another
where – replace xarray values depending on a condition

Yes you can do all of this with pandas or numpy. But you can pass metadata attributes as arguments, for example we can get the latitude average with my_data.mean('latitude'). No need to work in indexes and hardcoded values – xarray can do all the heavy lifting for you!

Have more useful tips for working effectively with meteorological data? Leave a comment here or send me an email j.fallon@pgr.reading.ac.uk 🙂

The EGU Experience 2021: a PhD student perspective

May 14, 2021May 14, 2021Chloe BrimicombeLeave a comment

Max Coleman – m.r.coleman@pgr.reading.ac.uk

Chloe Brimicombe – c.r.brimicombe@pgr.reading.ac.uk

The European Geoscience Union General Assembly is one of the big annual conferences for atmospheric science (and Earth sciences more generally). The two of us were fortunate to have the opportunity to attend and present our research at this year’s vEGU21 conference. As has been done in previous years like in 2019 we’re here to give you an account of our EGU experience 😀 (so you can compare our virtual experience with the previous posts if you like 😉)

Entrance hall to virtual EGU (Source: Linda Speight)

What was vEGU21?

EGUv21 was the general assembly for 2021 online. It took place from the 19^th to the 30^th April EGU. Through an impressive virtual conference center and mostly Zoom.

What was your presentation on?

Chloe – I presented borderless heat stress in the extreme heat events session, which is based on a paper currently under review at Earth’s Future, where we show that heat stress is growing in the area during the month of August. The invited speaker to the session was Laura Suarez-Gutierrez and it was a great presentation on the dynamics of increasing heat extremes with climate change across Europe. I really enjoyed learning about the latest research in the extreme heat area.

Max – I presented on my work using model nudging to study aerosol radiative adjustments. I presented in the session ‘Chemistry, Aerosols and Radiative Forcing in CMIP6-era models’, which was convened and hosted by Reading’s very own Bill Collins. There were many interesting presentations in this session, including presentations on the balance between climate and air quality benefits by Robert Allen and Steve Turnock; a summary of the Aerosol Chemistry Model Intercomparison Project (AerChemMIP) findings by UoR’s Gill Thornhill; and a personal favourite concerned the impacts of different emissions pathways in Africa on local and global climate, and local air pollution effects on mortality, presented by Chris Wells.

Chloe presenting: would win an award for most interesting screenshot. (Source: Maureen Wanzala)

What were your favourite aspects of the conference?

Chloe – Apart from my session one of my favorite’s was on climate services. This focused on the application of meteorological and hydrology data to services for example health heat impacts and growing grapes and olives. I also enjoyed the panel on the climate and ecological emergency in light of COVID-19 including Katherine Hayhoe and the session on equality, diversity and inclusion; it was interesting how ‘listening’ to those impacted was an overlapping theme in these. The weirdest, loveliest experience was my main supervisor sending me a colouring page of her face.

Max – As with any conference it was a great opportunity to learn about the latest research in my specific field, as well as learning about exciting developments in other fields, from machine learning applications in earth science to observational studies of methane emissions. Particularly, it’s a nice change from just reading about them in papers.Having conversations with presenters gives you the opportunity to really dive in and find out what motivated their research initially and discuss future applications. For example, one conversation I had went from discussing their application of unsupervised machine learning in classifying profiles of earth system model output, to learning about it’s potential for use in model intercomparisons.

Katherine Hayhoe in the session Climate and Ecological Emergency: can a pandemic help save us? (Source: Chloe Brimicombe)

What was your least favourite aspect?

Chloe – I did manage to do a little networking. But I’d love to experience an in person conference where I present. I have never presented my research in real life at a conference or research group/department seminar 😱. We also miss out on a lot of free food and pens not going to any in life conferences, which is what research is about 😉. Also, I find it difficult to stay focused on the conference when it’s online.

Max – For me the structure of two minute summaries followed by breakout Zoom rooms for each speaker had some definite drawbacks. For topics outside one’s own field, I found it difficult to really learn much from many of the summaries – it’s not easy to fit something interesting for experts and non-experts into two minutes! In theory you can go speak to presenters in their breakout rooms, but there’s something awkward about entering a zoom breakout room with just you and the presenter, particularly when you aren’t sure exactly how well you understood their two minute summary.

In light of your vEGU21 experience, what are your thoughts on remote vs traditional conferencing?

Max – Overall I think virtual conferencing has a way to go before it can match up to the in person experience. There were the classic technical issues of anything hosted remotely: the ‘I think you’re on mute’ experience, other microphone issues, and even the conference website crashing on the first day of scientific sessions (though the organisers did a swift job getting the conference back up and running). But there’s also the less obvious, such as it feeling actually quite a lonely experience. I’ve only been to a couple of in-person conferences, but there were always some people I knew and could meet up with. But it’s challenging to recreate this online, especially for early career researchers who don’t have as many established connections, and particularly at a big conference like the EGU general assembly. Perhaps a big social media presence can somewhat replace this, but not everyone (including myself!) is a big social media user. .

On the other hand, it’s great that we can still have conferences during a global pandemic, and no doubt is better than an absence of them entirely. Above all else, it’s also much greener and more accessible to those with less available funding for conference travel (though new challenges of accessibility, such as internet quality and access, undoubtedly arise). Plus, the facility to upload various display materials and people to look back at them whenever they like, regardless of time zones, is handy.

Chloe – I’d just add, as great as Twitter is and can be for promoting your research, it’s not the same as going for a good old cup of tea (or cocktail) with someone. Also, you can have the biggest brightest social media, but actually be terrible at conveying your research in person.

Summary

Overall it was interesting to take part in vEGU21, and we were both glad we went. It didn’t quite live up to the in person experience – and there is definitely room for improvements for virtual conferencing – but it’s great we can still have these experiences, albeit online.

Coding lessons for the newly initiated

April 30, 2021April 30, 2021Daniel AyersLeave a comment

Better coding skills and tooling enable faster, more useful results.

Daniel Ayers – d.ayers@pgr.reading.ac.uk

This post presents a collection of resources and tips that have been most useful to me in the first 18 months I’ve been coding – when I arrived at Reading, my coding ability amounted to using excel formulas. These days, I spend a lot of time coding experiments that test how well machine learning algorithms can provide information on error growth in low-dimensional dynamical systems. This requires fairly heavy use of Scikit-learn, Tensorflow and Pandas. This post would have been optimally useful at the start of the year, but perhaps even the coding veterans will find something of use – or better, they can tell me about something I am yet to discover!

First Steps: a few useful references

A byte of python. A useful and concise reference for the fundamentals.
Python Crash Course, Eric Matthes (2019). Detailed, lots of examples, and covers a wider range of topics (including, for example, using git). There are many intro to Python books around; this one has certainly been useful to me.¹ There are many good online resources for python, but it can be helpful initially to have a coherent guide in one place.

How did I do that last time…?

Tip: save snippets.

There are often small bits of code that contain key tricks that we use only occasionally. Sometimes it takes a bit of time reading forums or documentation to figure out these tricks. It’s a pain to have to do the legwork again to find the trick a second or third time. There were numerous occasions when I knew I’d worked out how to do something previously, and then spent precious minutes trawling through various bits of code and coursework to find the line where I’d done it. Then I found a better solution: I started saving snippets with an online note taking tool called Supernotes. Here’s an example:

I often find myself searching through my code snippets to remind myself of things.

Text editors, IDEs and plugins.

If you haven’t already, it might be worth trying some different options when it comes to your text editor or IDE. I’ve met many people who swear by PyCharm. Personally, I’ve been getting on well with Visual Studio Code (VS Code) for a year now.

Either way, I also recommend spending some time installing useful plugins as these can make your life easier. My recommendations for VS Code plugins are: Hungry Delete, Rainbow CSV, LaTeX Workshop, Bracket Pair Colorizer 2, Rewrap and Todo Tree.

Linters & formatters

Linters and formatters check your code for syntax errors or style errors. I use the Black formatter, and have it set to run every time I save my file. This seems to save a lot of time, and not only with formatting: it becomes more obvious when I have used incorrect syntax or made a typo. It also makes my code easier to read and look nicer. Here’s an example of Black in anger:

Some other options for linters and formatters include autopep, yapf and pylint.

Metadata for results

Data needs metadata in order to be understood. Does your workflow enable you to understand your data? I tend to work with toy models, so my current approach is to make a new directory for each version of my experiment code. This way I can make notes for each version of the experiment (usually in a markdown file). In other words, what not to do, is to run the code to generate results and then edit the code (excepting, of course, if your code has a bug). At a later stage you may want to understand how your results were calculated, and this cannot be done if you’ve changed the code file since the data was generated (unless you are a git wizard).

A bigger toolbox makes you a more powerful coder

Knowing about the right tool for the job can make life much easier.² There are many excellent Python packages, and the more you explore, the more likely you’ll know of something that can help you. A good resource for the modules of the Python 3 standard library is Python Module of The Week. Some favourite packages of mine are Pandas (for processing data) and Seaborn (a wrapper on Matplotlib that enables quick and fancy plotting of data). Both are well worth the time spent learning to use them.

Some thoughts on Matplotlib

Frankly some of the most frustrating experiences in my early days with python was trying to plot things with Matplotlib. At times it seemed inanely tedious, and bizarrely difficult to achieve what I wanted given how capable a tool others made it seem. My tips for the uninitiated would be:

Be a minimalist, never a perfectionist. I often managed to spend 80% of my time plotting trying to achieve one obscure change. Ask: Do I really need this bit of the plot to get my point across?
Can you hack it, i.e. can you fix up the plot using something other than Matplotlib? For example, you might spend ages trying to tell Matplotlib to get some spacing right, when for your current purpose you could get the same result by editing the plot in word/pages in a few clicks.
Be patient. I promise, it gets easier with time.

Object oriented programming

I’m curious to know how many of us in the meteorology department code with classes. In simple projects, it is possible to do without classes. That said, there’s a reason classes are a fundamental of modern programming: they enable more elegant and effective problem solving, code structure and testing. As Hans Petter Langtangen states in A Primer on Scientific Programming with Python, “classes often provide better solutions to programming problems.”

What’s more, if you understand classes and object- oriented programming concepts then understanding others’ code is much easier. For example, it can make Matplotlib’s documentation easier to understand and, in the worse caseworst case scenario, if you had to read the Matplotlib source code to understand what was going on under the hood, it will make much more sense if you know how classes work. As with Pandas, classes are worth the time buy in!

Have any suggestions or other useful resources for wannabe pythonistas? Please comment below or email me at d.ayers@pgr.reading.ac.uk.

Support vector machine for classification of space weather

April 23, 2021April 23, 2021Carl HainesLeave a comment

Carl Haines – carl.haines@pgr.reading.ac.uk

In a recent blog post, I discussed the use of the analogue ensemble (AnEn), or “similar-day” approach to forecasting geomagnetic activity. In this post I will look at the use of support vector machines (SVMs), a machine learning approach, to the same problem and compare the performance of the SVM to the AnEn. An implementation of the SVM has been developed for this project in python and is available at https://doi.org/10.5281/zenodo.4604485.

Space weather encompasses a range of impacts on Earth caused by changes in the near-earth plasma due to variable activity at the Sun. These impacts include damage to satellites, interruptions to radio communications and damage to power grids. For this reason, it is useful to forecast the occurrence, intensity and duration of heightened geomagnetic activity, which we call geomagnetic storms.

As in the previous post, the measure of geomagnetic activity used is the aa_Hindex which has been developed by Mike Lockwood in Reading. The aa_H index gives a global measure of geomagnetic activity at a 3-hour resolution. In this index, the minimum value is zero and larger values represent more intense geomagnetic activity.

The SVM is a commonly used classification algorithm which we implement here to classify whether a storm will occur. Given a sample of the input and the associated classification labels, the SVM will find a function that separates these input features by their class label. This is simple if the classes are linearly separable, as the function is a hyperplane. The samples lying closest to the hyperplane are called support vectors and the distance between these samples and the hyperplane is maximised.

*Figure 1 – A diagram explaining the kernel trick used by SVMs. This figure has been adopted from (Towards data science, n.d.)*

Typically, the samples are not linearly separable, so we employ Cover’s theorem which states that linearly inseparable classification problems are more likely to be linearly separable when cast non-linearly into a higher dimensional space. Therefore, we use a kenel trick to throw the inputs into a higher dimensional feature space, as depicted in Figure 1, to make it more separable. Further explanation is available in (Towards data science, n.d.)

Based on aa_H values in the 24-hour training window, the SVM predicts whether the next 3 hours will be either a storm or not. By comparing this dichotomous hindcast with the observed aa_H, the outcome will be one of True Positive (TP, where a storm is correctly predicted), True Negative (TN, where no storm is correctly predicted), False Positive (FP, where a storm is predicted but not observed), or False Negative (FN, where a storm is not predicted but is observed). This is shown in the form of a contingency table in Figure 2.

*Figure 2 – Contingency table for the SVM classifying geomagnetic activity into the “no-storm” and “storm” classes. THe results have been normalised across the true label.*

For development of the SVM, the aa_H data has been separated into independent training and test intervals. These intervals are chosen to be alternate years. This is longer than the auto-correlation in the data (choosing, e.g. alternate 3-hourly data points, would not generate independent training and test data sets) but short enough that we assume there will not be significant aliasing with solar cycle variations.

Training is an iterative process, whereby a cost function is minimised. The cost function is a combination of the relative proportion of TP, TN, FP and FN. Thus, while training itself, an SVM attempts to classify labelled data i.e., data belonging to a known category, in this case “storm” and “no storm” on the basis of the previous 24 hours of aa_H. If the SVM makes an incorrect prediction it is penalised through a cost function which the SVM minimises. The cost parameter determines the degree to which the SVM is penalised for a misclassification in training which allows for noise in the data.

It is common that data with a class imbalance, that is containing many more samples from one class than the other, causes the classifier to be biased towards the majority class. In this case, there are far more non-storm intervals than storm intervals. Following (McGranaghan, Mannucci, Wilson, Mattman, & Chadwick, 2018), we define the cost of mis-classifying each class separately. This is done through the weight ratio W_storm : W_no_storm. Increasing the W_storm increases the frequency at which the SVM predicts a storm and it follows that it predicts no storm at a reduced frequency. In this work we have varied W_storm and kept W_no_storm constant at 1. A user of the SVM method for forecasting may wish to tune the class weight ratio to give an appropriate ratio of false alarms and hit rate dependent on their needs.

Different forecast applications will have different tolerances for false alarms and missed events. To accommodate this, and as a further comparison of the hindcasts, we use a Cost/Loss analysis. In short, C is the economic cost associated with taking mitigating action when an event is predicted (whether or not it actually occurs) and L is the economic loss suffered due to damage if no mitigating action is taken when needed. For a deterministic method, such as the SVM, each time a storm is predicted will incur a cost C. Each time a storm is not predicted but a storm occurs a loss L is incurred. If no storm is predicted and no storm occurs, then no expense in incurred. By considering some time interval, the total expense can be computed by summing C and L. Further information, including the formular for computing the potential economic value (PEV) can be found in (Owens & Riley, 2017).

A particular forecast application will have a C/L ratio in the domain (0,1). This is because a C/L of 0 would mean it is most cost effective to take constant mitigating action and a C/L of 1 or more means that mitigating action is never cost effective. In either case, no forecast would be helpful. The power of a Cost/Loss analysis is that it allows us to evaluate our methods for the entire range of potential forecast end users without specific knowledge of the forecast application requirements. End users can then easily interpret whether our methods fit their situation.

*Figure 3 – A cost loss analysis comparing the potential economic value of a range of SVMs to the AnEn.*

Figure 3 shows the potential economic value (PEV) of the SVM with a range of class weights (CW), probabilistic AnEn and 27-day recurrence. The shaded regions indicate which hindcast has the highest PEV for that Cost/Loss ratio. The probabilistic AnEn has the highest PEV for the majority of the Cost/Loss domain although SVM has higher PEV for lower Cost/Loss ratios. It highlights that the `best’ hindcast is dependent on the context in which it is to be employed.

In summary, we have implemented an SVM for the classification of geomagnetic storms and compared the performance to that of the AnEn which was discussed in a previous blog post. The SVM and AnEn generally perform similarly in a Cost/Loss analysis and the best method will depend on the requirements of the end user. The code for the SVM is available at https://doi.org/10.5281/zenodo.4604485 and the AnEn at https://doi.org/10.5281/zenodo.4604487.

References

Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data mining an knowledge discovery.

McGranaghan, R., Mannucci, A., Wilson, B., Mattman, C., & Chadwick, R. (2018). New Capabilities for Prediction of High-Latitude Ionospheric Scin-669tillation: A Novel Approach With Machine Learning. Space Weather.

Owens, M., & Riley, P. (2017). Probabilistic Solar Wind Forecasting Using Large Ensembles of Near-Sun Conditions With a Simple One-Dimensional “Upwind” Scheme. Space weather.

Towards data science. (n.d.). Retrieved from https://towardsdatascience.com/the-kernel-trick-c98cdbcaeb3f 

Share this:

References

Share this:

Share this:

Share this:

First attempt…

Pandas

Big datasets

Share this:

Share this:

Share this:

Share this: