Met Office Climate Data Challenge 2022

Daniel Ayers – d.ayers@pgr.reading.ac.uk  

The Met Office Climate Data Challenge 2022 was a two-day virtual hackathon-style event where participants hacked solutions to challenges set by Aon (Wikipedia: “a British-American multinational professional services firm that sells a range of financial risk-mitigation products, including insurance, pension administration, and health-insurance plans”) and the Ministry of Justice (MoJ). Participants hailed from the Met Office, UCL, and the universities of Reading, Bristol, Oxford, Exeter and Leeds. Here’s how I found the experience and what I got out of it. 

If your PhD experience is anything like mine, you feel pretty busy. In particular, there are multitudinous ways one can engage in not-directly-your-research activities, such as being part of the panto or other social groups, going to seminars, organising seminars, going to conferences, etc. Obviously these can all make a positive contribution to your experience – and seminars are often very useful – but my point is: it can sometimes feel like there are too few periods of uninterrupted time to focus deeply on actually doing your research. 

Fig. 1: There are many ways to be distracted from actually doing your research. 

So: was it worth investing two precious days into a hackathon? Definitely. The tl;dr is: I got to work with interesting people, I got experience of working on a commercial-style project (a very short deadline for the entire process from raw data to delivered product), and I got an insight into the reinsurance industry. I’ll expand on these points in a bit. 

The four available challenges were sent out a few weeks before the main event, and there was a two-hour pre-event meeting the week beforehand. In this pre-meeting, the challenges were formally introduced by representatives from Aon and the MoJ, and the participants split into groups to a) discuss ideas for challenge solutions and b) form teams for the main event. It would really have helped to have done a little individual brainstorming and background reading before this meeting.  

As it happened, I didn’t prepare any further than reading through the challenges, but even this was useful. I had time to think about what I could bring to each challenge, and roughly what might be involved in a solution to each. I concluded that the most appropriate challenge for me was an Aon challenge about determining how much climate change was likely to impact insurance companies through changes to the things insurance companies insure (as opposed to, for example, changes in the frequency or intensity of extreme weather events which might cause payouts to be required). In the pre-meeting, someone else presented an idea that lined up with what I wanted to do: model some change in Earth and human systems and use this to create new exposure data sets (for exposure data set, read “list of things the insurance companies insure, and how much a full payout would cost”). This was a lofty ambition, as I will explain. Regardless, I signed up to this team and I was all set for the main two-day event. 

Here are some examples of plots that helped us to understand the exposure data set. We were told, for example, that for some countries, a token lat-lon coordinate was used for all entries in that country. This resulted in some lat-lon coords being used with comparatively high frequency, despite the entries potentially describing large or distinct areas of land.  

The next two plots show the breakdown of the entries by country, and then by construction type. Each entry is for a particular set of buildings. When modelling the likely payout following an event (e.g. a large storm) it is useful to know how the buildings are made. 

One thing I want to mention, in case the reader ever helps organise a hackathon, is the importance of challenge preparation. The key thing is that participants need to be able to hit the ground running in the event itself. Two things make this possible.  

First, the challenge material should provide a really good description of the problem space. In our case, we spent half of the first day in a meeting with the (very helpful) people from Aon, picking their brains about how the reinsurance industry worked, what they really cared about, what would count as an answer to the question, what was in the mysterious data set we had been given, and how the data should be interpreted. Yes, this was a great opportunity to learn and have a discussion with people I would ordinarily never meet, but my team could have spent more precious hackathon hours building a solution if the challenge material had done a better job of explaining what was going on.  

Second, any resources that are provided (in our case, a big exposure data set – see above) need to be ready to use. In our case, only one person in another team had been sent the data set, it wasn’t available before the main event started, there was no metadata, and once I managed to get hold of it I had to spend two to three hours working out which encoding to use and how to deal with poorly separated lines in the .csv file. So, to all you hackathon organisers out there: test the resources you provide, and check they can be used quickly and easily.  
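For anyone who inherits a similarly awkward file, a defensive loading pattern along these lines can claw back some of that time. This is just a sketch: the file name and the list of candidate encodings are illustrative, not details of the actual challenge data.

```python
import pandas as pd

def load_messy_csv(path):
    """Try a few common encodings and skip rows with the wrong field count."""
    for encoding in ("utf-8", "latin-1", "cp1252"):
        try:
            # on_bad_lines="warn" (pandas >= 1.3) reports and skips rows
            # whose number of fields doesn't match the header.
            return pd.read_csv(path, encoding=encoding, on_bad_lines="warn")
        except UnicodeDecodeError:
            continue  # wrong guess; try the next encoding
    raise ValueError(f"Could not decode {path} with any of the tried encodings")

# Hypothetical usage:
# exposure = load_messy_csv("exposure_data.csv")
```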

By the end of the second day, we hadn’t really got our envisioned product working. I’d managed to get the data open at last, and done some data exploration plots, so at least we had a better idea of what we were playing with. My teammates had found some really useful data for population change, and for determining whether a location in our data set was urban or rural. They had also set up a Slack group so that we could collaborate and discuss the different aspects of the problem, and a GitHub repo so we could share our progress (we coded everything in Python, mainly using Jupyter notebooks). We’d also done a fair amount of talking with the experts from Aon, and amongst ourselves as a team, to work out what was viable. This was a key experience from the event: coming up with a minimum viable product. The lesson was: be OK with cutting a lot of big corners. This is particularly useful for me as a PhD student, where it can be tempting to think I have time to go really deep into optimising and learning about everything required. My hackathon experience showed how much can be achieved even when the time frame forces most corners to be cut. 

To give an example of cutting corners, think about how many processes in the human-Earth system might have an effect over the next 30 years on what things there are to insure, where they are, and how much they cost: population increase, urbanisation and ruralisation, displacement from areas of rising water levels or increased flooding risk, construction materials becoming more expensive in order to be more environmentally friendly, immigration, and so on. Now, how many of these could we account for in a simplistic model built in two days? Answer: not many! Given we spent the first day understanding the problem and the data, we only really had one day (09:45 to 15:30, so five hours and 45 minutes) to build our solution. We attempted to account for differences in population growth by country, by shared socio-economic pathway, and by a parameterised rural-urban movement. As I said, we didn’t get the code working by the deadline, and ended up presenting our vision rather than a demonstration of a finished solution. 
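To make the corner-cutting concrete, here is a toy sketch of the kind of model we were aiming for: scale each exposure entry’s insured value by a country-level population growth factor, nudged by a single rural-urban migration parameter. Every number and column name below is made up for illustration; this is not our actual hackathon code.

```python
import pandas as pd

# Hypothetical SSP-informed population growth factors to 2050, by country.
growth_2050 = {"GB": 1.04, "US": 1.12}
urban_shift = 0.02  # crude parameterised rural-to-urban drift

def project_exposure(df):
    """Scale insured values by population growth, with an urban bonus."""
    df = df.copy()
    factor = df["country"].map(growth_2050).fillna(1.0)
    # Urban entries grow slightly faster than rural ones.
    factor = factor + urban_shift * (df["setting"] == "urban")
    df["insured_value_2050"] = df["insured_value"] * factor
    return df

sample = pd.DataFrame({
    "country": ["GB", "US"],
    "setting": ["urban", "rural"],
    "insured_value": [1.0e6, 2.5e6],
})
print(project_exposure(sample))
```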

There might be an opportunity to do more work on this project. A few of the projects from previous years’ hackathons have resulted in publications, and we are meeting shortly to see whether there is the appetite to do the same with what we’ve done. It would certainly be nice to create a more polished piece of work. That said, preserving space for my own research is also important! 

As a final word on the hackathon: it was great fun, and I really enjoyed working with my team. PhD work can be a little isolating at times, so the opportunity to work with others was enjoyable and motivating. Hopefully, next time it will be in person. I’d recommend getting involved in future Met Office Climate Data Challenges! 

Coding lessons for the newly initiated

Better coding skills and tooling enable faster, more useful results. 

Daniel Ayers – d.ayers@pgr.reading.ac.uk

This post presents a collection of resources and tips that have been most useful to me in my first 18 months of coding – when I arrived at Reading, my coding ability amounted to using Excel formulas. These days, I spend a lot of time coding experiments that test how well machine learning algorithms can provide information on error growth in low-dimensional dynamical systems. This requires fairly heavy use of Scikit-learn, TensorFlow and Pandas. This post would have been most useful at the start of the year, but perhaps even the coding veterans will find something of use – or better, they can tell me about something I am yet to discover!  

First Steps: a few useful references 

  • A Byte of Python. A useful and concise reference for the fundamentals. 
  • Python Crash Course, Eric Matthes (2019). Detailed, with lots of examples, and covers a wide range of topics (including, for example, using git). There are many intro-to-Python books around; this one has certainly been useful to me. There are many good online resources for Python, but it can be helpful initially to have a coherent guide in one place. 

How did I do that last time? 

Tip: save snippets. 

There are often small bits of code that contain key tricks we use only occasionally. Sometimes it takes a bit of time reading forums or documentation to figure out these tricks, and it’s a pain to have to do the legwork again to find the trick a second or third time. There were numerous occasions when I knew I’d worked out how to do something previously, and then spent precious minutes trawling through various bits of code and coursework to find the line where I’d done it. Then I found a better solution: I started saving snippets with an online note-taking tool called Supernotes. Here’s an example:  
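For instance, a card like this one (an illustrative stand-in, not an actual card from my collection):

```python
# Snippet: read only the columns you need from a huge CSV.
# Saves memory and time when the file has dozens of columns.
import pandas as pd

df = pd.read_csv("big_file.csv", usecols=["lat", "lon", "value"])
```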

I often find myself searching through my code snippets to remind myself of things. 

Text editors, IDEs and plugins 

If you haven’t already, it might be worth trying some different options when it comes to your text editor or IDE. I’ve met many people who swear by PyCharm. Personally, I’ve been getting on well with Visual Studio Code (VS Code) for a year now. 

Either way, I also recommend spending some time installing useful plugins, as these can make your life easier. My recommendations for VS Code plugins are: Hungry Delete, Rainbow CSV, LaTeX Workshop, Bracket Pair Colorizer 2, Rewrap and Todo Tree. 

Linters & formatters 

Linters check your code for syntax and style errors; formatters automatically tidy its layout. I use the Black formatter, and have it set to run every time I save my file. This saves a lot of time, and not only on formatting: it becomes more obvious when I have used incorrect syntax or made a typo. It also makes my code easier to read and nicer to look at. Here’s an example of Black in anger:  
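A made-up before-and-after: Black normalises quoting, spacing and line breaks into one canonical style.

```python
# Before: legal Python, but inconsistently laid out.
x = { 'a':37,'b':42,
'c':927}

# After running Black: one canonical layout.
x = {"a": 37, "b": 42, "c": 927}
```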

Some other options for linters and formatters include autopep8, yapf and pylint. 

Metadata for results 

Data needs metadata in order to be understood. Does your workflow enable you to understand your data? I tend to work with toy models, so my current approach is to make a new directory for each version of my experiment code, with notes on each version (usually in a markdown file). The thing not to do is run the code to generate results and then edit the code (except, of course, to fix a bug). At a later stage you may want to understand how your results were calculated, and this is impossible if you’ve changed the code since the data was generated (unless you are a git wizard). 
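A lightweight complement to the directory-per-version approach – this is a sketch, not the workflow described above – is to stamp each results file with the parameters, time and code version that produced it:

```python
import datetime
import json
import subprocess

def save_run_metadata(path, params):
    """Write a small JSON sidecar recording how a result was produced."""
    meta = {
        "created": datetime.datetime.now().isoformat(),
        "params": params,
        # Record the current commit, if the code lives in a git repo.
        "git_commit": subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip(),
    }
    with open(path, "w") as f:
        json.dump(meta, f, indent=2)

save_run_metadata("run_042_meta.json", {"n_samples": 1000, "seed": 7})
```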

A bigger toolbox makes you a more powerful coder 

Knowing about the right tool for the job can make life much easier. There are many excellent Python packages, and the more you explore, the more likely it is you’ll know of something that can help you. A good resource for the modules of the Python 3 standard library is Python Module of the Week. Some favourite packages of mine are Pandas (for processing data) and Seaborn (a wrapper on Matplotlib that enables quick and fancy plotting of data). Both are well worth the time spent learning to use them. 
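As a taste of why Seaborn earns its keep, the sketch below draws a grouped scatter plot with per-group regression fits in two lines. It uses the small “tips” demonstration dataset that ships with Seaborn, not data from my experiments.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")  # fetches a small demo dataset
sns.lmplot(data=tips, x="total_bill", y="tip", hue="smoker")
plt.show()
```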

Some thoughts on Matplotlib 

Frankly, some of my most frustrating early experiences with Python were trying to plot things with Matplotlib. At times it seemed maddeningly tedious, and bizarrely difficult to achieve what I wanted given how capable a tool others made it seem. My tips for the uninitiated would be: 

  • Be a minimalist, never a perfectionist. I often managed to spend 80% of my plotting time trying to achieve one obscure change. Ask: do I really need this bit of the plot to get my point across? 
  • Can you hack it, i.e. can you fix up the plot using something other than Matplotlib? For example, you might spend ages trying to tell Matplotlib to get some spacing right, when for your current purpose you could get the same result by editing the plot in Word/Pages in a few clicks. 
  • Be patient. I promise, it gets easier with time. 

Object oriented programming 

I’m curious to know how many of us in the meteorology department code with classes. In simple projects, it is possible to do without them. That said, there’s a reason classes are fundamental to modern programming: they enable more elegant and effective problem solving, code structure and testing. As Hans Petter Langtangen states in A Primer on Scientific Programming with Python, “classes often provide better solutions to programming problems.”  
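As a small, made-up illustration of what classes buy you in a scientific context, here is a toy experiment object that keeps its parameters and behaviour together (the Lorenz-63 system is just a convenient example; nothing here is from a real project):

```python
import numpy as np

class LorenzExperiment:
    """Toy container coupling model parameters with an integration step."""

    def __init__(self, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        self.sigma, self.rho, self.beta = sigma, rho, beta

    def step(self, state, dt=0.01):
        """Advance the Lorenz-63 state by one Euler step."""
        x, y, z = state
        tendency = np.array([
            self.sigma * (y - x),
            x * (self.rho - z) - y,
            x * y - self.beta * z,
        ])
        return state + dt * tendency

exp = LorenzExperiment()
state = np.array([1.0, 1.0, 1.0])
for _ in range(100):
    state = exp.step(state)
print(state)
```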

What’s more, if you understand classes and object-oriented programming concepts, then understanding others’ code is much easier. For example, it can make Matplotlib’s documentation easier to understand and, in the worst-case scenario, if you had to read the Matplotlib source code to work out what was going on under the hood, it would make much more sense if you know how classes work. As with Pandas, classes are worth the time investment! 
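A case in point is Matplotlib’s “object-oriented” interface, where the documentation describes methods on Figure and Axes objects rather than hidden global state (a generic example):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()        # fig is a Figure, ax is an Axes
ax.plot([0, 1, 2], [0, 1, 4])   # methods on the Axes object
ax.set_xlabel("x")
ax.set_ylabel("x squared")
fig.savefig("squares.png")
```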

Have any suggestions or other useful resources for wannabe pythonistas? Please comment below or email me at d.ayers@pgr.reading.ac.uk.