Starting Your PhD Journey: Tips for Success

So, you’ve officially embarked on the exciting journey that is a PhD—congrats! You’ve reached a major milestone, and whether you’re feeling excited, overwhelmed, or a mix of both, just know you’ve signed up for an adventure like no other. A PhD is an incredible opportunity to dive headfirst into a subject you’re passionate about, build a toolkit of valuable skills, and—who knows?—maybe even make history in your field.

But let’s be real: it’s not all rainbows and groundbreaking discoveries. The PhD life can be challenging, sometimes feeling like a marathon through an obstacle course. You’ll have moments that test your patience, confidence, and sometimes, your sanity. That’s why here at Social Metwork, we’ve gathered some golden advice from seasoned PhD students to help you navigate these waters. Our goal? To make this transition into PhD life a little smoother, maybe even a little fun.

We’ll break these tips down into three areas: navigating day-to-day life as a PhD student, getting organized like a pro, and growing into the great scholar you’re destined to be. Ready? Let’s dive in!

1. Navigating Day-to-day Life as a PhD Student

Work-life balance

The first year of your PhD can feel overwhelming as you try to juggle research, coursework, and life. One key piece of advice? Don’t overwork yourself. As Laura Risley puts it, “Sometimes if you’re struggling with work, an afternoon off is more useful than staying up late and not taking a break.” It’s easy to get absorbed in your work, but stepping away to recharge can actually help you return with fresh perspectives.

Getting involved in activities outside your PhD is another great way to maintain balance (L. Risley, 2024). Whether it’s exploring more of Reading, participating in a hobby, or just getting outside for some fresh air, your brain will thank you for the break. Remember, “Your PhD is important, but so is your health,” so make sure to take care of yourself and make time for things that bring you joy: exercise, good food, and sleep!

Lastly, don’t underestimate the power of routine. Building a consistent schedule can help bring some stability to PhD life. Most importantly, be kind to yourself. The weight of expectations can be heavy, so give yourself permission to not have it all figured out yet. You won’t understand everything right away, and that’s completely normal!

Socialising and Building a Support System

Your cohort is your lifeline. The people you start with are going through the same experiences, and they will be your greatest support system. Whether you’re attending department events, organizing a BBQ, or just grabbing a coffee, socializing with your peers is a great way to get through everything. At the end of the day, we are all in this together! As Rhiannon Biddiscombe wisely says, “Go for coffee with people, go to Sappo, enjoy the pub crawls, waste a night out at PT, take part in the panto, spend time in the department in-person” — so make sure you get involved!

If you want to meet new people, you could even help organise social events, like research group gatherings or casual hangouts – feeling connected within your department can make all the difference when you’re having a tough week. And hey, if you’re looking for a fun group activity, “Market House in town has darts boards, ping pong tables, and shuffleboard (you slide little discs to the end of the board, it’s good fun!)”.

2. Getting Organised Like a Pro

Writing and Coding

Staying organised is critical for both your mental health and your research. Adam Gainford recommends setting up a reference manager early on—trust us, you’ll thank yourself later. And if your research involves coding, learn a version control tool like Git (and a hosting platform such as GitHub) to keep your projects neat and manageable. As a fellow PhD student says, “Keeping organised will help keep your future self sane (and it’s a good skill that will help you with employability and future group projects)”.

A golden rule for writing: write as you go. Don’t wait until the last minute to start putting your thoughts on paper. Whether it’s jotting down a few ideas, outlining a chapter, or even starting a draft, regular writing will save you from stress later on. Remember what Laura always says, “It’s never too early to start writing.”

Time Management

Managing your time as a PhD student is a balancing act. Plans will shift, deadlines will change, and real life will get in the way—it’s all part of the process. Instead of stressing over every slipped deadline, try to “go with the flow”. Your real deadlines are far down the road, and as long as you’re progressing steadily, you’re doing fine.

Being organised also doesn’t have to be complicated. Some find it helpful to create daily, weekly, or even monthly plans. Rhiannon recommends keeping a calendar to track meetings, seminars, and research group sessions – I could not agree more, and find time-blocking a great way to make sure everything gets done. Regarding your inbox, make sure you “stay on top of your emails but don’t look at them constantly. Set aside a few minutes a day to look at emails and sort them into folders, but don’t let them interrupt your work too much!”. Most importantly though, don’t forget to schedule breaks—even just five minutes of stepping away can help you reset (and of course, make sure you have some valuable holiday time off!).

3. Growing into the Scholar You’re Meant to Be

Asking for Help

This journey isn’t something you’re expected to do alone. Don’t be afraid to reach out for help from your friends, supervisors, or other PhD students. Asking questions is a sign of strength, not weakness. What’s great is that everyone has different backgrounds, and more often than not, someone will be able to help you navigate whatever you’re facing (trust me, as a geography graduate my office mates saved my life with atmospheric physics!). Whether you’re stuck on a tricky equation or need clarification on a concept, ask ask ask! 

“You’ve got a whole year to milk the ‘I’m a first year’ excuse, but in all seriousness, it’s never too late to ask when you’re unsure!” – a fellow PhD student.

Navigating Supervisor Meetings

Your supervisors are there to guide you, but communication is key. Be honest with them, especially when you’re struggling or need more support. If something doesn’t make sense, speak up—don’t nod along and hope for the best, “they should always have your back” (it will also be very embarrassing if you go along with it and are caught out with questions…). 

Also, “If you know some things you want to get out of your PhD, communicate that with your supervisors”. Open communication will help you build a stronger working relationship and ensure you get what you need from the process.

Dealing with Imposter Syndrome

Imposter syndrome can hit hard during a PhD, especially when you’re surrounded by brilliant people doing impressive work. But here’s the thing: don’t compare yourself to others. Everyone’s PhD is different—some projects lend themselves to quick results, while others take longer. Just because someone publishes early doesn’t mean your research is less valuable or that you’re behind – we are all on our own journeys. 

And remember, no one expects you to know everything right away. “There might be a pressure, knowing that you’ve been ‘handpicked’ for a project, that you should know things already; be able to learn things more quickly than you’re managing; be able to immediately understand what your supervisor is talking about when they bring up XYZ concept that they’ve been working on for 20+ years. In reality, no reasonable person expects you to know everything or even much at all yet. You were hand-picked for the project because of your potential to eventually become an independent researcher in your field – A PhD is simply training you for that, so you need to finish the PhD to finish that training.”

If you’re struggling with imposter syndrome, or want to learn about ways to deal with it, I highly recommend attending the imposter syndrome RRDP.

A Few Final Words of Wisdom

The PhD rollercoaster is full of ups and downs, but remember, you’re doing fine. “If your supervisors are happy, then don’t worry! Everything works out in the end, even when it seems to not be working for a while!” – Laura Risley

It’s also super important to enjoy the process. You’ve chosen a topic you’re passionate about, and this is a rare opportunity to fully immerse yourself in it. Take advantage of that! Don’t shy away from opportunities to share your work. Whether it’s giving a talk, presenting a poster (or writing for the Social Metwork blog!!), practice makes perfect when it comes to communicating your research.

Embarking on a PhD is no small feat, but hopefully with these tips, you’ll have the tools to manage the challenges and enjoy the ride. And if all else fails, remember the most important advice of all: “Vote in the Big Biscuit Bracket—it’s the most important part of being a PhD student!”. 

From the department’s PhD students to you!

Written by Juan Garcia Valencia 

Designing a program to improve data access for my PhD project

Caleb Miller – c.s.miller@pgr.reading.ac.uk

In my project work, I regularly need to load hundreds of CSV (comma-separated values) files containing daily data from meteorological observations. For example, many of the measurements I use are made at the Reading University Atmospheric Observatory using the main datalogger, in addition to some of my own instruments. This data comes distributed across a number of different files for each day.

Most of my analysis is done in Python using the Pandas library for data processing. Pandas can easily read in CSV files with a built-in function, and it is well-suited for the two-dimensional data tables which I regularly use.
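As a minimal sketch of this workflow (the column names and values here are invented for illustration, and the CSV text is embedded as a string rather than read from disk):

```python
import io
import pandas as pd

# A stand-in for one day's CSV file of observations.
csv_text = """timestamp,temperature_c,pressure_hpa
2024-01-01 00:00,4.2,1013.2
2024-01-01 01:00,3.9,1013.5
"""

# read_csv parses the header into column labels and, with parse_dates,
# builds a proper datetime index in one call.
df = pd.read_csv(io.StringIO(csv_text),
                 parse_dates=["timestamp"], index_col="timestamp")

print(df["temperature_c"].mean())  # the two-dimensional table is easy to work with
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by the file path.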

However, after a year or so of working directly with CSV files, I began to run up against some of the performance limitations of doing so.

Daily CSV files may be good for organizational purposes, but they are not the most efficient way to store and access large amounts of data. In particular, once I wanted to start studying many years’ worth of data at the same time, reading in each file every time I wanted to re-run my code began to slow the development process significantly. Also, the code that I had written to locate each file and read it in for a given range of dates was somewhat clunky and inflexible.

It was time to develop a new solution for accessing the met observations more quickly and easily.

Pandas has built-in functions for reading a variety of different formats, not just plain-text CSV files. I decided to constrain my choices for data formats to those that Pandas could read natively.

In addition, I wanted to build a system that would satisfy three primary goals:

  • Compatibility for long-term data storage
  • High speed
  • Simple programming interface

Compatibility is important, since I wanted to ensure that my data would continue to be readable to others (and myself in the future) without any specialized software that I had written. CSV is excellent for that purpose.

However, CSV was not a fast way to access the data. Ideally, the system I chose could store both numerical data and timestamps as floating point values rather than encoded text, for better performance.

Finally, I wanted to create a system that would be flexible and easy to use – ideally, something that would only require one or two lines of code to load the data for a given instrument and date range, rather than the complicated steps previously required to search for and load many separate files.

System Design

In the end, I settled on a rather complicated system that resulted in a very simple, reliable, and fast data stack that could be used to access my data.

At the base layer, all the data would be stored in the original CSV files. This is the format that most of the data comes in, and the few instruments that do not can easily be converted. CSV is a very common file format, which can be read easily by many software packages and will likely be useful far into the future, even when current software is too outdated to be run.

However, rather than accessing the CSV files directly, I periodically import them into a SQLite database file. SQLite is a widely used, open-source library that runs a database from a single file rather than from a server (unlike many other popular database systems). Its advantage over CSV files is speed: data from an individual table can be accessed by a query specifying the start and end dates, which makes it very easy to load arbitrary timeseries of data.
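As a sketch of the idea (the table and column names are invented, and an in-memory database stands in for the single `.db` file):

```python
import sqlite3
import pandas as pd

# A stand-in for the single-file SQLite database of observations.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs (timestamp TEXT, temperature REAL)")
con.executemany("INSERT INTO obs VALUES (?, ?)",
                [("2024-01-01 00:00", 4.2),
                 ("2024-01-02 00:00", 3.9),
                 ("2024-01-03 00:00", 5.1)])

# Load an arbitrary timeseries by specifying start and end dates.
df = pd.read_sql_query(
    "SELECT * FROM obs WHERE timestamp BETWEEN ? AND ?",
    con, params=("2024-01-01", "2024-01-02 23:59"),
    parse_dates=["timestamp"])

print(len(df))  # only the rows inside the requested range are returned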

However, for loading many years’ worth of high-resolution data, even SQLite was not as fast as I wanted. Pandas is also capable of using a format called pickle. “Pickling” a dataframe writes it from program memory to disk as a file. This can then be read back into a program very quickly at a later time, even for large files.

In my data access library, once a request is made for a given timeseries of data, that dataframe is cached to a pickle file. If the same request is made again shortly afterwards, rather than going back to the SQLite database, the data is loaded from the pickle file. For large datasets, this can reduce the loading time from nearly a minute to just a few seconds, which is very helpful when repeatedly debugging a program using the same data! The cache files can be relatively large, however, so they are automatically cleared out when the code runs if they have not been used for several days.
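The caching logic might look something like this minimal sketch (the function, key, and directory names are invented; a real library would also clear out stale cache files, as described above):

```python
import os
import pandas as pd

def load_cached(key, slow_loader, cache_dir="cache"):
    """Return a DataFrame from a pickle cache, falling back to slow_loader."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{key}.pkl")
    if os.path.exists(path):
        return pd.read_pickle(path)   # fast path: reuse the cached frame
    df = slow_loader()                # slow path: e.g. a SQLite query
    df.to_pickle(path)                # cache the result for next time
    return df

# Hypothetical usage, with a lambda standing in for a slow database query:
df = load_cached("demo", lambda: pd.DataFrame({"x": [1, 2, 3]}))
```

On the second identical request, `slow_loader` is never called: the dataframe comes straight from the pickle file.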

Finally, all of this functionality is available behind a simple library, which allows for accessing a dataset from any other Python code on my machine with just two lines, as shown below.

import foglib.db as fdb
fdb.load_db("dataset_name","start_datetime","end_datetime")

Conclusions

I have found this system to work very well for my purposes. It required a fair amount of development work, but the returns have been very beneficial.

By allowing me to access almost any of my data with just a few lines of code, I can now start new analyses with less time and code overhead. This means that I have more time and energy to spend answering science questions. And because large datasets read in so quickly, I can rapidly debug my code without waiting as long for it to run.

My particular solution may not be the ideal data-loading system for everyone’s needs. However, based on my experiences working on this program, I believe that time invested in enabling access to your data at the beginning of a PhD is time very well spent.

MeteoXchange 

Supporting International Collaboration for Early Career Researchers 

James Fallon – j.fallon@pgr.reading.ac.uk 
 

What is it? 

Due to lockdowns and travel restrictions since 2020, networking opportunities in science have been transformed. We can expect to see a mix of virtual and hybrid elements persist into the future, offering both cost-saving and carbon-saving benefits. 

The MeteoXchange project aims to become a new platform for young atmospheric scientists from all over the world, providing networking opportunities and platforms for collaboration. The project is an initiative of German Federal Ministry of Education and Research, and research society Deutsche Forschungsgesellschaft. Events are conducted in English, and open to young scientists anywhere. 

ECS Conference 

This year marked the first ever MeteoXchange conference, which took place online in March 2022. The ECS (early career scientists) conference took place over two days, on gather.town. An optional pre-conference event gave the opportunity for new presenters to work on presentation skills and receive feedback at the end of the main conference. 

Figure 1: Conference Schedule, including a keynote on Machine Learning and Earth System Modelling, movie night, and presenter sessions. 

Five presenter sessions were split over two days, with young scientists sharing their research to a conference hall on the virtual platform gather.town. Topics ranged from lidar sensing and reanalysis datasets, to cloud micro-physics and UV radiation health impacts. I really enjoyed talks on the attribution of ‘fire weather’ to climate change, and machine learning techniques for thunderstorm forecasting! The first evening concluded with a screening of the documentary Picture a Scientist.

During the poster session on the second day, I presented my research poster to different scientists walking by my virtual poster board. Posters were designed to mimic the large A2 printouts seen at in-person events. Two posters that really stood out were a quantification of SO2 emissions from Kilauea volcano in Hawaii, and an evaluation of air quality in Cuba’s Mariel Bay using meteorological diagnostic models combined with air dispersion modelling. 

Anticipating that it might be hard to communicate on the day, I added a lot of text to my poster. However, I needn’t have worried as the virtual platform worked flawlessly for conducting poster Q&A – the next time I present on a similar platform I will try to avoid using as much text and instead focus on a more traditional layout! 

Figure 2: During the poster session, I presented my research on Reserve-Power systems – energy-volume requirements and surplus capacity set by weather events. 

By the conference end, I got the impression that everyone had really enjoyed the event! Awards were given for the winners of the best posters and talks. The ECS conference was fantastically well organised by Carola Detring (DWD) and Philipp Joppe (JGU Mainz), and a wonderful opportunity to meet researchers from around the world. 

MeteoMeets 

Since July 2021, MeteoXchange have held monthly meetups, predominantly featuring lecturers and professors who introduce research at their institute for early career scientists in search of opportunities! 

The opportunities shared at MeteoMeets are complemented by joblists and by the MeteoMap: https://www.meteoxchange.de/meteomap. The MeteoMap lists PhD and postdoc positions across Germany, neatly displayed with different markers depending on the type of institute. This resource is currently still under construction. 

Figure 3: The MeteoMap features research opportunities in Germany, available for early career researchers from across the world. 

Travel Grants 

One of the most exciting aspects of the MeteoXchange project is the opportunity for international collaboration with travel grants! 

The travel funds offered by MeteoXchange are for two or more early career scientists in the field of atmospheric sciences. Students must propose a collaborative project, which aims to spark future work and networking between their own institutions. If the application is successful, students have the opportunity to access 2,500€ for travel funds.  

Over the last two weeks of April, I will be collaborating with KIT student Fabian Mockert on “Dunkelflauten” (periods of low renewable-energy production, or “dark wind lulls”). Dunkelflauten, especially cold ones, result in high electricity load on national transmission networks, leading to high costs and potentially causing failure of a fully renewable power system (doi.org/10.1038/nclimate3338). We are collaborating to use power system modelling to better understand how this stress manifests itself. Fabian will spend two weeks visiting the University of Reading campus, meeting with students and researchers from across the department.

Get Involved 

The 2022 travel grant deadline has already closed; however, it is hoped that MeteoXchange will receive funding to continue this project into future years, supporting young researchers in collaboration and idea-exchange. 

To get involved with the MeteoMeets, and to stay up to date on MeteoXchange-related opportunities, sign up to the mailing list.

How to write a PhD thesis during a global pandemic

Kaja Milczewska – k.m.milczewska@pgr.reading.ac.uk

Completing a PhD is a momentous task at the best of times, let alone in combination with a year-long global pandemic. Every PhD researcher is different, and as such, everyone has had different circumstantial struggles throughout Covid-19. The lack of human interaction that comes with working in a vibrant academic environment such as the Meteorology Department can make working from home a real struggle. Sometimes it is difficult to find the motivation to get anything useful done; at other times you could squeeze five hours’ worth of work into one.

Staying organised is key to getting it done, so the following are some of the things that helped me get to the end of my PhD thesis – and it has not been easy. If you are still out there writing and finishing up experiments, read on! Hopefully you will feel a little less alone. The PhD experience can be truly isolating at the best of times, so literally being instructed to isolate from the world is not ideal. The points are numbered for convenience of structuring this post, rather than in any order of importance.

  1. Communicate with your supervisor(s) 

It is tempting to “disappear off the radar” when things are not going well. You could wake up on the morning of your regular weekly meeting, filled with dread that you have not prepared anything for it. Your brain recoils into the depths of your skull as your body recoils back under the safety of the duvet. What are your options? Some of them might be: take a deep gulp and force yourself out of bed with the prospect of coffee before the meeting (where you steer the conversation onto the things you did manage to do); or postpone the meeting because you need to finish XYZ and thus a later meeting may be more productive; or ignore the meeting altogether. The first one is probably the best option, but it requires mental strength where there might be none. The second one is OK, but you still need to do the work. The last one is a big no. Don’t do it. 

Anxiety will make you believe that ignoring the world and all responsibilities is the most comfortable option in the moment, but the consequences of acting on it could be worse. Supervisors value honesty, and they know well that it is not always possible to complete all the scheduled tasks. Of course, if this happens every week then you might need to introspectively address the reasons for this, and – again, talking with your supervisor is usually a useful thing to do. You might not want them to know your entire life story, but it is helpful for everybody involved if they are aware that you struggle with anxiety / depression / ADHD / *insert any condition here*, which could affect your capacity to complete even the simplest, daily tasks. Being on the same page and having matching expectations is key to any student – supervisor partnership. 

  2. Reward yourself for the things you have already accomplished. 

Whether that’s mid-week, mid-to-do-list, weekend — whenever. List all the things you have done regularly (either work- or life-related) and recognise that you are trying to survive a pandemic. And trying to complete the monstrous task of writing a PhD thesis. Those are big asks, and the only way to get through them is to break them down into smaller chunks. Putting down “Write thesis” on your to-do list is more likely to intimidate than motivate you. How about breaking it down further: “Re-create plot 4.21”, or “Consolidate supervisor comments on pages 21 – 25” — these are achievable things in a specified length of time. It also means you could tick them off more easily, hopefully resulting in feeling accomplished. Each time this happens, reward yourself in whatever way makes you feel nice. Even just giving yourself a literal pat on the shoulder could feel great – try it! 

  3. Compile supervisor feedback / comments into a spreadsheet  

An Excel spreadsheet – or any other suitable system – will enable you to keep track of what still needs addressing and what has been completed. The beauty of using a colour-coded spreadsheet for feedback comments is that once the required corrections are completed, you have concrete evidence of how much you have already achieved – something to consult if you start feeling inadequate at any point (see previous section!). I found this a much easier system than writing it down in my workbook, although of course this does work for some people, too. Anytime you receive feedback on your work – written or otherwise – note it down. I used brief reminders, such as “See supervisor’s comment on page X”, but it was useful to have them all compiled together. Also, I found it useful to classify the comments into ‘writing-type’ corrections and ‘more work required’ corrections. The first one is self-explanatory: these were typos, wrong terminology, mistakes in equations and minor structural changes. The ‘more work required’ type was anything that required me to find citations / literature, major structural changes, issues with my scientific arguments, or anything else that required more thought. This meant that if my motivation was lacking, I could turn to the ‘writing-type’ comments and work on them without needing too much brain power. It also meant that I could prioritise the major comments first, which made working to a deadline a little bit easier. 

  4. Break down how long specific things will take 

This is most useful when you are a few weeks away from submission date. With only 5 weeks left, my colour-coded charts were full of outstanding comments; neither my ‘Conclusions’ chapter nor my Abstract had been written; plots needed re-plotting and I still did not know the title of my thesis. Naturally, I was panicking. I knew that the only way I could get through this was to set a schedule — and stick to it. At the time, there were 5 major things to do: complete a final version of each of my 5 thesis chapters. A natural split was to allow each chapter only one week for completion. If I was near to running over my self-prescribed deadline, I would prioritise only the major corrections. If still not done by the end of the allowed week: that’s it! Move on. This can be difficult for any perfectionists out there, but by this point the PhD has definitely taught me that “done” is better than perfect. I also found that some chapters took less time to finish than others, so I had time to return to the things I left not quite finished. Trust yourself, and give it your best. By all means, push through the hardest bit to the end, but remember that there (probably) does not exist a single PhD thesis without any mistakes. 

5. Follow useful Twitter threads 

There exist two groups of people: those who turn off or deactivate all social media when they need to focus on a deadline, and those who get even more absorbed by its ability to divert attention away from the discomfort of the dreaded task at hand. Some might call it “productive procrastination”. I actually found that social media helped me a little – but only when my state of mind was such that I could resist the urge to fall down a scrolling rabbit hole. If you are on Twitter, you might find hashtags like #phdchat and accounts such as @AcademicChatter, @phdforum and @phdvoice useful. 

6. Join a virtual “writing room” 

On the back of the last tip, I have found a virtual writing room helpful for focus. The idea is that you join an organised Zoom meeting full of other PhDs, all of whom are writing at the same time. All microphones are muted, but the chat is active so it is nice to say ‘hello!’ to someone else writing at the same time, anywhere else in the world. The meetings have scheduled breaks, with the organiser announcing when they occur. I found that because I actively chose to be up and start writing at the very early hour of 6am by attending the virtual writing room, I was not going to allow myself to procrastinate. The commitment to being up so early and being in a room full of people also doing the same thing (but virtually, obviously) meant that those were the times that I was probably the most focused. These kinds of rooms are often hosted by @PhDForum on Twitter; there could also be others. An alternative idea could be to set up a “writing meeting” with your group of peers and agree to keep chatter to a minimum (although this is not something I tried myself). 

7. Don’t look at the news 

Or at least, minimise your exposure to it. It is generally a good thing to stay on top of current events, but the final stages of writing a PhD thesis are probably unlike any other time in your life. You need the space and energy to think deeply about your own work right now. Unfortunately, I learnt this the hard way and found that there were days where I could do very little work because my brain was preoccupied with awful events happening around the world. It made me feel pathetic, routinely resulting in staying up late to try and finish whatever I failed to finish during the day. This only deteriorated my wellbeing further with shortened sleep and a constant sense of playing “catch-up”. If this sounds like you, then try switching off the news notifications on your phone or computer, or limit yourself to checking the news homepage only once a day at a designated time.  

8. Be honest when asked about how you are feeling 

Many of us tend to downplay or dismiss our emotions. It can be appealing to keep your feelings to yourself, saving yourself the energy involved in explaining the situation to whomever asked. You might also think that you are saving someone else the hassle of worrying about you. The trouble is that if we continuously paper over the cracks in our mental wellbeing within the handful of conversations we are having (which are especially limited during the pandemic), we could stop acknowledging how we truly feel. This does not necessarily mean spilling all the beans to whomever asked the innocent question, “How are you?”. But the catharsis from opening up to someone and acknowledging that things are not quite right could really offload some weight off your shoulders. If the person on the other end is your PhD supervisor, it can also be helpful for them to know that you are having a terrible time and are therefore unable to complete tasks to your best ability. Submission anxiety can be crippling for some people in the final few weeks, and your supervisor just won’t be able to (and shouldn’t) blindly assume how your mental health is being affected by it, because everyone experiences things differently. This goes back to bullet no.1. 

Hopefully it goes without saying that the above are simply some things that helped me through to the end of the thesis, but everybody is different. I am no counsellor or wellbeing guru; just a recently-finished PhD! Hopefully the above points might offer a little bit of light for anyone else struggling through the storm of that final write-up. Keep your chin up and, as Dory says: just keep swimming. Good luck! 

Better Data… with MetaData!

James Fallon – j.fallon@pgr.reading.ac.uk

As researchers, we familiarise ourselves with many different datasets. Depending on who put together a dataset, the variable names and definitions that we are already familiar with from one dataset may be different in another. These differences can range from subtle annoyances to large structural differences, and it’s not always immediately obvious how best to handle them.

One dataset might be on an hourly time-index, and the other daily. The grid points which tell us the geographic location of data points may be spaced at different intervals, or use entirely different co-ordinate systems!
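As a concrete example of the first mismatch, pandas can bring an hourly series onto a daily index with a one-line resample (the series here is synthetic, just to show the mechanics):

```python
import pandas as pd

# Two days of synthetic hourly data on a datetime index.
hourly = pd.Series(
    range(48),
    index=pd.date_range("2024-01-01", periods=48, freq="h"),
)

# Collapse to a daily index by averaging each day's 24 values.
daily_mean = hourly.resample("D").mean()
print(daily_mean)
```

The same pattern works for other aggregations (`.sum()`, `.max()`, …), which is one reason a proper datetime index is worth having.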

However, most modern datasets come with hidden help in the form of metadata – information that tells us how the data is to be used. With the right choice of Python modules, we can use the metadata to work with different datasets automatically, avoiding conversion headaches.

First attempt…

Starting my PhD, my favourite (naïve, inefficient, bug-prone, …) method of reading data with Python was the built-in function open() or numpy functions like genfromtxt(). These are quick to set up, and can be good enough. But as soon as we are using data with more than one field, complex coordinates and calendar indexes, or more than one dataset, this style of programming becomes unwieldy and disorderly!

>>> import numpy as np
>>> header = np.genfromtxt(fname, delimiter=',', dtype='str', max_rows=1)
>>> print(header)
['Year' 'Month' 'Day' 'Electricity_Demand']
>>> data = np.genfromtxt(fname, delimiter=',', skip_header=1)
>>> data
array([[2.010e+03, 1.000e+00, 1.000e+00, 0.000e+00],
       [2.010e+03, 1.000e+00, 2.000e+00, 0.000e+00],
       [2.010e+03, 1.000e+00, 3.000e+00, 0.000e+00],
       ...,
       [2.015e+03, 1.200e+01, 2.900e+01, 5.850e+00],
       [2.015e+03, 1.200e+01, 3.000e+01, 6.090e+00],
       [2.015e+03, 1.200e+01, 3.100e+01, 6.040e+00]])

The above code reads in year, month and day data from the first three columns, and Electricity_Demand from the last column.

You might be familiar with such a workflow – perhaps you have refined it down to a fine art!

In many cases this is sufficient for what we need, but making use of already available metadata can make the data more readable, and easier to operate on when it comes to complicated collocation and statistics.

Enter pandas!

Pandas

In the previous example, we read our data into numpy arrays. Numpy arrays are very useful: they store data more efficiently than a regular Python list, they are easier to index, and they have many built-in operations, from simple addition to niche linear algebra techniques.

We stored the column labels in an array called header, but this means our metadata has to be handled separately from our data. The dates are stored in three different columns alongside the data – but what if we want to perform an operation on just the data (for example, add 5 to every value)? It is technically possible, but awkward and dangerous – if the column index changes in future, our code might break! We are probably better off splitting the dates into another separate array, but that means more work to record the column headers, and an increasing number of Python variables to keep track of.
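To illustrate the danger (with a made-up two-row array, not the real dataset above), operating on one column of a plain numpy array means hardcoding its position:

```python
import numpy as np

# Hypothetical two-row version of the array from above:
# columns are Year, Month, Day, Electricity_Demand
data = np.array([[2010.0, 1.0, 1.0, 0.0],
                 [2010.0, 1.0, 2.0, 3.5]])

# Adding 5 to just the demand values relies on a hardcoded column index...
data[:, 3] += 5

# ...so if a column is ever added or reordered, this silently operates on
# the wrong field rather than raising an error.
```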

Using pandas, we can store all of this information in a single object, and using relevant datatypes:

>>> import pandas as pd
>>> data = pd.read_csv(fname, parse_dates=[['Year', 'Month', 'Day']], index_col=0)
>>> data
Electricity_Demand
Year_Month_Day      
2010-01-01      0.00
2010-01-02      0.00
2010-01-03      0.00
2010-01-04      0.00
2010-01-05      0.00
...              ...
2015-12-27      5.70
2015-12-28      5.65
2015-12-29      5.85
2015-12-30      6.09
2015-12-31      6.04

[2191 rows x 1 columns]

This may not immediately appear a whole lot different from what we had earlier, but notice the dates are now saved in datetime format, whilst being tied to the Electricity_Demand data. If we want to index the data, we simultaneously index the time-index without any further code (removing a possible source of errors).

Pandas also makes it really simple to perform some complicated operations. In this example, I am only dealing with one field (Electricity_Demand), but this works with 10, 100, 1000 or more columns!

  • Transpose rows and columns with data.T
  • Calculate quantiles with data.quantile()
  • Slice between dates, e.g. data.loc['2010-02-03':'2011-01-05']
  • Calculate a 7-day rolling mean: data.rolling(7).mean()

We can insert new columns, remove old ones, change the index, perform complex slices, and all the metadata stays stuck to our data!
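As a rough sketch of those operations (using a small made-up demand series for illustration, not the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical daily demand series
idx = pd.date_range('2010-01-01', '2010-03-01', freq='D')
data = pd.DataFrame({'Electricity_Demand': np.linspace(5.0, 6.0, len(idx))},
                    index=idx)

flipped = data.T                           # transpose rows and columns
q90 = data.quantile(0.9)                   # 90th percentile per column
feb = data.loc['2010-02-03':'2010-02-10']  # label-based date slice
smooth = data.rolling(7).mean()            # 7-day rolling mean

# The date index travels with every result, so there is no separate
# bookkeeping of row positions.
```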

Whilst pandas does have many maths functions built in, if need be we can also export directly to numpy using numpy.array(data['Electricity_Demand']) or data.to_numpy().

Pandas can also simplify plotting – particularly convenient when you just want to quickly visualise data without writing import matplotlib.pyplot as plt and other boilerplate code. In this example, I plot my data alongside its 7-day rolling mean:

ax = data.loc['2010'].plot(label='Demand', ylabel='Demand (GW)')
data.loc['2010'].rolling(7).mean().plot(ax=ax, label='Demand rolling mean')
ax.legend()

Now I can visualise the anomalous values at the start of the dataset, a consistent annual trend, a diurnal cycle, and fairly consistent behaviour week to week.

Big datasets

Pandas can read from and write to many different data formats – CSV, HTML, Excel, … but some filetypes that meteorologists like working with, such as netCDF4, aren’t built in.

xarray is an extremely versatile tool that can read many formats, including netCDF and GRIB. As well as having built-in functions to export to pandas, xarray is completely capable of handling metadata on its own, and many researchers work directly with objects such as the xarray DataArray.

There are more xarray features than stars in the universe[citation needed], but some that I find invaluable include:

  • open_mfdataset – automatically merge multiple files (e.g. for different dates or locations)
  • assign_coords – replace one co-ordinate system with another
  • where – replace xarray values depending on a condition

Yes, you can do all of this with pandas or numpy. But with xarray you can pass metadata attributes as arguments: for example, we can take the latitude average with my_data.mean('latitude'). No need to work with indexes and hardcoded values – xarray can do all the heavy lifting for you!
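As a minimal sketch of that idea (using a tiny made-up temperature grid; the coordinate values are arbitrary):

```python
import numpy as np
import xarray as xr

# Hypothetical 3x4 field on a latitude/longitude grid
field = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    coords={'latitude': [-30.0, 0.0, 30.0],
            'longitude': [0.0, 90.0, 180.0, 270.0]},
    dims=('latitude', 'longitude'),
)

# Average over a dimension by *name* -- no axis numbers needed
zonal_mean = field.mean('latitude')

# Keep only values above a threshold; the rest become NaN
masked = field.where(field > 5.0)
```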

Have more useful tips for working effectively with meteorological data? Leave a comment here or send me an email j.fallon@pgr.reading.ac.uk 🙂

The EGU Experience 2021: a PhD student perspective

Max Coleman – m.r.coleman@pgr.reading.ac.uk

Chloe Brimicombe – c.r.brimicombe@pgr.reading.ac.uk

The European Geosciences Union General Assembly is one of the big annual conferences for atmospheric science (and Earth sciences more generally). The two of us were fortunate to have the opportunity to attend and present our research at this year’s vEGU21 conference. As in previous years, such as 2019, we’re here to give you an account of our EGU experience 😀 (so you can compare our virtual experience with the previous posts if you like 😉) 

Entrance hall to virtual EGU (Source: Linda Speight) 

What was vEGU21? 

vEGU21 was the EGU General Assembly for 2021, held online. It took place from the 19th to the 30th of April, through an impressive virtual conference centre and, mostly, Zoom. 

What was your presentation on? 

Chloe – I presented on borderless heat stress in the extreme heat events session, based on a paper currently under review at Earth’s Future, where we show that heat stress is growing in the region during the month of August. The invited speaker for the session was Laura Suarez-Gutierrez, who gave a great presentation on the dynamics of increasing heat extremes with climate change across Europe. I really enjoyed learning about the latest research in the extreme heat area. 

Max – I presented on my work using model nudging to study aerosol radiative adjustments. I presented in the session ‘Chemistry, Aerosols and Radiative Forcing in CMIP6-era models’, which was convened and hosted by Reading’s very own Bill Collins. There were many interesting presentations in this session, including presentations on the balance between climate and air quality benefits by Robert Allen and Steve Turnock, and a summary of the Aerosol Chemistry Model Intercomparison Project (AerChemMIP) findings by UoR’s Gill Thornhill. A personal favourite, presented by Chris Wells, concerned the impacts of different emissions pathways in Africa on local and global climate, and local air pollution effects on mortality. 

Chloe presenting: would win an award for most interesting screenshot. (Source: Maureen Wanzala) 

What were your favourite aspects of the conference? 

Chloe – Apart from my own session, one of my favourites was on climate services. This focused on the application of meteorological and hydrological data to services, for example heat impacts on health and growing grapes and olives. I also enjoyed the panel on the climate and ecological emergency in light of COVID-19, which included Katherine Hayhoe, and the session on equality, diversity and inclusion; it was interesting how ‘listening’ to those impacted was a theme overlapping both. The weirdest, loveliest experience was my main supervisor sending me a colouring page of her face. 

Max – As with any conference, it was a great opportunity to learn about the latest research in my specific field, as well as exciting developments in other fields, from machine learning applications in earth science to observational studies of methane emissions. Particularly, it’s a nice change from just reading about them in papers. Having conversations with presenters gives you the opportunity to really dive in, find out what motivated their research initially, and discuss future applications. For example, one conversation I had went from discussing an application of unsupervised machine learning to classifying profiles of earth system model output, to learning about its potential for use in model intercomparisons.  

Katherine Hayhoe in the session Climate and Ecological Emergency: can a pandemic help save us? (Source: Chloe Brimicombe) 

What was your least favourite aspect? 

Chloe – I did manage to do a little networking, but I’d love to experience an in-person conference where I present. I have never presented my research in real life at a conference or research group/department seminar 😱. We also miss out on a lot of free food and pens by not going to any in-person conferences, which is what research is about 😉. Also, I find it difficult to stay focused on the conference when it’s online.  

Max – For me, the structure of two-minute summaries followed by breakout Zoom rooms for each speaker had some definite drawbacks. For topics outside one’s own field, I found it difficult to really learn much from many of the summaries – it’s not easy to fit something interesting for experts and non-experts into two minutes! In theory you can go and speak to presenters in their breakout rooms, but there’s something awkward about entering a Zoom breakout room with just you and the presenter, particularly when you aren’t sure exactly how well you understood their two-minute summary.  

In light of your vEGU21 experience, what are your thoughts on remote vs traditional conferencing? 

Max – Overall I think virtual conferencing has a way to go before it can match up to the in-person experience. There were the classic technical issues of anything hosted remotely: the ‘I think you’re on mute’ experience, other microphone issues, and even the conference website crashing on the first day of scientific sessions (though the organisers did a swift job getting the conference back up and running). But there’s also the less obvious, such as it feeling actually quite a lonely experience. I’ve only been to a couple of in-person conferences, but there were always some people I knew and could meet up with. It’s challenging to recreate this online, especially for early career researchers who don’t have as many established connections, and particularly at a big conference like the EGU General Assembly. Perhaps a big social media presence can somewhat replace this, but not everyone (including myself!) is a big social media user.  

On the other hand, it’s great that we can still have conferences during a global pandemic, which is no doubt better than an absence of them entirely. Above all else, it’s also much greener, and more accessible to those with less funding available for conference travel (though new accessibility challenges, such as internet quality and access, undoubtedly arise). Plus, the facility to upload various display materials, and for people to look back at them whenever they like regardless of time zones, is handy.  

Chloe – I’d just add that, as great as Twitter is and can be for promoting your research, it’s not the same as going for a good old cup of tea (or cocktail) with someone. Also, you can have the biggest, brightest social media presence, but actually be terrible at conveying your research in person. 

Summary 

Overall it was interesting to take part in vEGU21, and we were both glad we went. It didn’t quite live up to the in-person experience – and there is definitely room for improvement in virtual conferencing – but it’s great we can still have these experiences, albeit online.  

Coding lessons for the newly initiated

Better coding skills and tooling enable faster, more useful results. 

Daniel Ayers – d.ayers@pgr.reading.ac.uk

This post presents a collection of resources and tips that have been most useful to me in the first 18 months I’ve been coding – when I arrived at Reading, my coding ability amounted to using Excel formulas. These days, I spend a lot of time coding experiments that test how well machine learning algorithms can provide information on error growth in low-dimensional dynamical systems. This requires fairly heavy use of scikit-learn, TensorFlow and pandas. This post would have been most useful to me at the start of the year, but perhaps even the coding veterans will find something of use – or better, they can tell me about something I am yet to discover!  

First Steps: a few useful references 

  • A byte of python. A useful and concise reference for the fundamentals. 
  • Python Crash Course, Eric Matthes (2019). Detailed, lots of examples, and covers a wider range of topics (including, for example, using git). There are many intro to Python books around; this one has certainly been useful to me.1 There are many good online resources for python, but it can be helpful initially to have a coherent guide in one place. 

How did I do that last time? 

Tip: save snippets. 

There are often small bits of code that contain key tricks that we use only occasionally. Sometimes it takes a bit of time reading forums or documentation to figure out these tricks. It’s a pain to have to do the legwork again to find the trick a second or third time. There were numerous occasions when I knew I’d worked out how to do something previously, and then spent precious minutes trawling through various bits of code and coursework to find the line where I’d done it. Then I found a better solution: I started saving snippets with an online note taking tool called Supernotes. Here’s an example:  

I often find myself searching through my code snippets to remind myself of things. 

Text editors, IDEs and plugins. 

If you haven’t already, it might be worth trying some different options when it comes to your text editor or IDE. I’ve met many people who swear by PyCharm. Personally, I’ve been getting on well with Visual Studio Code (VS Code) for a year now. 

Either way, I also recommend spending some time installing useful plugins, as these can make your life easier. My recommendations for VS Code plugins are: Hungry Delete, Rainbow CSV, LaTeX Workshop, Bracket Pair Colorizer 2, Rewrap and Todo Tree. 

Linters & formatters 

Linters and formatters check your code for syntax or style errors. I use the Black formatter, and have it set to run every time I save my file. This seems to save a lot of time, and not only with formatting: it becomes more obvious when I have used incorrect syntax or made a typo. It also makes my code easier to read and nicer to look at. Here’s an example of Black in anger:  
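(The original screenshot isn’t reproduced here, but as an illustrative sketch – my own made-up line, not the one from the screenshot – this is the kind of clean-up Black performs:)

```python
import numpy as np

# Before formatting, one might type in a hurry:
#   x=np.array( [1,2,3 ] )
# After running Black, the spacing is normalised to:
x = np.array([1, 2, 3])
```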

Some other options for linters and formatters include autopep8, yapf and pylint. 

Metadata for results 

Data needs metadata in order to be understood. Does your workflow enable you to understand your data? I tend to work with toy models, so my current approach is to make a new directory for each version of my experiment code. This way I can make notes on each version of the experiment (usually in a markdown file). In other words, what not to do is to run the code to generate results and then edit the code (except, of course, to fix a bug). At a later stage you may want to understand how your results were calculated, and this cannot be done if you’ve changed the code file since the data was generated (unless you are a git wizard). 

A bigger toolbox makes you a more powerful coder 

Knowing about the right tool for the job can make life much easier.2 There are many excellent Python packages, and the more you explore, the more likely you’ll know of something that can help you. A good resource for the modules of the Python 3 standard library is Python Module of the Week. Some favourite packages of mine are pandas (for processing data) and Seaborn (a wrapper around Matplotlib that enables quick and fancy plotting of data). Both are well worth the time spent learning to use them. 

Some thoughts on Matplotlib 

Frankly, some of the most frustrating experiences in my early days with Python were trying to plot things with Matplotlib. At times it seemed inanely tedious, and bizarrely difficult to achieve what I wanted given how capable a tool others made it seem. My tips for the uninitiated would be: 

  • Be a minimalist, never a perfectionist. I often managed to spend 80% of my time plotting trying to achieve one obscure change. Ask: Do I really need this bit of the plot to get my point across? 
  • Can you hack it? I.e. can you fix up the plot using something other than Matplotlib? For example, you might spend ages trying to get Matplotlib to get some spacing right, when for your current purpose you could get the same result by editing the plot in Word/Pages in a few clicks. 
  • Be patient. I promise, it gets easier with time. 

Object oriented programming 

I’m curious to know how many of us in the meteorology department code with classes. In simple projects, it is possible to do without them. That said, there’s a reason classes are a fundamental part of modern programming: they enable more elegant and effective problem solving, code structure and testing. As Hans Petter Langtangen states in A Primer on Scientific Programming with Python, “classes often provide better solutions to programming problems.”  

What’s more, if you understand classes and object-oriented programming concepts, then understanding others’ code is much easier. For example, it can make Matplotlib’s documentation easier to understand and, in the worst-case scenario, if you had to read the Matplotlib source code to understand what was going on under the hood, it will make much more sense if you know how classes work. As with pandas, classes are worth the time buy-in! 
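As a tiny illustration (a made-up example, not one from the book), a class lets you bundle data with the operations that belong to it:

```python
class ErrorGrowth:
    """Hypothetical container pairing an experiment's error values
    with the operations performed on them."""

    def __init__(self, name, errors):
        self.name = name      # the label travels with the data
        self.errors = errors

    def mean(self):
        return sum(self.errors) / len(self.errors)

    def doubled(self):
        # Return a new object rather than mutating this one
        return ErrorGrowth(self.name, [2 * e for e in self.errors])


run = ErrorGrowth('lorenz63_run1', [0.1, 0.2, 0.3])
```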

Have any suggestions or other useful resources for wannabe pythonistas? Please comment below or email me at d.ayers@pgr.reading.ac.uk. 

Extra conference funding: how to apply and where to look

Shannon Jones – s.jones2@pgr.reading.ac.uk

The current PhD travel budget of £2000 doesn’t go far, especially if you have your eye on attending the AGU Fall Meeting in San Francisco. If the world ever goes back to normal (and fingers crossed it will – though hopefully with greener travel options, and remote participation in shorter conferences?), you might wonder how you are ever going to afford the conferences your supervisors suggest. Luckily, there are many ways you can supplement your budget. Receiving travel grants not only means more conferences (and more travel!), but it also looks great on your CV. In this blog post I share what I have learnt about applying for conference grants and list the main places to apply.

Sources of funding include…

Graduate School Travel Support Scheme

  • Open to 2nd and 3rd year PhD students at the university (or equivalent year if part-time) 
  • 1 payment per student of up to £200 
  • Usually 3 deadlines throughout the year 

There are two schemes open to all PhD students who are members of the IOP (any PhD student with a degree in physics or a related subject can apply to become a member):

Research Student Conference Fund

  • Unlimited payments until you have received £300 in total
  • 4 deadlines throughout the year: 1st March, 1st June, 1st September and 1st December 
  • Note: you apply for funding from an IOP group, and the conference must be relevant to the group. For example, most meteorology PhD students would apply for conference funding from the Environmental Physics group. You get to choose which groups to join when you become an IOP member. 

CR Barber Trust

  • 1 payment per student of £100-£300 for an international conference depending on the conference location 
  • Apply anytime as long as there is more than a month before the proposed conference 

Legacies Fund

Conference/Meeting Travel Subsistence

From the conference organiser

  • Finally, many conferences offer their own student support, so it’s always worth checking the conference website to see what’s available 
  • Both EGU and AGU offer grants to attend their meetings each year 

Application Tips

Apply early!!!

Many of these schemes take months to let you know whether you have been successful. Becoming a member can also take a while, especially when societies only approve new members at certain times of the year. So, it’s good to talk to your supervisor and make a conference plan early on in your PhD, so you know when to apply. 

Writing your application

Generally, these organisations are keen to give away their funds, you just have to write a good enough application. Keep it simple and short: remember the person reading the application is very unlikely to be an expert in your research. It can be helpful to ask someone who isn’t a scientist (or doesn’t know your work well) to read it and highlight anything that doesn’t make sense to them. 

Estimating your conference expenses

You are usually expected to provide a breakdown of the conference costs with every application. The main costs to account for are: 

  • Accommodation: for non-UK stays you must apply for a quote through the university travel agent 
  • Travel: UK train tickets over £100 and all international travel must also be booked through the university 
  • Subsistence: i.e. food! University rules used to cap this at £30 per day – check current guidelines 
  • Conference Fees: the conference website will usually list this 

The total cost will depend on where the conference is. You are generally expected to choose cheaper options, but there is some flexibility. As a rough guide: a 4-day conference within the UK cost me around £400 (in 2019) and a 5-night stay in San Francisco to attend AGU cost me around £2200 (in 2019).  

Reading PhD students at Union Square, San Francisco for AGU! 

Good luck! Feel free to drop me an email at s.jones2@pgr.reading.ac.uk if you have any questions 😊