Tutorial For Working With NetCDF Climate Data In Python

There's a massive amount of climate data out there, and it isn't always easy to find or understand. I figured I'd put together a quick tutorial for anyone who wants to start working with it. It also doubles as a tutorial on reading data from NetCDF files in Python.

In this tutorial, we'll:
  • Open a file with climate data
  • Plot some of it in a meaningful way


The data are typically stored in NetCDF files (.nc). There are a lot of tools for working with these files. I primarily use MATLAB on this site, but since it isn't as widely used as other tools, I'll do this tutorial in Python.

Disclaimer: Python is not my preferred language, so if you're a competent Python developer, don't take my code style as a model; it's probably bad.

In case you aren't already set up with a Python environment, you can use Anaconda. I'm using Python 2.7 for this, and the basic process is:

  • install Anaconda
  • from there, run 'conda install netcdf4' to install the netCDF4 package for reading NetCDF files
  • run 'conda update ipython'
  • run IPython on your machine
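Before opening any data, it can help to confirm that the packages this tutorial imports are visible to the Python that IPython is running. A minimal sketch, assuming Python 3.4 or later (importlib.util.find_spec doesn't exist in Python 2.7); missing_packages is a hypothetical helper name:

```python
import importlib.util

def missing_packages(names):
    """Return the module names that can't be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# the two packages this tutorial imports
missing = missing_packages(["netCDF4", "matplotlib"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All set")
```

If anything is listed, rerun the matching conda install command in the same environment.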


There are a lot of data sources (I list many here). For this tutorial, I'm using the TMAX (daily maximum temperature) data for 2000-2009 under the 'Daily Land (Experimental: 1880-Recent)' section at http://berkeleyearth.org/data/.


Once your Python environment is set up, you're ready to go. NetCDF files group the data into named variables of various sizes, each with its own attributes. To start, open the file for reading and print the available variables:

```python
from netCDF4 import Dataset
import matplotlib.pyplot as plt

# open a reference to the file and print an overview of its variables
data = Dataset("PUT YOUR FILE PATH HERE", "r")
print(data.variables.items())
```
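The raw output of data.variables.items() is long. A more compact option is one line per variable with its dimensions, shape, and units. This is a sketch assuming Python 3.6+; describe_variables is a hypothetical helper, and it falls back to an empty string for variables without a units attribute:

```python
def describe_variables(variables):
    """One summary line per NetCDF variable: name, dimension names, shape, and units (if any)."""
    lines = []
    for name, var in variables.items():
        # netCDF4 exposes NetCDF attributes as Python attributes on the Variable
        units = getattr(var, "units", "")
        lines.append(f"{name}: dims={var.dimensions} shape={var.shape} units={units}")
    return lines

# usage with the Dataset opened above:
# for line in describe_variables(data.variables):
#     print(line)
```

The dimension names tell you how to index each variable, e.g. which axis is time and which are latitude and longitude.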

After that, what you do with the data is up to you; I have a lot of example analyses on this site. Here, we'll make a map of the max temperatures. First, we need to pull out the data. Looking at the output of the code above, we pick out the following variables:
  • longitude
  • latitude
  • temperature
  • climatology
Adding the last two gives us the actual temperature: 'temperature' is the anomaly relative to a reference, and 'climatology' is that reference. The code below does this for us:

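```python
# pull the data from the file into variables
lon = data.variables['longitude'][:]
lat = data.variables['latitude'][:]
anomaly = data.variables['temperature'][:]     # daily anomaly, indexed [time, lat, lon]
reference = data.variables['climatology'][:]   # reference, indexed [day of year, lat, lon]

# grab January 31, 2000 and convert from C to F
# Python indexes from 0, so day 31 is index 30 (assumes the first time step is Jan 1, 2000)
# note: this reuses the name `data`, replacing the Dataset handle with the temperature array
data = anomaly[30, :, :] + reference[30, :, :]
data = (1.8 * data) + 32
```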
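Picking the index by hand is easy to get wrong because Python indexing starts at 0, so January 31 is index 30, not 31. The sketch below computes both indices from a calendar date instead. It assumes the time axis has one entry per day beginning at start (January 1, 2000 for this file) and that the climatology is indexed by zero-based day of year; how the dataset handles February 29 is not covered, so check that before using dates after February 28 in a leap year. daily_tmax_f is a hypothetical helper name:

```python
from datetime import date

def daily_tmax_f(anomaly, climatology, target, start=date(2000, 1, 1)):
    """Absolute max temperature in F for `target` from a daily anomaly and a day-of-year climatology.

    Assumes one time step per day beginning at `start`, and a climatology indexed by
    zero-based day of year (leap-day handling is not addressed).
    """
    t = (target - start).days                # zero-based position on the time axis
    doy = target.timetuple().tm_yday - 1     # zero-based day of year (Jan 1 -> 0)
    celsius = anomaly[t, :, :] + climatology[doy, :, :]
    return 1.8 * celsius + 32                # convert C to F

# usage with the arrays read above:
# jan31 = daily_tmax_f(anomaly, reference, date(2000, 1, 31))
```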

Next, we want to plot the data. A reasonable way to plot this sort of gridded data is matplotlib's contourf. The following generates a plot of global max temperature on the 31st day of 2000 (January 31):

```python
# plot global max temp
plt.figure(1)
plt.contourf(lon, lat, data)
plt.colorbar()
plt.ylabel('latitude')
plt.xlabel('longitude')
plt.title('Max Temp (F) on January 31, 2000')
plt.show()
```

[Figure: contour map of global max temperature (F) on January 31, 2000]
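By default contourf picks a handful of contour levels automatically. If you want control over the color bands, you can pass explicit levels. A sketch with a hypothetical contour_levels helper, assuming NumPy (it comes with Anaconda) and that missing cells are NaN or masked:

```python
import numpy as np

def contour_levels(values, step=5.0):
    """Evenly spaced contour levels, `step` apart, spanning the valid (non-masked, finite) data."""
    valid = np.ma.masked_invalid(values)       # NaN/inf and any existing mask are ignored
    lo = np.floor(valid.min() / step) * step
    hi = np.ceil(valid.max() / step) * step
    if hi == lo:                               # constant field: still return two levels
        hi = lo + step
    return np.arange(lo, hi + step, step)

# usage with the arrays from above (5-degree F bands):
# plt.contourf(lon, lat, data, levels=contour_levels(data, step=5))
```

A smaller step gives more bands; keeping the step fixed also makes maps of different days easier to compare.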

The temperature range is so large that the global plot doesn't show much detail, so we can zoom in on the US by selecting the corresponding ranges of latitude and longitude indices:

```python
# plot US max temp; data is indexed [latitude, longitude]
plt.figure(2)
plt.contourf(lon[50:125], lat[115:135], data[115:135, 50:125])
plt.colorbar()
plt.ylabel('latitude')
plt.xlabel('longitude')
plt.title('US Max Temp (F) on January 31, 2000')
plt.show()
```

[Figure: contour map of US max temperature (F) on January 31, 2000]
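Hardcoded index ranges like 115:135 only make sense for one particular grid. A sketch that derives the slices from coordinate values instead, assuming lat and lon are 1-D and increasing (index_slice is a hypothetical helper). If the file's grid has 1-degree cells centered at -89.5 to 89.5 and -179.5 to 179.5 (an assumption; print lat and lon to check), the usage below reproduces the 115:135 and 50:125 slices from the US plot:

```python
import numpy as np

def index_slice(coords, lo, hi):
    """Slice of a 1-D, increasing coordinate axis whose values fall within [lo, hi]."""
    idx = np.where((coords >= lo) & (coords <= hi))[0]
    if idx.size == 0:
        raise ValueError(f"no coordinates between {lo} and {hi}")
    return slice(int(idx[0]), int(idx[-1]) + 1)

# usage: latitude 25 to 45 N, longitude 130 to 55 W
# lat_s = index_slice(lat, 25, 45)
# lon_s = index_slice(lon, -130, -55)
# plt.contourf(lon[lon_s], lat[lat_s], data[lat_s, lon_s])
```

Working in degrees also makes it easy to widen the box, e.g. raising the northern bound to include the whole contiguous US.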

As expected, temperatures are much cooler in the northern hemisphere, where it's winter in January, and toward the poles.
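To put a number on the north/south contrast instead of eyeballing the map, you can average the global field over each hemisphere. A sketch, assuming data is the 2-D January 31 field in F and lat its 1-D latitudes; hemisphere_means is a hypothetical helper, and weighting each row by cos(latitude) is a common approximation for the area of cells on a regular lat/lon grid:

```python
import numpy as np

def hemisphere_means(lat, values):
    """Area-weighted means of a 2-D (lat, lon) field for the northern and southern hemispheres.

    Each row is weighted by cos(latitude), since grid cells shrink toward the poles.
    Missing values (NaN or masked) are excluded.
    """
    lat = np.asarray(lat)
    field = np.ma.masked_invalid(values)
    weights = np.broadcast_to(np.cos(np.deg2rad(lat))[:, np.newaxis], field.shape)
    north, south = lat > 0, lat < 0
    return (np.ma.average(field[north], weights=weights[north]),
            np.ma.average(field[south], weights=weights[south]))

# usage with the global January 31 field from above:
# nh, sh = hemisphere_means(lat, data)
# print(f"NH mean: {nh:.1f} F, SH mean: {sh:.1f} F")
```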


That's it! This is enough to get started working with climate data. The link I provided earlier has links to future projections from multiple groups and to data for other variables (minimum temperature, precipitation, etc.). For reference, the source code for this tutorial can be downloaded here:


As a side note, there's also a free Coursera course on basic climate modeling with Python that might interest you. It is here:

