Introduction to Xarray and Dask for Geoscientists

Overview

Instructor: Abishek Gopal

Time: Friday, April 23, 2021, 10:00 AM to 12:30 PM CT

Location: Zoom session only

Prerequisites: Current Terra account. Basic experience with Python and Jupyter notebooks recommended.

Dask's lazy loading and parallelization can let researchers scale their computations from a laptop to a supercomputing cluster. Using CESM model output in the netCDF format, we will compute climatologies and time averages, visualize quantities, resample data, interpolate along the vertical coordinate, and look at how to parallelize xarray-based computations with the dask library.
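As a rough preview of the climatology and time-average steps mentioned above, here is a minimal sketch. It does not use the course's CESM files: the data are synthetic, and the variable name TS, the grid sizes, and the two-year monthly time axis are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for model output (assumption): two years of monthly
# surface temperature on a coarse latitude/longitude grid.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.linspace(-90.0, 90.0, 19)
lon = np.arange(0.0, 360.0, 20.0)
rng = np.random.default_rng(0)
values = 280.0 + 10.0 * rng.standard_normal((time.size, lat.size, lon.size))

ts = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="TS",  # hypothetical variable name
)

# Monthly climatology: average all Januaries together, all Februaries, etc.
climatology = ts.groupby("time.month").mean("time")

# Time average over the whole record.
time_mean = ts.mean("time")

# Anomalies: each month minus its climatological mean.
anomalies = ts.groupby("time.month") - climatology
```

With real model output, the DataArray would typically come from xr.open_dataset on a netCDF file rather than from random numbers; the groupby and mean calls stay the same.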

Course Materials

Presentation slides

The presentation slides are available as downloadable PDF files.

  • Introduction to xarray and dask (Spring 2021): PDF

Agenda

  • Brief introduction to the Pangeo framework
  • Data structures in xarray
  • Reading and writing netCDF files using xarray
  • Dask chunking and lazy loading
  • Computations available through xarray and xgcm
  • Visualizing xarray DataArrays with cartopy
  • Explicitly parallelizing computations using dask
  • Using the dask dashboard to understand memory usage
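
The chunking and lazy-loading items in the agenda can be previewed with a short sketch. It assumes dask is installed alongside xarray and uses a synthetic array; the array shape, the chunk size of 12 time steps, and the variable name T are illustrative choices, not values taken from the course materials.

```python
import numpy as np
import dask.array as da
import xarray as xr

# Synthetic (time, lat, lon) field standing in for model output (assumption).
rng = np.random.default_rng(0)
temp = xr.DataArray(
    rng.random((120, 90, 180)),
    dims=("time", "lat", "lon"),
    name="T",  # hypothetical variable name
)

# Split the array into dask chunks of 12 time steps each.
chunked = temp.chunk({"time": 12})

# This only builds a task graph; no arithmetic has run yet.
zonal_mean = chunked.mean("lon")
print(type(zonal_mean.data))

# compute() executes the graph, chunk by chunk, and returns NumPy-backed data.
result = zonal_mean.compute()
print(type(result.data), result.shape)
```

For data on disk, passing a chunks argument to xr.open_dataset gives the same lazy, chunked behavior without first loading the file into memory.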