Introduction to Xarray and Dask for Geoscientists
Overview
Instructor(s): Abishek Gopal
Time: Friday, April 23, 2021 10:00AM-12:30PM CT
Location: Zoom session only
Prerequisite(s): Current Terra account. Basic experience with Python and Jupyter notebooks recommended.
Dask's lazy loading and parallelization can allow researchers to scale their computations from laptops to supercomputing clusters. We will use CESM model output in netCDF format to compute climatologies and time averages, visualize quantities, resample data, interpolate in the vertical coordinate, and parallelize xarray-based computations with the dask library.
Course Materials
Presentation slides
The presentation slides are available as downloadable PDF files.
- Introduction to xarray and dask (Spring 2021): PDF
Agenda
- Brief introduction to the Pangeo framework
- Data structures in xarray
- Reading and writing netCDF files using xarray
- Dask chunking and lazy loading
- Computations available through xarray and xgcm
- Visualizing xarray DataArrays with cartopy
- Explicitly parallelizing computations using dask
- Using the dask dashboard to understand memory usage
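As a rough preview of the chunking, lazy loading, and climatology items above, here is a minimal sketch. It is not taken from the course materials: it builds a small synthetic dataset in place of CESM output, and the variable name `TS`, the grid size, and the chunk size are assumptions chosen for illustration. In the session, the data would instead be read from netCDF files.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for model output (assumption: not real CESM data).
# Two years of monthly values on a tiny 4 x 4 grid.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.linspace(-60.0, 60.0, 4)
lon = np.linspace(0.0, 270.0, 4)

# Each value equals its calendar month, so the climatology is easy to verify.
values = np.broadcast_to(
    time.month.values[:, None, None].astype(float), (24, 4, 4)
).copy()

ds = xr.Dataset(
    {"TS": (("time", "lat", "lon"), values)},  # "TS" is a hypothetical name
    coords={"time": time, "lat": lat, "lon": lon},
)

# Chunk along time so the variable is backed by a dask array.
ds_chunked = ds.chunk({"time": 12})

# This only builds a task graph; no values are computed yet.
climatology_lazy = ds_chunked["TS"].groupby("time.month").mean("time")

# compute() triggers the actual work and returns an in-memory DataArray.
climatology = climatology_lazy.compute()
```

The intermediate `climatology_lazy` object stays lazy until `compute()` is called, which is what lets the same code run on datasets larger than memory when the chunks are sized appropriately.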