Choosing the right data file format for numerical computing

By David Hoese on 2019-10-10

This talk will go over the pros and cons of various data file formats common in scientific Python workflows. We'll cover the concerns that arise when storing data on disk and how popular formats address them. The file formats covered will include CSV, flat binary, HDF5, NetCDF4, Parquet/Arrow, and Zarr.
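As a small taste of the trade-offs, the sketch below compares two of the simplest formats on the list, CSV and flat binary, using only the Python standard library. It is not taken from the talk's materials: the sample data is synthetic, and the names are illustrative. CSV is human-readable but must be parsed value by value and is larger on disk. Flat binary is compact (8 bytes per float64) but carries no metadata, so the reader must already know the data type, shape, and byte order.

```python
import array
import csv
import io
import math

# Synthetic sample data (illustrative only): 1,000 float64 values.
values = array.array("d", (math.sin(i / 10.0) for i in range(1000)))

# CSV: one value per row, written as text. repr() gives the shortest
# string that round-trips to the exact same float.
text_buf = io.StringIO()
writer = csv.writer(text_buf)
for v in values:
    writer.writerow([repr(v)])
csv_bytes = text_buf.getvalue().encode("ascii")

# Flat binary: the raw float64 bytes in the machine's native byte order,
# with no header describing type, shape, or endianness.
raw_bytes = values.tobytes()

# Reading CSV back means parsing every value from text.
reader = csv.reader(io.StringIO(csv_bytes.decode("ascii")))
parsed = array.array("d", (float(row[0]) for row in reader))

# Reading flat binary back only works because we already know the
# element type ("d" = float64) and that the byte order matches.
restored = array.array("d")
restored.frombytes(raw_bytes)

print("CSV size:", len(csv_bytes), "bytes")
print("flat binary size:", len(raw_bytes), "bytes")
```

Self-describing formats such as HDF5, NetCDF4, Parquet/Arrow, and Zarr address exactly the gap this sketch exposes: they store the type, shape, and other metadata alongside the data.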

Materials from this talk can be found on GitHub, and a live version of the presented notebook is available here.

David Hoese is a software developer at the Space Science and Engineering Center at the University of Wisconsin-Madison. He graduated with a Bachelor's degree in Computer Engineering from UW-Madison. David writes software tools that help atmospheric scientists analyze satellite and ground-based instrument data.