Why Pandas-like Interfaces are Sub-optimal for Distributed Computing | by Kevin Kho | Jun, 2022
A deep look at the assumptions of the Pandas interface

Written by Kevin Kho and Han Wang

This is a written version of our most recent PyCon talk.

Photo by Jukan Tateisi on Unsplash

Over the last year and a half, we’ve talked to data practitioners who want to move Pandas code to either Dask or Spark to take advantage of distributed computing resources. Their workloads were quickly becoming too compute-intensive, or their datasets no longer fit in Pandas, which runs only on a single machine.

One of the recurring themes in…