Date and time TBD · Online

Parquet Files: What They Are and When to Use Them Instead of CSV

CSVs are everywhere, but they're slow, bloated, and lossy. Parquet is the format that data engineers actually use — and it's easier to work with than you think.

Parquet · Data Engineering · Python

You’ve been emailing CSVs around. You’ve waited for a 2GB file to load into a spreadsheet. You’ve watched column types get mangled every time you reopen a file.

There’s a better way. It’s called Parquet, and once you understand it, you’ll wonder why anyone still uses CSV for anything serious.

What’s Wrong with CSV

  • Every time you open it, the program has to guess the type of every column
  • Dates become strings. Numbers become text. Booleans become chaos
  • A 500MB CSV might be 50MB as Parquet — same data, fraction of the size
  • Want to read just one column out of fifty? Too bad: a CSV reader still has to scan every line of the file
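
To make the type problem concrete, here is a minimal sketch using only Python’s standard library. The column names and values are made up for illustration. It writes an integer, a date, and a boolean to a CSV and reads them back:

```python
import csv
import io
from datetime import date

# Hypothetical sample rows: an integer, a date, and a boolean per record.
rows = [
    {"order_id": 1, "order_date": date(2024, 1, 15), "shipped": True},
    {"order_id": 2, "order_date": date(2024, 2, 3), "shipped": False},
]

# Write the rows to an in-memory CSV (stands in for a file on disk).
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["order_id", "order_date", "shipped"])
writer.writeheader()
writer.writerows(rows)

# Read the same CSV back.
buffer.seek(0)
round_tripped = list(csv.DictReader(buffer))

# Every value comes back as a plain string; the original types are gone.
for name, value in round_tripped[0].items():
    print(name, repr(value), type(value).__name__)
```

Each value prints as `str`. Whatever reads the file next has to guess that `'1'` was a number and `'True'` was a boolean.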

What Makes Parquet Different

Parquet is a columnar file format. Instead of storing data row by row (like CSV), it stores it column by column. That means:

  • It remembers your types — integers stay integers, dates stay dates
  • It compresses beautifully — often 5-10x smaller than equivalent CSVs
  • It’s fast to query — tools can read just the columns they need without touching the rest
  • It’s the standard: Spark, BigQuery, Snowflake, DuckDB, and pandas all read and write Parquet

What We’ll Do

  • Compare loading the same dataset as CSV vs Parquet and see the difference
  • Convert a messy CSV into a clean Parquet file
  • Query a Parquet file directly with DuckDB and Python
  • Explore a multi-gigabyte dataset that would be painful as CSV but is fast to query as Parquet
  • Talk about when to use Parquet, when CSV is fine, and how to make the switch

Who This Is For

Anyone who works with data files. If you’ve ever emailed a CSV, downloaded a spreadsheet export, or waited too long for a file to load — this is for you. Some Python familiarity helps but isn’t required.

Date and time coming soon — join the Meetup group to get notified.