5 Public Datasets to Practice With
One of the best ways to learn data analysis is to work with data you find genuinely interesting. Here are five public datasets that are great for practice.
1. NYC Open Data
The City of New York publishes thousands of datasets covering everything from 311 complaints to restaurant inspection grades. It’s local, it’s relevant, and it’s massive.
Good for: SQL practice, exploratory analysis, geospatial data
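If you want a feel for the SQL practice mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and rows are invented placeholders, not real 311 records or the portal's actual schema; in practice you would load a CSV you exported yourself.

```python
import sqlite3

# Placeholder rows standing in for a downloaded 311 export.
# Column names and values are invented for illustration only.
rows = [
    ("Noise", "BROOKLYN"),
    ("Noise", "QUEENS"),
    ("Heat/Hot Water", "BRONX"),
    ("Noise", "BROOKLYN"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE complaints (complaint_type TEXT, borough TEXT)")
conn.executemany("INSERT INTO complaints VALUES (?, ?)", rows)

# Count complaints per type, most frequent first.
counts = conn.execute(
    """
    SELECT complaint_type, COUNT(*) AS n
    FROM complaints
    GROUP BY complaint_type
    ORDER BY n DESC, complaint_type
    """
).fetchall()

for complaint_type, n in counts:
    print(complaint_type, n)

conn.close()
```

The same GROUP BY pattern works for any categorical column, so it makes a reasonable first query on whatever dataset you pick.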
2. Kaggle Datasets
Kaggle hosts thousands of community-uploaded datasets on every topic imaginable, from Spotify listening history to global climate data. Many come with starter notebooks.
Good for: Guided exploration, machine learning practice
3. US Census Bureau
Demographics, economics, housing: the Census Bureau is one of the richest data sources available. The American Community Survey alone has hundreds of variables.
Good for: Statistical analysis, demographic research, data joins
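A data join just means combining two tables on a shared key. The sketch below shows the idea in plain Python, assuming two small tables that share a geography code. The `geo_id` codes, field names, and numbers are made up for illustration and are not Census figures.

```python
# Two small tables keyed by a shared geography code.
# Codes and numbers are invented placeholders, not Census figures.
population = [
    {"geo_id": "A01", "population": 1000},
    {"geo_id": "A02", "population": 2500},
    {"geo_id": "A03", "population": 400},
]
median_income = [
    {"geo_id": "A01", "median_income": 52000},
    {"geo_id": "A02", "median_income": 61000},
]


def inner_join(left, right, key):
    """Return rows present in both tables, merged on `key`."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


joined = inner_join(population, median_income, "geo_id")
for row in joined:
    print(row)
```

Note that "A03" drops out because it has no match in the second table. Checking which rows disappear in a join is a good habit to build early.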
4. FiveThirtyEight Data
The journalism site publishes the data behind many of its articles on GitHub. Sports, politics, economics: most of it is cleaned and ready to analyze.
Good for: Reproducible analysis, learning how journalists use data
5. World Bank Open Data
Global development indicators for nearly every country: GDP, literacy rates, life expectancy, and more.
Good for: Time-series analysis, visualization, international comparisons
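A common first step in time-series analysis is computing year-over-year change. Here is a minimal sketch, assuming one indicator with one value per consecutive year. The values are invented placeholders, not World Bank data, and `year_over_year` is a hypothetical helper name.

```python
# Placeholder yearly values for one indicator; invented, not World Bank data.
series = {2018: 100.0, 2019: 110.0, 2020: 99.0, 2021: 108.9}


def year_over_year(values):
    """Percent change from each year to the next.

    Assumes the years are consecutive with no gaps.
    """
    years = sorted(values)
    return {
        year: (values[year] - values[prev]) / values[prev] * 100
        for prev, year in zip(years, years[1:])
    }


changes = year_over_year(series)
for year, pct in changes.items():
    print(year, round(pct, 1))
```

Real indicator series often have missing years, so it is worth checking for gaps before trusting a calculation like this.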
Pick one that interests you, download it, and start asking questions. That’s how the learning happens. And if you want to do it with other people, join us at a workshop.