
Best Public Datasets for Machine Learning and Data Science
Best public datasets for machine learning, data science, sentiment analysis, computer vision, natural language processing (NLP), clinical data, and others.
Author(s): Stacy Stanford, Roberto Iriondo, Pratik Shukla
Last updated January 6, 2021
This resource is continuously updated. If you know of any other suitable open datasets, please let us know by emailing pub@towardsai.net or by leaving a comment below.
Dataset Finders
Google Dataset Search: Much like Google Scholar, Dataset Search lets you find datasets wherever they are hosted, whether on a publisher’s site, in a digital library, or on an author’s web page. It’s a phenomenal dataset finder, and it indexes over 25 million datasets.
Kaggle: Kaggle hosts a vast collection of datasets, suitable for everyone from the enthusiast to the expert.
UCI Machine Learning Repository: The Machine Learning Repository at UCI provides an up-to-date resource for open-source datasets.
VisualData: Discover computer vision datasets by category; it also supports search queries.
CMU Libraries: Discover high-quality datasets through Huajin Wang's collection at CMU.
The Big Bad NLP Database: This cool list, created and curated by Quantum Stat, contains datasets for various natural language processing tasks.
General Datasets
Housing Datasets
Boston Housing Dataset: Contains information collected by the US Census Service concerning…