Where to find practice datasets

Five data repositories to boost your practice

6/2/2025 · 2 min read

A laptop displaying a line graph with various peaks and troughs on its screen, indicating stock or financial data. The graph and numbers are green against a black background. The laptop is placed on a wooden table, and beside it, a person's arm is partially visible.

Discover real datasets to enhance your R learning journey

When learning R, it's good to use "real" datasets to apply what you've learned. It makes the learning more interesting and more engaging - there is space data for astronomy enthusiasts, movie data for film buffs, or even flower data for gardening enthusiasts. Here are some go-to data repositories that will give you plenty of data to explore and practice new functions on.

1. Built-in R Datasets

Run data() in R to see which datasets are built in. They cover a wide variety of topics, from sunspot data to admissions to UC Berkeley. There is time series data as well as static data, so there is plenty to analyze.

You often see these datasets being used to demonstrate the use of functions. The "iris" dataset seems to be used frequently, but I think it's a pretty dull dataset compared to what's available elsewhere!
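As a rough sketch of how you might start exploring the built-in data, the snippet below lists what ships with the base "datasets" package and then looks at two of the datasets mentioned above. It assumes the sunspot data refers to the sunspots object and the UC Berkeley data to UCBAdmissions; other built-in datasets work the same way.

```r
# List the datasets that ship with the base "datasets" package
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# sunspots: monthly sunspot numbers, stored as a time series (ts) object
class(sunspots)
frequency(sunspots)   # observations per year
plot(sunspots, main = "Monthly sunspot numbers")

# UCBAdmissions: a 3-way table of UC Berkeley graduate admissions
dim(UCBAdmissions)
admissions_df <- as.data.frame(UCBAdmissions)   # flatten the table into rows
head(admissions_df)
```

Converting a table to a data frame, as in the last step, is often a convenient first move because most plotting and modelling functions expect one row per observation or group.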

2. Tidy Tuesday

Website: https://github.com/rfordatascience/tidytuesday

Every Tuesday a dataset is released with a different topic. It's a heavily used data repository and you can find lots of YouTube tutorials to follow along as people go through some of the datasets.

For actuaries: If you want to practice some survival analysis, then have a look at the "Alone data" from January 2023. For practice on life expectancy calculations and modeling, check out the "Life Expectancy" dataset from December 2023.
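One possible way to pull a Tidy Tuesday release into R is the tidytuesdayR package, sketched below. The helper name load_tidytuesday is made up for this example, and the date "2023-01-24" is my assumption for the week the Alone data was released, so check the repository's README if it does not load.

```r
# Assumes the tidytuesdayR package is installed:
# install.packages("tidytuesdayR")

# Hypothetical helper: load one weekly release by its date
load_tidytuesday <- function(date) {
  if (!requireNamespace("tidytuesdayR", quietly = TRUE)) {
    stop("Please install the tidytuesdayR package first.")
  }
  tidytuesdayR::tt_load(date)
}

# Example usage (needs an internet connection, so it is left commented out):
# alone <- load_tidytuesday("2023-01-24")   # assumed release date of the Alone data
# names(alone)                              # see which tables the release contains
# str(alone[[1]])                           # inspect the first table
```

Each release can contain several tables, so it is worth checking names() before diving in.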

3. Kaggle Datasets

Website: https://www.kaggle.com/datasets

Kaggle hosts thousands of datasets that are regularly updated by the community. You'll find everything from government data and sports statistics to medical research and financial data. Many datasets come with code examples and notebooks to help you get started, making it perfect for R practice.
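Most Kaggle datasets can be downloaded as CSV files, so a first look in R might go something like the sketch below. To keep the example self-contained it writes a tiny made-up CSV to a temporary folder as a stand-in for a real download; the file name and columns are invented for illustration.

```r
# Stand-in for a CSV downloaded from Kaggle (file name and columns are made up)
csv_path <- file.path(tempdir(), "example_kaggle_download.csv")
write.csv(
  data.frame(team = c("A", "B", "C"), wins = c(10, 7, 12)),
  csv_path,
  row.names = FALSE
)

# Peek at the first rows before loading everything
preview <- read.csv(csv_path, nrows = 2)
str(preview)

# Load the full file and check its size and contents
full_data <- read.csv(csv_path)
dim(full_data)
summary(full_data)
```

With a real download you would simply point csv_path at the file you saved. Reading a few rows first with nrows is a cheap way to check column names and types before loading a large file.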

4. Awesome Public Datasets

Website: https://github.com/awesomedata/awesome-public-datasets

The title pretty much says everything! This repository contains lots of different topics and lots of different datasets. It's a comprehensive collection that covers virtually every domain you can think of.

5. Google Dataset Search

Website: https://datasetsearch.research.google.com/

Finally... if none of these pique your interest, then you can always use Google Dataset Search to find something else. It's like Google, but specifically for finding datasets across the web.

Happy coding and data exploring! 📊📈

Additional Tips for Using These Repositories:

  • Start small: Begin with datasets that have fewer than 10,000 rows while learning

  • Read the documentation: Most repositories provide data dictionaries explaining what each column means

  • Join communities: Many of these platforms have active communities where you can ask questions

  • Practice regularly: Try to work with a new dataset at least once a week

  • Share your work: Post your analyses on platforms like GitHub or LinkedIn to build your portfolio

Why Real Data Matters:

Working with real datasets is crucial for developing practical R skills because:

  • Messy data: Real data often needs cleaning, which teaches you important data wrangling skills

  • Context matters: Understanding the domain helps you ask better analytical questions

  • Practical constraints: Real datasets teach you to work within limitations and make assumptions

  • Portfolio building: Real analyses are more impressive to potential employers than textbook examples
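To give a flavour of the cleaning that messy data tends to need, here is a small sketch in base R. The data frame is entirely made up, and real datasets will have their own quirks, but stray whitespace, text placeholders for missing values and duplicate rows are common ones.

```r
# A made-up messy data frame for illustration
raw <- data.frame(
  name  = c(" Alice", "bob ", " Alice", "Carol"),
  score = c("90", "n/a", "90", "75"),
  stringsAsFactors = FALSE
)

cleaned <- raw
cleaned$name <- trimws(cleaned$name)            # strip stray whitespace
cleaned$score[cleaned$score == "n/a"] <- NA     # recode the text placeholder as missing
cleaned$score <- as.numeric(cleaned$score)      # convert text to numbers
cleaned <- cleaned[!duplicated(cleaned), ]      # drop exact duplicate rows

cleaned
colSums(is.na(cleaned))                         # count missing values per column
```

Note the order of the steps: the duplicate rows only become identical after the whitespace is trimmed, which is a typical example of why cleaning usually comes before everything else.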

The best way to learn R is by doing. Pick a dataset that interests you, start exploring, and don't be afraid to make mistakes – that's how you move forward.