EDA Exercises in R: 50 Real Practice Problems
Fifty exploratory data analysis exercises spanning inspection, distributions, missing values, outliers, relationships, and full EDA workflows. Hidden solutions, runnable code.
Section 1. Data inspection (8 problems)
Exercise 1.1: Dimensions
Difficulty: Beginner. Get the row and column count of airquality.
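One possible base-R solution. dim() returns rows then columns, and nrow() and ncol() return each count separately.

```r
# airquality ships with base R (datasets package)
dim(airquality)   # c(rows, columns)
nrow(airquality)  # rows only
ncol(airquality)  # columns only
```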
Exercise 1.2: Glimpse columns
Difficulty: Beginner. Inspect the column types and sample values of mtcars with glimpse().
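A minimal sketch, assuming dplyr is installed. glimpse() is available once dplyr is loaded, and str() is the base-R counterpart.

```r
library(dplyr)
# One line per column: name, type abbreviation (e.g. <dbl>), first values
glimpse(mtcars)
# Base-R alternative with similar information
str(mtcars)
```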
Exercise 1.3: Summary statistics
Difficulty: Beginner. Print a summary of airquality and identify the column with the most NAs.
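A base-R sketch. summary() prints an "NA's" row only under columns that have missing values. The last two lines answer the question programmatically instead of by eye.

```r
summary(airquality)

# Count NAs per column and pick the largest
na_counts <- colSums(is.na(airquality))
names(which.max(na_counts))
```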
Exercise 1.4: First 10 rows
Difficulty: Beginner. Inspect the first 10 rows of diamonds.
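A short sketch, assuming ggplot2 is installed because it supplies the diamonds tibble.

```r
library(ggplot2)  # provides the diamonds tibble
head(diamonds, 10)
# Tibbles truncate when printed; n = 10 makes the row count explicit
print(diamonds, n = 10)
```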
Exercise 1.5: Class of each column
Difficulty: Intermediate. Get the class of each column of iris as a named vector.
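One base-R approach. sapply() simplifies to a named character vector here because every iris column has exactly one class.

```r
col_classes <- sapply(iris, class)
col_classes
# With an ordered factor (two classes) sapply() would return a list;
# vapply(df, function(x) class(x)[1], character(1)) avoids that
```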
Exercise 1.6: Number of distinct values per column
Difficulty: Intermediate. For diamonds, count distinct values per column.
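A sketch assuming ggplot2 (for diamonds) and dplyr (for n_distinct()) are installed. Length(unique(x)) gives the same count in base R.

```r
library(ggplot2)  # diamonds
library(dplyr)    # n_distinct()
n_unique <- sapply(diamonds, n_distinct)
sort(n_unique)
```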
Exercise 1.7: Range of each numeric column
Difficulty: Intermediate. Get min and max for each numeric column of iris.
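A base-R sketch. range() returns c(min, max), so sapply() binds one pair per numeric column into a 2 x k matrix.

```r
num_cols <- iris[vapply(iris, is.numeric, logical(1))]  # drop Species
ranges <- sapply(num_cols, range)
rownames(ranges) <- c("min", "max")
ranges
```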
Exercise 1.8: Build a one-shot data profile
Difficulty: Advanced. For airquality, return a tibble with one row per column giving the column name, class, NA count, and number of distinct values.
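A sketch assuming tibble and dplyr are installed. profile_df() is a hypothetical helper name used for illustration. n_distinct() counts NA as a value by default.

```r
library(tibble)
library(dplyr)  # n_distinct()

# Hypothetical helper: one row per column of df
profile_df <- function(df) {
  tibble(
    column     = names(df),
    class      = vapply(df, function(x) class(x)[1], character(1)),
    n_na       = vapply(df, function(x) sum(is.na(x)), integer(1)),
    n_distinct = vapply(df, n_distinct, integer(1))  # NA counted as a value
  )
}
profile_df(airquality)
```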
Section 2. Distributions (10 problems)
Exercise 2.1: Histogram of mpg
Difficulty: Beginner. Histogram of mtcars$mpg with 15 bins.
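A ggplot2 sketch, assuming ggplot2 is installed. The bins argument fixes the number of bins directly. The axis labels are illustrative.

```r
library(ggplot2)
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 15) +
  labs(x = "Miles per gallon", y = "Count")
p
```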
Exercise 2.2: Density curve
Difficulty: Beginner. Density curve of diamonds$price.
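A ggplot2 sketch using the default kernel density estimate. The axis labels are illustrative.

```r
library(ggplot2)
p <- ggplot(diamonds, aes(x = price)) +
  geom_density() +
  labs(x = "Price (USD)", y = "Density")
p
```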
Exercise 2.3: Boxplot per group
Difficulty: Intermediate. Boxplot of iris Sepal.Length by Species.
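A ggplot2 sketch. Mapping the grouping factor to x draws one box per species.

```r
library(ggplot2)
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
p
```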
Exercise 2.4: Overlapping densities
Difficulty: Intermediate. Overlapping density plots of iris Sepal.Length, colored by Species and made semi-transparent with alpha.
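A ggplot2 sketch. The alpha value of 0.4 is an arbitrary choice that keeps overlapping fills visible.

```r
library(ggplot2)
p <- ggplot(iris, aes(x = Sepal.Length, fill = Species, colour = Species)) +
  geom_density(alpha = 0.4)  # alpha < 1 lets overlapping areas show through
p
```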
Exercise 2.5: Quintiles
Difficulty: Intermediate. Compute the 20th, 40th, 60th, 80th percentiles of mtcars$mpg.
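A base-R sketch using quantile() with explicit probabilities.

```r
q <- quantile(mtcars$mpg, probs = c(0.2, 0.4, 0.6, 0.8))
q
```

quantile() defaults to type = 7. The other eight types can give slightly different cut points on a sample this small.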
Exercise 2.6: Skewness and kurtosis
Difficulty: Advanced. Compute skewness and kurtosis of diamonds$price.
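A base-R sketch using moment-based formulas: skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2, with population moments. The helpers skewness() and kurtosis() are hand-written here, not taken from a package. Packages such as moments or e1071 provide similar functions, but their default estimators may differ slightly.

```r
library(ggplot2)  # diamonds

# Hand-written helpers (moment-based, population denominators)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
kurtosis <- function(x) {  # Pearson kurtosis: equals 3 for a normal distribution
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

c(skewness        = skewness(diamonds$price),
  kurtosis        = kurtosis(diamonds$price),
  excess_kurtosis = kurtosis(diamonds$price) - 3)
```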
Exercise 2.7: Log-transform a skewed variable
Difficulty: Intermediate. Plot a histogram of log(price) for diamonds and compare its shape with the histogram of raw price.
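A ggplot2 sketch that draws both histograms so their shapes can be compared. The choice of 50 bins is arbitrary.

```r
library(ggplot2)
p_raw <- ggplot(diamonds, aes(x = price)) + geom_histogram(bins = 50)
p_log <- ggplot(diamonds, aes(x = log(price))) + geom_histogram(bins = 50)
p_raw  # raw dollar scale
p_log  # natural-log scale
```

Adding scale_x_log10() to p_raw is an alternative that keeps the axis labels in dollars.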
Exercise 2.8: Histograms by facet
Difficulty: Intermediate. Histogram of diamonds price faceted by cut.
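A ggplot2 sketch. Free y scales are an assumption chosen because the cut levels have very different counts. Drop scales = "free_y" to compare absolute counts instead.

```r
library(ggplot2)
p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 50) +
  facet_wrap(~ cut, scales = "free_y")  # one panel per cut level
p
```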
Exercise 2.9: Empirical CDF
Difficulty: Advanced. Plot the empirical CDF of mtcars$mpg.
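Two sketches are shown. stat_ecdf() plots the step function in ggplot2. Base ecdf() returns a function that can be evaluated at any value and also plotted.

```r
library(ggplot2)
p <- ggplot(mtcars, aes(x = mpg)) +
  stat_ecdf(geom = "step") +
  labs(y = "F(mpg)")
p

# Base R: ecdf() returns a step function
Fn <- ecdf(mtcars$mpg)
Fn(20)  # share of cars with mpg <= 20
plot(Fn)
```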
Exercise 2.10: Compare distribution shape across groups
Difficulty: Advanced. Use a ridgeline plot (ggridges) to compare diamonds price across cut levels.
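A sketch assuming the ggridges package is installed. The scale and alpha values are arbitrary choices that control ridge overlap and transparency.

```r
library(ggplot2)
library(ggridges)  # install.packages("ggridges") if needed
p <- ggplot(diamonds, aes(x = price, y = cut)) +
  geom_density_ridges(scale = 1.5, alpha = 0.7)  # one ridge per cut level
p
```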
Section 3. Missing data (6 problems)
Exercise 3.1: Count NAs
Difficulty: Beginner. Total NA count in airquality.
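A one-line base-R solution. is.na() on a data frame returns a logical matrix, so sum() counts the TRUE cells.

```r
sum(is.na(airquality))
```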
Exercise 3.2: NA per column
Difficulty: Intermediate. Count NAs per column of airquality, sorted in descending order.
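A base-R sketch. colSums() over the logical NA matrix gives per-column counts, which sort() then orders.

```r
na_per_col <- sort(colSums(is.na(airquality)), decreasing = TRUE)
na_per_col
```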
Exercise 3.3: NA per row
Difficulty: Intermediate. Add an n_na column to airquality that counts the NAs in each row.
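A base-R sketch that works on a copy (aq) so the built-in dataset stays unchanged. The counts come from the original columns, so n_na never counts itself.

```r
aq <- airquality
# rowSums over the logical NA matrix = number of missing cells per row
aq$n_na <- rowSums(is.na(airquality))
head(aq)
```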
Exercise 3.4: Drop incomplete rows
Difficulty: Beginner. Remove rows of airquality that contain any NA.
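A base-R sketch. complete.cases() returns TRUE for rows with no missing values.

```r
complete_aq <- airquality[complete.cases(airquality), ]
nrow(complete_aq)
# Equivalents: na.omit(airquality) or tidyr::drop_na(airquality)
```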
Exercise 3.5: Visualize NA pattern
Difficulty: Advanced. Use naniar::vis_miss to visualize the missingness pattern.
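A sketch assuming the naniar package is installed. vis_miss() returns a ggplot object, so it can be stored and customised like any other plot.

```r
library(naniar)  # install.packages("naniar") if needed
p <- vis_miss(airquality)  # cells shaded by missing / present
p
```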
Exercise 3.6: Mean impute and document
Difficulty: Intermediate. Impute Ozone NAs in airquality with the column mean and add a flag column marking the imputed rows.
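A base-R sketch on a copy of the data. The flag column name Ozone_imputed is an assumption. The flag is created before the values are overwritten, because the NAs are what identify the imputed rows.

```r
aq <- airquality
aq$Ozone_imputed <- is.na(aq$Ozone)  # flag first, while the NAs still exist
aq$Ozone[aq$Ozone_imputed] <- mean(aq$Ozone, na.rm = TRUE)
```

Mean imputation leaves the column mean unchanged but shrinks its variance and can weaken correlations. The flag lets later analyses exclude or model the imputed rows.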
Section 4. Outliers (6 problems)
Exercise 4.1: Tukey IQR rule
Difficulty: Intermediate. Flag mtcars$mpg values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
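A base-R sketch of Tukey's fences using quantile() with its default type 7.

```r
mpg <- mtcars$mpg
q <- quantile(mpg, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
is_outlier <- mpg < lower | mpg > upper
rownames(mtcars)[is_outlier]
```

With these quartiles (15.425 and 22.8) the upper fence is 33.8625, so the 33.9 mpg Toyota Corolla is flagged. boxplot() computes its whiskers with boxplot.stats(), which uses Tukey's hinges from fivenum() rather than quantile(), so the points a boxplot marks can differ from this flag.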
Exercise 4.2: Z-score rule
Difficulty: Intermediate. Flag rows where |z| > 3 for mpg.
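A base-R sketch. scale() centres and divides by the standard deviation, and as.vector() drops the matrix attributes it adds.

```r
z <- as.vector(scale(mtcars$mpg))  # (x - mean) / sd
flag <- abs(z) > 3
mtcars[flag, ]
```

For mpg the largest |z| is about 2.3, so no rows are flagged. With only 32 observations, a |z| > 3 cutoff rarely triggers.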
Exercise 4.3: Per-group outliers
Difficulty: Advanced. Flag mpg outliers within each cyl group.
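A dplyr sketch that applies the IQR rule inside each cyl group, assuming dplyr is installed. Car names are copied into a column because dplyr drops row names.

```r
library(dplyr)
flagged <- mtcars %>%
  mutate(car = rownames(mtcars)) %>%
  group_by(cyl) %>%
  mutate(
    q1 = quantile(mpg, 0.25),  # quartiles computed per cyl group
    q3 = quantile(mpg, 0.75),
    outlier = mpg < q1 - 1.5 * (q3 - q1) | mpg > q3 + 1.5 * (q3 - q1)
  ) %>%
  ungroup()

flagged %>% filter(outlier) %>% select(car, cyl, mpg)
```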
Exercise 4.4: Visualize outliers in a boxplot
Difficulty: Beginner. Boxplot of diamonds$price.
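A ggplot2 sketch. Points beyond the whiskers are drawn individually, and the long right tail of price should produce many of them above the box.

```r
library(ggplot2)
p <- ggplot(diamonds, aes(y = price)) +
  geom_boxplot()
p
```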
Exercise 4.5: Winsorize
Difficulty: Intermediate. Cap mpg at the 5th and 95th percentiles.
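A base-R sketch. pmax() raises values below the 5th percentile, and pmin() lowers values above the 95th.

```r
mpg <- mtcars$mpg
caps <- quantile(mpg, c(0.05, 0.95))
mpg_w <- pmin(pmax(mpg, caps[[1]]), caps[[2]])  # clamp into [P5, P95]
summary(mpg_w)
```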
Exercise 4.6: Robust scale alternative
Difficulty: Advanced. Standardize mpg using the median and MAD instead of the mean and sd.
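A base-R sketch. robust_z() is a hypothetical helper name. mad() multiplies by 1.4826 by default, which makes it comparable to sd() for normally distributed data.

```r
# Hypothetical helper: robust z-score
robust_z <- function(x) (x - median(x)) / mad(x)

rz <- robust_z(mtcars$mpg)
z  <- as.vector(scale(mtcars$mpg))  # classic z for comparison
round(head(cbind(z, rz)), 2)
```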
Section 5. Relationships (10 problems)
Exercise 5.1: Pearson correlation
Difficulty: Beginner. Correlation between wt and mpg.
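A base-R sketch. cor() uses Pearson by default, and cor.test() adds a confidence interval and p-value.

```r
r <- cor(mtcars$wt, mtcars$mpg)
r
cor.test(mtcars$wt, mtcars$mpg)
```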
Exercise 5.2: Correlation matrix
Difficulty: Intermediate. Correlation matrix of mtcars (numeric).
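A base-R sketch. Every mtcars column is numeric, so cor() can take the whole data frame.

```r
cm <- cor(mtcars)
round(cm, 2)
# For data with non-numeric columns or NAs, select numeric columns first, e.g.
# cor(df[vapply(df, is.numeric, logical(1))], use = "pairwise.complete.obs")
```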
Exercise 5.3: Visualize correlation matrix
Difficulty: Intermediate. Heatmap of the correlation matrix.
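A ggplot2 sketch. The matrix is reshaped to long format with as.table() and as.data.frame(), which produce the columns Var1, Var2, and Freq. The colour palette is an arbitrary choice, and the limits pin white to r = 0.

```r
library(ggplot2)
cm <- cor(mtcars)
cm_long <- as.data.frame(as.table(cm))  # columns: Var1, Var2, Freq

p <- ggplot(cm_long, aes(Var1, Var2, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = round(Freq, 2)), size = 2.5) +
  scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick",
                       limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "r") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```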
Exercise 5.4: Spearman vs Pearson
Difficulty: Intermediate. Compare Pearson and Spearman correlation between disp and mpg.
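A base-R sketch that computes both coefficients with cor().

```r
pearson  <- cor(mtcars$disp, mtcars$mpg, method = "pearson")
spearman <- cor(mtcars$disp, mtcars$mpg, method = "spearman")
c(pearson = pearson, spearman = spearman)
```

Spearman correlates ranks, so it measures monotone rather than linear association. A Spearman magnitude larger than Pearson's, as here, is consistent with a relationship that is monotone but curved.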
Exercise 5.5: Scatter with smoother
Difficulty: Intermediate. Scatter plot of wt vs mpg with a linear smoother.
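A ggplot2 sketch. method = "lm" requests a straight-line fit, and passing the formula explicitly silences the message ggplot2 otherwise prints.

```r
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)  # band = confidence interval
p
```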
Exercise 5.6: Pairs plot
Difficulty: Intermediate. Pairs plot of iris numeric columns colored by Species.
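A base-R sketch with pairs(). Species is mapped to palette colours 2 to 4, skipping black. GGally::ggpairs() is a ggplot2-based alternative if that package is installed.

```r
# Scatterplot matrix of the four numeric iris columns
species_col <- as.integer(iris$Species) + 1  # palette indices 2, 3, 4
pairs(iris[, 1:4], col = species_col, pch = 19, cex = 0.6)
```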
Exercise 5.7: Categorical-categorical
Difficulty: Intermediate. Cross-tabulation of cut and clarity in diamonds.
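A base-R sketch. table() gives the counts, and prop.table() with margin = 1 converts them to row proportions (the clarity mix within each cut).

```r
library(ggplot2)  # diamonds
tab <- table(diamonds$cut, diamonds$clarity)
tab
round(prop.table(tab, margin = 1), 3)
```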
Exercise 5.8: Categorical-numeric
Difficulty: Intermediate. Mean price per cut (categorical-numeric exploration).
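A dplyr sketch. The group size and median are extra context beyond what the exercise asks for.

```r
library(ggplot2)  # diamonds
library(dplyr)
res <- diamonds %>%
  group_by(cut) %>%
  summarise(n = n(), mean_price = mean(price), median_price = median(price))
res
```

In this dataset the group means do not rise steadily with cut quality. Carat also differs across cuts and is a likely confounder, so check it before reading anything into the ranking.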
Exercise 5.9: Conditional density
Difficulty: Advanced. Density of mpg conditional on factor(cyl).
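A ggplot2 sketch that draws one density of mpg per cyl level. The alpha value is arbitrary.

```r
library(ggplot2)
p <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_density(alpha = 0.4) +
  labs(fill = "cyl")
p
```

For a conditional density estimate in the stacked sense (the estimated share of each cyl group at every mpg value), ggplot2 also supports aes(y = after_stat(count)) combined with geom_density(position = "fill").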
Exercise 5.10: Mosaic plot
Difficulty: Advanced. Mosaic plot of cut-by-clarity proportions in diamonds.
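A base-R sketch with mosaicplot(). Tile areas are proportional to cell counts, and shade = TRUE colours the tiles by Pearson residuals from an independence model. Packages such as ggmosaic offer ggplot2 versions.

```r
library(ggplot2)  # diamonds
tab <- table(cut = diamonds$cut, clarity = diamonds$clarity)
mosaicplot(tab, shade = TRUE, las = 2, main = "cut x clarity")
```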
Section 6. End-to-end EDA (10 problems)
Exercise 6.1: Initial profile
Difficulty: Intermediate. Run a 3-step opening EDA on diamonds: dim, glimpse, summary.
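A sketch of the three opening calls, assuming ggplot2 (for diamonds) and dplyr (for glimpse()) are installed.

```r
library(ggplot2)  # diamonds
library(dplyr)    # glimpse()
dim(diamonds)      # size
glimpse(diamonds)  # types and sample values
summary(diamonds)  # per-column distributions and level counts
```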
Exercise 6.2: Find a categorical with imbalanced frequencies
Difficulty: Intermediate. Identify any column where the most frequent value is > 50% of rows.
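A base-R sketch. The exercise names no dataset, so mtcars is an assumption here. top_share() is a hypothetical helper that returns the share of rows taken by each column's most frequent value, with NA counted as a value.

```r
# Hypothetical helper: share of rows held by the most frequent value
top_share <- function(df) {
  vapply(df, function(x) max(table(x, useNA = "ifany")) / length(x), numeric(1))
}

shares <- top_share(mtcars)
names(shares)[shares > 0.5]
```

In mtcars the flagged columns, vs and am, are 0/1 indicators stored as numbers, which is a reminder that categorical data is not always stored as factors.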
Exercise 6.3: Detect a heavily-skewed numeric
Difficulty: Advanced. Find numeric columns with skewness > 1.
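A base-R sketch applied to diamonds, which is an assumption since the exercise names no dataset. skew() is a hand-written, moment-based helper, not a package function.

```r
library(ggplot2)  # diamonds

# Hand-written moment-based skewness, ignoring NAs
skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

num   <- diamonds[vapply(diamonds, is.numeric, logical(1))]
skews <- vapply(num, skew, numeric(1))
sort(skews[skews > 1], decreasing = TRUE)
```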
Exercise 6.4: Numeric summary by group
Difficulty: Intermediate. Per Species, give n, mean, sd, min, max of Sepal.Length.
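A dplyr sketch that computes all five statistics in a single summarise() call.

```r
library(dplyr)
res <- iris %>%
  group_by(Species) %>%
  summarise(n = n(),
            mean = mean(Sepal.Length), sd = sd(Sepal.Length),
            min = min(Sepal.Length), max = max(Sepal.Length))
res
```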
Exercise 6.5: Top correlations
Difficulty: Advanced. Find the 3 most strongly correlated variable pairs in mtcars, excluding self-pairs and counting each pair only once.
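A base-R sketch. Blanking the lower triangle and diagonal keeps each pair once and drops self-pairs. Ranking by absolute value is an assumption, so strong negative correlations count as strongly correlated.

```r
cm <- cor(mtcars)
cm[lower.tri(cm, diag = TRUE)] <- NA  # keep each pair once, drop self-pairs
pairs_long <- subset(as.data.frame(as.table(cm)), !is.na(Freq))
top3 <- head(pairs_long[order(-abs(pairs_long$Freq)), ], 3)
top3
```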
Exercise 6.6: Detect duplicates
Difficulty: Intermediate. Count fully duplicated rows in diamonds.
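A base-R sketch. duplicated() marks the second and later copies of a row. Combining it with fromLast = TRUE also catches the first copy, which is useful for inspection.

```r
library(ggplot2)  # diamonds
n_dup <- sum(duplicated(diamonds))
n_dup
# Every copy of each duplicated row, not just the repeats
dups <- diamonds[duplicated(diamonds) | duplicated(diamonds, fromLast = TRUE), ]
```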
Exercise 6.7: One-way summary
Difficulty: Intermediate. In mtcars, compute the mean and N per cyl group, sorted with arrange().
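A dplyr sketch. The exercise does not name the variable to average, so mpg is an assumption, and groups are sorted by descending mean.

```r
library(dplyr)
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg)) %>%  # mpg is an assumed target
  arrange(desc(mean_mpg))
res
```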
Exercise 6.8: Two-way summary
Difficulty: Intermediate. Mean price per (cut, color) in diamonds.
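A dplyr sketch, assuming tidyr is also installed. The final pivot_wider() call is optional and reshapes the long result into a cut-by-color grid that is easier to scan.

```r
library(ggplot2)  # diamonds
library(dplyr)
library(tidyr)    # pivot_wider()
res <- diamonds %>%
  group_by(cut, color) %>%
  summarise(mean_price = mean(price), .groups = "drop")
res %>% pivot_wider(names_from = color, values_from = mean_price)
```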
Exercise 6.9: Audit sparse columns
Difficulty: Advanced. List the columns of airquality where more than 25% of rows are NA.
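A base-R sketch. colMeans() over the logical NA matrix gives the share of missing rows per column.

```r
na_share <- colMeans(is.na(airquality))
round(na_share, 3)
names(na_share)[na_share > 0.25]
```

No column passes the 25% threshold. Ozone comes closest, with 37 of 153 rows (about 24%) missing, so the choice of cutoff matters here.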
Exercise 6.10: Decision-quality EDA report
Difficulty: Advanced. Build a one-page EDA report: a data profile plus 3 plots (a univariate histogram, a grouped boxplot, and a correlation heatmap).
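A sketch under several assumptions: the dataset is diamonds, the page is a single US-letter PDF written to tempdir(), and the profile is rendered as monospaced text with grid. Base grid viewports stack the four panels, so no layout package such as patchwork is needed.

```r
library(ggplot2)
library(dplyr)  # n_distinct()
library(grid)   # page layout

df <- diamonds

# 1. Profile: one row per column
profile <- data.frame(
  column     = names(df),
  class      = vapply(df, function(x) class(x)[1], character(1)),
  n_na       = vapply(df, function(x) sum(is.na(x)), integer(1)),
  n_distinct = vapply(df, n_distinct, integer(1)),
  row.names  = NULL
)

# 2. Plots: univariate histogram, grouped boxplot, correlation heatmap
cm <- cor(df[vapply(df, is.numeric, logical(1))])
plots <- list(
  ggplot(df, aes(price)) + geom_histogram(bins = 50),
  ggplot(df, aes(cut, price)) + geom_boxplot(),
  ggplot(as.data.frame(as.table(cm)), aes(Var1, Var2, fill = Freq)) +
    geom_tile() +
    scale_fill_gradient2(limits = c(-1, 1)) +
    labs(x = NULL, y = NULL, fill = "r")
)

# 3. One page: profile text on top, then the three plots (path is an assumption)
out <- file.path(tempdir(), "eda_report.pdf")
pdf(out, width = 8.5, height = 11)
grid.newpage()
pushViewport(viewport(layout = grid.layout(4, 1)))
profile_txt <- paste(capture.output(print(profile, row.names = FALSE)), collapse = "\n")
grid.text(profile_txt, gp = gpar(fontfamily = "mono", fontsize = 8),
          vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
for (i in seq_along(plots)) {
  print(plots[[i]], vp = viewport(layout.pos.row = i + 1, layout.pos.col = 1))
}
invisible(dev.off())
```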
What to do next
After these 50 problems, you should be able to open a new dataset and produce a profile within about five minutes. Natural follow-ups:
- Data-Wrangling-Exercises (shipped): the cleaning that EDA reveals.
- Linear-Regression-Exercises (shipped): the modeling that EDA precedes.
- Data-Visualization-Exercises (coming): visualization beyond the EDA basics.