R Cheat Sheet: 200 Functions Across dplyr, ggplot2, Stats, Printable
This R cheat sheet lists the 200 most-used functions across base R, dplyr, ggplot2, statistics, strings, and dates, each with a one-line description and a runnable example you can try right in your browser.
Every code block on this page runs live. Edit the values, hit Run, and watch the output update. No setup, no install: the tables are your index and the code is your playground.
Which base R functions should I know by heart?
You didn't come here to read, you came to look something up. So let's open with the one pattern that covers 80% of real R work: load a built-in dataset, pick a few rows, compute a summary. Every function used below appears in the tables further down, but seeing them together first builds the mental model that makes the rest of this page easier to scan.
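A minimal sketch of that load, filter, and summarise pattern, using only base R. The 25 mpg threshold comes from the discussion on this page; the variable name `efficient` and the choice of columns are assumptions made for illustration.

```r
# mtcars ships with R, so no loading step is needed.
# Keep rows where fuel economy exceeds 25 mpg, and only two columns by name.
efficient <- mtcars[mtcars$mpg > 25, c("mpg", "hp")]

# How many cars passed the filter?
nrow(efficient)

# Min, quartiles, median, mean, and max of their horsepower
summary(efficient$hp)

# Which car has the highest horsepower among them?
rownames(efficient)[which.max(efficient$hp)]
```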
Six cars in mtcars clear 25 mpg, and their horsepower runs from 52 to 113 with a median of 66; the car at the top of that range, at 113 hp, is the Lotus Europa. That tiny snippet used five base R functions and operators: subsetting with [ ], comparison with >, column selection by name, nrow(), and summary(). Those five are load-bearing in nearly every R session you'll write.
Vectors and sequences
| Function | Description | Example |
|---|---|---|
| c() | Combine values into a vector | c(1, 2, 3) |
| seq() | Generate a sequence | seq(1, 10, by = 2) |
| seq_len() | Sequence from 1 to n | seq_len(5) |
| seq_along() | Sequence along an object | seq_along(letters) |
| rep() | Repeat elements | rep(1:3, times = 2) |
| length() | Number of elements | length(1:10) |
| rev() | Reverse a vector | rev(1:5) |
| sort() | Sort ascending/descending | sort(c(3,1,2)) |
| order() | Sort indices | order(c(3,1,2)) |
| rank() | Rank elements | rank(c(3,1,2)) |
| unique() | Remove duplicates | unique(c(1,1,2,3)) |
| duplicated() | Duplicate positions | duplicated(c(1,1,2)) |
| table() | Frequency table | table(c("a","b","a")) |
| which() | Indices where TRUE | which(c(T,F,T)) |
| any() / all() | Logical folds | any(c(T,F,F)) |
| %in% | Membership test | 3 %in% 1:5 |
| head() / tail() | First/last n | head(1:100, 5) |
| append() | Insert into vector | append(1:3, 99, after = 2) |
Math and logic
| Function | Description | Example |
|---|---|---|
| sum(), prod() | Sum / product | sum(1:10) |
| cumsum(), cumprod() | Cumulative sum/product | cumsum(1:5) |
| mean(), median() | Average, middle value | mean(1:10) |
| min(), max(), range() | Extremes | range(1:10) |
| abs(), sign() | Magnitude, sign | abs(-5) |
| sqrt(), exp(), log() | Roots, exponentials, logs | log(exp(1)) |
| round(), ceiling(), floor(), trunc() | Rounding | round(3.14, 1) |
| factorial(), choose() | Combinatorics | choose(5, 2) |
| ifelse() | Vectorized if-else | ifelse(1:5 > 3, "hi", "lo") |
| &, \|, !, xor() | Logical operators | TRUE & FALSE |
Type checking and conversion
| Function | Description | Example |
|---|---|---|
| class() | Object class | class(1:5) |
| typeof() | Internal storage type | typeof(1L) |
| is.numeric() / is.character() / is.logical() | Type tests | is.numeric(3.14) |
| is.factor() / is.list() / is.data.frame() | Structure tests | is.data.frame(mtcars) |
| is.null() / is.na() | Missing/empty tests | is.na(c(1, NA)) |
| as.numeric() / as.character() / as.logical() | Convert type | as.numeric("42") |
| as.factor() / as.integer() / as.list() | Convert structure | as.integer(3.7) |
| unlist() | Flatten list to vector | unlist(list(1:2, 3:4)) |
| identical() | Exact equality | identical(1L, 1L) |
| all.equal() | Near-equality (floats) | all.equal(0.1+0.2, 0.3) |
When you write x + 1, R doesn't loop in R code; it calls a compiled routine that processes all elements at once, often 10-100× faster than an explicit for loop doing the same thing.
Try it: Use seq_len() to build a vector of length 7, then reverse it with rev(). Save the reversed vector as ex_countdown.
Solution: seq_len(7) builds 1:7; rev() returns a reversed copy of it, no loop required.
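One way to write that solution, as a short sketch using only the two functions the exercise names:

```r
# seq_len(7) gives the integers 1 through 7; rev() returns them in reverse order
ex_countdown <- rev(seq_len(7))
ex_countdown
```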
How do I manipulate data frames with dplyr?
dplyr gives you a small grammar, about a dozen verbs, that composes into almost any wrangling task you can describe. The verbs chain together with the pipe |>, so the code reads left-to-right the way you'd explain it out loud: "take mtcars, filter rows where mpg > 20, group by cylinder count, summarise mean horsepower."
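The quoted sentence translates almost word for word into a pipeline. This is a sketch under assumptions: the result name `by_cyl` and the extra `n` column are illustrative choices, and dplyr must be installed.

```r
library(dplyr)

by_cyl <- mtcars |>
  filter(mpg > 20) |>            # keep the fuel-efficient cars
  group_by(cyl) |>               # one group per cylinder count
  summarise(
    n = n(),                     # cars in each group
    mean_hp = mean(hp)           # average horsepower per group
  )

by_cyl
```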
Fourteen efficient cars made it through the filter. Four-cylinder cars average about 83 hp; the three six-cylinder survivors average 110 hp. Notice how the pipe routes the data frame through each verb in sequence: filter() keeps rows, group_by() tags them for aggregation, and summarise() collapses each group to one row.
Row operations
| Function | Description | Example |
|---|---|---|
| filter() | Keep rows matching condition | filter(mtcars, mpg > 20) |
| slice() | Keep rows by position | slice(mtcars, 1:5) |
| slice_head() / slice_tail() | First/last n rows | slice_head(mtcars, n = 3) |
| slice_sample() | Random rows | slice_sample(mtcars, n = 5) |
| slice_min() / slice_max() | Rows with min/max value | slice_max(mtcars, mpg, n = 3) |
| arrange() | Sort rows | arrange(mtcars, desc(mpg)) |
| distinct() | Unique rows | distinct(mtcars, cyl) |
Column operations
| Function | Description | Example |
|---|---|---|
| select() | Keep columns by name | select(mtcars, mpg, hp) |
| rename() | Rename columns | rename(mtcars, miles_per_gallon = mpg) |
| mutate() | Add/modify columns | mutate(mtcars, kpl = mpg * 0.425) |
| transmute() | Mutate + drop others | transmute(mtcars, kpl = mpg * 0.425) |
| relocate() | Reorder columns | relocate(mtcars, cyl, .before = mpg) |
| pull() | Extract single column as vector | pull(mtcars, mpg) |
| across() | Apply function to multiple columns | summarise(mtcars, across(mpg:hp, mean)) |
Grouping and aggregation
| Function | Description | Example |
|---|---|---|
| group_by() | Set grouping keys | group_by(mtcars, cyl) |
| ungroup() | Remove grouping | ungroup(df) |
| summarise() | Collapse to one row per group | summarise(df, mean(mpg)) |
| count() | Shortcut for group_by + n() | count(mtcars, cyl) |
| tally() | Fast row-count by group | tally(group_by(mtcars, cyl)) |
| n() | Size of current group | summarise(df, n = n()) |
| n_distinct() | Unique values | n_distinct(mtcars$cyl) |
| first() / last() / nth() | Positional pickers | first(mtcars$mpg) |
Joins
| Function | Description | Example |
|---|---|---|
| inner_join() | Keep only matching rows | inner_join(a, b, by = "id") |
| left_join() | Keep all left rows | left_join(a, b, by = "id") |
| right_join() | Keep all right rows | right_join(a, b, by = "id") |
| full_join() | Keep all rows from both | full_join(a, b, by = "id") |
| semi_join() | Left rows with a match | semi_join(a, b, by = "id") |
| anti_join() | Left rows without a match | anti_join(a, b, by = "id") |
| bind_rows() / bind_cols() | Stack/bind data frames | bind_rows(df1, df2) |
Reshaping (tidyr companion)
| Function | Description | Example |
|---|---|---|
| pivot_longer() | Wide → long | pivot_longer(df, cols = -id) |
| pivot_wider() | Long → wide | pivot_wider(df, names_from = key, values_from = value) |
| separate() | Split column by delimiter | separate(df, x, into = c("a","b")) |
| unite() | Join columns | unite(df, full, first, last) |
| drop_na() | Drop rows with NAs | drop_na(df) |
| fill() | Forward/back-fill missing | fill(df, x) |
Prefer the native pipe |> over magrittr's %>% in new code. The native pipe ships with base R (4.1+), has zero package overhead, and is faster. Keep %>% only when you need the . placeholder or the assignment pipe %<>%.
Try it: Use airquality to count the number of days where Temp > 80. Save the count as ex_hot_days. Hint: combine filter() with nrow() or count().
Solution: filter() keeps only the rows where temperature exceeds 80; nrow() counts what's left. Equivalently: sum(airquality$Temp > 80).
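A sketch of both routes to the count, assuming dplyr is installed. The name `ex_hot_days_base` is an illustrative addition for the base R equivalent.

```r
library(dplyr)

# dplyr route: keep the hot days, then count the remaining rows
ex_hot_days <- airquality |>
  filter(Temp > 80) |>
  nrow()

# Base R route: TRUE counts as 1, so summing the comparison counts the hot days
ex_hot_days_base <- sum(airquality$Temp > 80)

ex_hot_days
```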
How do I build plots with ggplot2?
ggplot2 treats a plot as layers you add together: a dataset, a mapping from variables to visual properties (aesthetics), one or more geometric shapes (geoms), and optional scales, facets, and themes. Once the grammar clicks, you can describe any chart as a short recipe.
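A sketch of such a recipe on mtcars, assuming ggplot2 is installed. The title text and the object name `p1` are illustrative assumptions.

```r
library(ggplot2)

p1 <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +  # data + aesthetics
  geom_point() +                                   # one dot per car
  geom_smooth(method = "lm") +                     # linear trend per color group
  labs(title = "Fuel economy vs. weight",          # hypothetical title
       color = "Cylinders") +                      # legend name
  theme_minimal()                                  # drop the grey background

p1
```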
Each + adds a layer. aes() maps weight to x, mpg to y, and cylinder count to color; geom_point() draws the dots; geom_smooth(method = "lm") fits a linear trend per color group; labs() sets the title and legend name; theme_minimal() strips the grey background. Change any argument and the whole plot updates; that's the payoff of the grammar.
Geoms (what you draw)
| Function | Description | Example |
|---|---|---|
| geom_point() | Scatter plot | geom_point(aes(x, y)) |
| geom_line() | Line chart | geom_line(aes(x, y)) |
| geom_bar() | Bar chart (counts) | geom_bar(aes(x)) |
| geom_col() | Bar chart (explicit y) | geom_col(aes(x, y)) |
| geom_histogram() | Histogram | geom_histogram(bins = 30) |
| geom_density() | Density curve | geom_density(aes(x)) |
| geom_boxplot() | Boxplot | geom_boxplot(aes(x, y)) |
| geom_violin() | Violin plot | geom_violin(aes(x, y)) |
| geom_jitter() | Jittered scatter | geom_jitter(width = 0.2) |
| geom_smooth() | Regression / LOESS | geom_smooth(method = "lm") |
| geom_area() / geom_ribbon() | Filled area | geom_area(aes(x, y)) |
| geom_tile() | Heatmap cells | geom_tile(aes(x, y, fill = z)) |
| geom_text() / geom_label() | Annotations | geom_text(aes(label = name)) |
| geom_hline() / geom_vline() | Reference lines | geom_hline(yintercept = 0) |
Scales, labels, and limits
| Function | Description | Example |
|---|---|---|
| labs() | Title, axis, legend text | labs(title = "My plot") |
| xlab() / ylab() | Axis titles | xlab("Weight") |
| xlim() / ylim() | Axis range | xlim(0, 10) |
| scale_x_continuous() | Numeric x-axis scale | scale_x_continuous(breaks = 1:10) |
| scale_x_log10() | Log x-axis | scale_x_log10() |
| scale_color_manual() | Custom discrete colors | scale_color_manual(values = c("red","blue")) |
| scale_fill_brewer() | ColorBrewer palette | scale_fill_brewer(palette = "Set1") |
| scale_y_date() | Date y-axis | scale_y_date() |
| coord_flip() | Flip x and y | coord_flip() |
| coord_cartesian() | Zoom without dropping data | coord_cartesian(ylim = c(0, 40)) |
Facets, themes, and position
| Function | Description | Example |
|---|---|---|
| facet_wrap() | Wrap small multiples | facet_wrap(~ cyl) |
| facet_grid() | 2D facet grid | facet_grid(gear ~ cyl) |
| theme_minimal() / theme_bw() / theme_classic() | Built-in themes | theme_minimal() |
| theme() | Customize any element | theme(legend.position = "top") |
| element_text() / element_blank() | Theme building blocks | theme(plot.title = element_text(face = "bold")) |
| position_dodge() | Side-by-side bars | position_dodge(width = 0.9) |
| position_jitter() | Jitter to reduce overplotting | position_jitter(width = 0.2) |
| ggsave() | Save plot to file | ggsave("out.png", p1) |
Load ggplot2 before calling geom_*() or aes(). If you see "could not find function ggplot", your library(ggplot2) call is missing or hasn't run yet. On this page, the library persists across all blocks below once you run block 3.
Try it: Modify the plot below so it uses theme_minimal() and has the title "MPG distribution".
Solution: Each new behaviour is another + layer: labs() for the title, theme_minimal() for the clean background.
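A sketch of the finished plot. The starting plot is an assumption: a histogram of mpg is used here because it matches the requested title, and the bin count and the name `ex_plot` are illustrative.

```r
library(ggplot2)

ex_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +          # assumed starting layer
  labs(title = "MPG distribution") +   # added: the title
  theme_minimal()                      # added: the clean theme

ex_plot
```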
Which statistics functions do I use most?
R was built for statistics. The functions below have been refined since 1993 and form the backbone of every statistical workflow, from a quick descriptive summary to a full linear model with diagnostics. Most statistical distributions follow a consistent r/d/p/q naming convention: rnorm() draws random values, dnorm() is the density, pnorm() is the CDF, qnorm() is the quantile function. Learn the pattern once, apply it to binom, pois, unif, exp, chisq, t, f, gamma, and beta.
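A short sketch that touches each idea in turn: the r/d/p/q pattern, a two-sample test, and a linear model. The object names `tt` and `fit`, the seed, and the new-data values passed to predict() are assumptions chosen for illustration.

```r
# r/d/p/q pattern on the normal distribution
set.seed(42)                 # arbitrary seed so the random draws repeat
draws <- rnorm(5)            # five random values from N(0, 1)
dnorm(0)                     # density at 0
pnorm(1.96)                  # P(X <= 1.96)
qnorm(0.975)                 # quantile with 97.5% of the mass below it

# Welch two-sample t-test of mpg by transmission
# (in mtcars, am = 0 is automatic and am = 1 is manual)
tt <- t.test(mpg ~ am, data = mtcars)
tt

# Linear model: mpg explained by weight (1000 lbs) and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)                    # point estimates
confint(fit)                 # 95% confidence intervals
predict(fit, newdata = data.frame(wt = 3, hp = 120))  # hypothetical car
```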
Manual cars average 7.2 mpg more than automatics, with a p-value of 0.0014; part of that gap reflects the heavier automatics in the 1974 mtcars sample. The linear model says every extra 1000 lbs of weight costs 3.88 mpg and every extra 1 hp costs 0.032 mpg, holding the other constant. coef(), summary(), confint(), and predict() all operate on the same fitted model object.
Descriptive statistics
| Function | Description | Example |
|---|---|---|
| mean() / median() | Central tendency | mean(mtcars$mpg) |
| sd() / var() | Spread | sd(mtcars$mpg) |
| min() / max() / range() | Extremes | range(mtcars$mpg) |
| quantile() | Arbitrary percentiles | quantile(mtcars$mpg, 0.9) |
| IQR() | Interquartile range | IQR(mtcars$mpg) |
| summary() | 5-number + mean summary | summary(mtcars) |
| cor() / cov() | Correlation / covariance | cor(mtcars$mpg, mtcars$wt) |
| scale() | Z-score standardize | scale(mtcars$mpg) |
Distributions (r/d/p/q pattern)
| Function(s) | Example / family | Meaning |
|---|---|---|
| rnorm(n, mean, sd) | rnorm(100, 0, 1) | Random draws |
| dnorm(x, mean, sd) | dnorm(0) | Density at x |
| pnorm(q, mean, sd) | pnorm(1.96) | CDF (≤ q) |
| qnorm(p, mean, sd) | qnorm(0.975) | Quantile for p |
| rbinom(), dbinom(), pbinom(), qbinom() | Binomial | Coin flips, successes |
| rpois(), dpois(), ppois(), qpois() | Poisson | Count events |
| runif(), dunif(), punif(), qunif() | Uniform | Flat distribution |
| rexp(), dexp() | Exponential | Waiting times |
| rt(), rchisq(), rf() | t, chi-square, F | Test statistic dists |
Hypothesis tests
| Function | Description | Example |
|---|---|---|
| t.test() | One/two-sample t-test | t.test(x, y) |
| wilcox.test() | Rank-sum / signed-rank | wilcox.test(x, y) |
| chisq.test() | Chi-square test | chisq.test(table(x, y)) |
| fisher.test() | Fisher's exact test | fisher.test(matrix(c(1,2,3,4), 2)) |
| cor.test() | Correlation test | cor.test(x, y) |
| shapiro.test() | Normality test | shapiro.test(rnorm(50)) |
| ks.test() | Kolmogorov-Smirnov | ks.test(x, "pnorm") |
| prop.test() | Proportion test | prop.test(45, 100) |
| binom.test() | Exact binomial | binom.test(45, 100) |
Modeling
| Function | Description | Example |
|---|---|---|
| lm() | Linear regression | lm(mpg ~ wt + hp, mtcars) |
| glm() | Generalized linear model | glm(am ~ mpg, mtcars, family = binomial) |
| aov() / anova() | ANOVA | aov(mpg ~ factor(cyl), mtcars) |
| predict() | Model predictions | predict(fit, newdata) |
| residuals() | Model residuals | residuals(fit) |
| coef() | Coefficients | coef(fit) |
| confint() | Confidence intervals | confint(fit) |
| summary(fit) | Full model summary | summary(fit) |
| AIC() / BIC() | Model selection criteria | AIC(fit) |
| step() | Stepwise selection | step(fit) |
Once you know rnorm/dnorm/pnorm/qnorm, you know rbinom/dbinom/pbinom/qbinom and every other family. This is no accident; it's a deliberate API design choice that rewards fluency.
Try it: Fit a linear model of hp explained by mpg on mtcars, save it as ex_fit, and print only the coefficients.
Solution: Every 1 mpg increase predicts an 8.83 hp decrease: the fuel-economy / horsepower tradeoff in one coefficient.
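One way to write that solution, as a sketch in base R:

```r
# Response on the left of ~, predictor on the right
ex_fit <- lm(hp ~ mpg, data = mtcars)

# coef() returns just the intercept and slope, without the full summary
coef(ex_fit)
```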
How do I handle strings and dates in R?
Text and time are the two fiddliest parts of R for beginners: every locale, format, and edge case can bite. Base R covers the essentials; stringr and lubridate cover the rest with friendlier argument orders. The payoff block below parses three character strings into real Date objects, extracts the year, and computes the day of the week.
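A minimal sketch of those three steps in base R. The three date strings and the names `raw_dates` and `dates` are hypothetical, and the day names returned by weekdays() depend on the session's locale.

```r
# Three hypothetical ISO-formatted strings
raw_dates <- c("2026-01-15", "2026-03-29", "2026-07-04")

# Step 1: parse character strings into Date objects
dates <- as.Date(raw_dates)
class(dates)

# Step 2: pull out the year with a strftime pattern
format(dates, "%Y")

# Step 3: day of the week (localized names)
weekdays(dates)
```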
as.Date() parses ISO strings by default; format() with a strftime pattern pulls out any component; weekdays() returns the localized day name. The same three steps work for any number of dates, vectorized, no loop.
Strings, base R
| Function | Description | Example |
|---|---|---|
| paste() / paste0() | Concatenate with/without separator | paste("a","b", sep="-") |
| sprintf() | C-style formatting | sprintf("%.2f", 3.14159) |
| nchar() | Number of characters | nchar("hello") |
| substr() / substring() | Extract substring | substr("hello", 1, 3) |
| toupper() / tolower() | Change case | toupper("hi") |
| trimws() | Strip whitespace | trimws(" hi ") |
| grep() / grepl() | Find pattern (indices / logical) | grepl("a", c("cat","dog")) |
| sub() / gsub() | Replace first / all matches | gsub("o", "0", "foo") |
| regmatches() | Extract matched text | regmatches("a1b2", regexpr("[0-9]+", "a1b2")) |
| strsplit() | Split by delimiter | strsplit("a,b,c", ",") |
| startsWith() / endsWith() | Prefix / suffix test | startsWith("hello", "he") |
| format() / formatC() | Format numbers as strings | format(3.14, nsmall = 4) |
Strings, stringr
| Function | Description | Example |
|---|---|---|
| str_detect() | Contains pattern? | str_detect(x, "a") |
| str_replace() / str_replace_all() | Replace pattern | str_replace_all(x, "a", "A") |
| str_extract() | Extract first match | str_extract(x, "[0-9]+") |
| str_match() | Extract capture groups | str_match(x, "(\\d+)") |
| str_split() | Split string | str_split(x, ",") |
| str_length() | Character count | str_length(x) |
| str_sub() | Substring | str_sub(x, 1, 3) |
| str_trim() / str_squish() | Strip whitespace | str_squish(" hi you ") |
| str_to_lower() / str_to_upper() / str_to_title() | Change case | str_to_title("hello world") |
| str_pad() | Pad to width | str_pad("42", 5, pad = "0") |
Dates, base R
| Function | Description | Example |
|---|---|---|
| Sys.Date() / Sys.time() | Today / now | Sys.Date() |
| as.Date() | Parse to Date | as.Date("2026-03-29") |
| as.POSIXct() / as.POSIXlt() | Parse to datetime | as.POSIXct("2026-03-29 10:30") |
| format() | Format date to string | format(Sys.Date(), "%Y") |
| strptime() | Parse with format | strptime("29/03/26","%d/%m/%y") |
| difftime() | Time difference | difftime(Sys.Date(), as.Date("2026-01-01")) |
| seq.Date() | Sequence of dates | seq.Date(as.Date("2026-01-01"), by="month", length.out=6) |
| weekdays() / months() / quarters() | Components | weekdays(Sys.Date()) |
Dates, lubridate
| Function | Description | Example |
|---|---|---|
| ymd() / mdy() / dmy() | Flexible parsers | ymd("2026-03-29") |
| ymd_hms() | Datetime parser | ymd_hms("2026-03-29 10:30:00") |
| year() / month() / day() | Extract components | year(Sys.Date()) |
| wday() / yday() | Weekday / year-day | wday(Sys.Date()) |
| hour() / minute() / second() | Time parts | hour(Sys.time()) |
| today() / now() | Current date/time | today() |
| days() / weeks() / months() | Time periods | today() + days(7) |
| interval() | Time interval | interval(start, end) |
| floor_date() / ceiling_date() | Round to unit | floor_date(Sys.time(), "hour") |
A datetime string such as "2026-03-29 02:30:00" parsed on a machine in London may convert to a different instant than the same string parsed in New York. For reproducible work, pass tz = "UTC" explicitly to as.POSIXct() or use lubridate::with_tz().
Try it: Given the vector below, extract just the 4-digit year from each string and save as ex_years. The result should be a character vector.
Solution: Since the format is fixed-width, substr() is the simplest answer. For variable formats, parse first: format(as.Date(raw), "%Y").
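A sketch of both approaches. The input vector `raw` is hypothetical, standing in for the exercise's data, and `ex_years_parsed` is an illustrative name for the parse-first variant.

```r
# Hypothetical input: ISO-formatted date strings
raw <- c("2021-06-15", "2019-12-31", "2024-02-29")

# Fixed-width route: characters 1 through 4 are the year
ex_years <- substr(raw, 1, 4)

# Parse-first route: works whenever as.Date() can read the strings
ex_years_parsed <- format(as.Date(raw), "%Y")

ex_years
```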
What about I/O, control flow, and functional programming?
The last category covers the glue that holds scripts together: reading and writing files, branching and looping, applying functions over collections, and catching errors. R has two layers here, base R's apply() family and the more consistent purrr::map() family from the tidyverse, and you'll see both in the wild.
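A small sketch of two of those glue pieces in base R: applying a function over columns, and falling back to a default when a call signals a warning. The helper name `safe_log` and the variable `col_means` are assumptions made for illustration.

```r
# Apply mean() to each of the first four columns of mtcars;
# sapply() simplifies the result to a named numeric vector
col_means <- sapply(mtcars[, 1:4], mean)
col_means

# log(-1) normally warns and returns NaN; the warning handler
# replaces that outcome with NA
safe_log <- function(x) {
  tryCatch(log(x), warning = function(w) NA_real_)
}

safe_log(10)   # ordinary input: the handler is never triggered
safe_log(-1)   # warning caught, NA returned
```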
sapply() walks the first four columns of mtcars, applies mean() to each, and returns a named numeric vector. tryCatch() wraps log() so that log(-1), which normally warns and returns NaN, becomes a clean NA instead. The same two-line recipe handles any "try this, default on failure" pattern in R.
Input / Output
| Function | Description | Example |
|---|---|---|
| read.csv() | Read CSV (base) | read.csv("data.csv") |
| read.table() | Read delimited file | read.table("x.txt", header = TRUE) |
| readr::read_csv() | Fast CSV (tidyverse) | read_csv("data.csv") |
| readLines() | Read lines of text | readLines("file.txt") |
| readRDS() / saveRDS() | Read/write single R object | saveRDS(model, "m.rds") |
| load() / save() | Multi-object .RData | save(x, y, file="d.RData") |
| write.csv() / write.table() | Write delimited files | write.csv(df, "out.csv") |
| writeLines() | Write character vector | writeLines(c("a","b"), "out.txt") |
| cat() / print() / message() | Console output | cat("Hello\n") |
| file.exists() / dir() / getwd() / setwd() | Filesystem basics | file.exists("data.csv") |
| download.file() | Download from URL | download.file(url, "x.csv") |
Control flow
| Construct | Description | Example |
|---|---|---|
| if (...) {} else {} | Branching | if (x > 0) "pos" else "neg" |
| ifelse(cond, yes, no) | Vectorized if-else | ifelse(1:5 > 3, "hi", "lo") |
| for (i in x) {} | For loop | for (i in 1:3) print(i) |
| while (cond) {} | While loop | while (x < 10) x <- x + 1 |
| repeat { break } | Loop until break | repeat { if (done) break } |
| next / break | Skip / exit iteration | for (i in 1:5) if (i==3) next else print(i) |
| switch() | Multi-branch dispatch | switch("a", a=1, b=2) |
| stopifnot() | Assert conditions | stopifnot(x > 0) |
| stop() / warning() / message() | Signal conditions | stop("Invalid!") |
Apply family and purrr
| Function | Description | Example |
|---|---|---|
| apply() | Over matrix rows/cols | apply(m, 2, sum) |
| lapply() | Over list, return list | lapply(1:3, sqrt) |
| sapply() | Over list, simplify | sapply(1:3, sqrt) |
| vapply() | Type-safe apply | vapply(1:3, sqrt, numeric(1)) |
| mapply() | Multi-arg apply | mapply("+", 1:3, 4:6) |
| tapply() | Apply by group | tapply(mtcars$mpg, mtcars$cyl, mean) |
| Map() / Reduce() / Filter() / Find() / Position() | Functional helpers | Reduce("+", 1:5) |
| purrr::map() | List-in, list-out | map(1:3, sqrt) |
| map_dbl() / map_chr() / map_int() / map_lgl() | Typed map | map_dbl(1:3, sqrt) |
| map_df() | Row-bind results | map_df(1:3, ~ data.frame(x = .x)) |
| walk() | Side-effect only | walk(files, print) |
| keep() / discard() | Filter list | keep(1:10, ~ .x > 5) |
| reduce() | Accumulate from left | reduce(1:5, "+") |
Error handling
| Function | Description | Example |
|---|---|---|
| tryCatch() | Typed condition handling | tryCatch(log(-1), warning = \(w) NA) |
| try() | Swallow errors | try(log("a"), silent = TRUE) |
| withCallingHandlers() | Run handler, resume | withCallingHandlers(f(), warning = log_it) |
| conditionMessage() | Extract message | conditionMessage(e) |
| simpleError() / simpleWarning() | Construct conditions | simpleError("oops") |
Prefer the purrr::map_*() family over sapply() when you care about the return type. map_dbl() errors loudly if any result isn't a single number; sapply() silently returns a list instead. That type-strictness catches bugs the moment they happen instead of three functions later.
Try it: Use sapply() on the iris data frame to return the class of each column. Save the result as ex_classes.
Solution: sapply() applies class() to each column of iris and simplifies the result to a named character vector.
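One way to write that solution, as a sketch in base R:

```r
# A data frame is a list of columns, so sapply() visits each column in turn
ex_classes <- sapply(iris, class)
ex_classes
```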
Practice Exercises
Two capstone exercises combining multiple sections of the cheat sheet. Work them yourself first, then open the solution to check your approach.
Exercise 1: dplyr pipeline, filter, group, summarise, arrange
Using mtcars, write one pipeline that: keeps cars with mpg > 20, groups by cyl, computes the mean hp per group as mean_hp, and arranges the result from highest mean_hp to lowest. Save the result as my_result.
Solution: Four dplyr verbs, one pipeline. arrange(desc(...)) sorts descending; the .groups = "drop" argument silences the grouping message after summarise().
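A sketch of one pipeline that meets the exercise's requirements, assuming dplyr is installed:

```r
library(dplyr)

my_result <- mtcars |>
  filter(mpg > 20) |>                                  # keep efficient cars
  group_by(cyl) |>                                     # group by cylinder count
  summarise(mean_hp = mean(hp), .groups = "drop") |>   # one row per group
  arrange(desc(mean_hp))                               # highest mean_hp first

my_result
```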
Exercise 2: ggplot2, scatter + smooth + theme in one plot
Build a plot from mtcars showing mpg versus wt, colored by factor(cyl), with a linear trend line per cylinder group (no confidence ribbon), a title "MPG by weight and cylinders", and a minimal theme. Save the plot object as my_plot.
Solution: Five layers, five concepts: aesthetic mapping, points, linear smoother per color group, labels, and theme. Swap method = "lm" for "loess" to see a curved local fit instead.
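A sketch of a plot that meets the exercise's requirements, assuming ggplot2 is installed. The legend label "Cylinders" is an illustrative choice the exercise does not specify.

```r
library(ggplot2)

my_plot <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +     # se = FALSE removes the ribbon
  labs(title = "MPG by weight and cylinders",
       color = "Cylinders") +                  # assumed legend label
  theme_minimal()

my_plot
```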
Complete Example
To tie the cheat sheet together, here's a 20-line analysis of the built-in airquality dataset: drop missing ozone readings, compute the monthly mean, and plot it as a bar chart. Functions from dplyr, ggplot2, and base R all appear in one pipeline.
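A sketch of such an analysis, assuming dplyr, tidyr, and ggplot2 are installed. The names `monthly_ozone` and `ozone_plot`, the month labels, and the axis text are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

monthly_ozone <- airquality |>
  drop_na(Ozone) |>                                   # remove missing readings
  mutate(Month = factor(Month, levels = 5:9,
                        labels = month.abb[5:9])) |>  # 5..9 -> "May".."Sep"
  group_by(Month) |>
  summarise(mean_ozone = mean(Ozone), .groups = "drop") |>
  arrange(desc(mean_ozone))                           # highest month first

monthly_ozone

ozone_plot <- ggplot(monthly_ozone, aes(x = Month, y = mean_ozone)) +
  geom_col() +                                        # bar height = mean ozone
  labs(title = "Mean ozone by month",
       x = NULL, y = "Mean ozone (ppb)") +
  theme_minimal()

ozone_plot
```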
That single script demonstrates nine of the functions from this cheat sheet: drop_na(), mutate(), factor(), group_by(), summarise(), arrange(), ggplot(), geom_col(), theme_minimal(). July and August average around 60 ppb ozone, typical summer smog levels, while the other months sit near 30 ppb or lower.
Summary
The fastest way to use this cheat sheet is to stop memorizing and start looking up. Bookmark the page, skim the section headings, and when you hit a task you can't solve, search this page with Ctrl+F.
| Task | Go-to function(s) |
|---|---|
| Create a sequence | seq(), seq_len(), 1:n |
| Pick rows by condition | filter() (dplyr), subset() (base) |
| Pick columns | select() (dplyr), [ , cols] (base) |
| Add / modify columns | mutate(), transform() |
| Group and summarise | group_by() \|> summarise(), tapply(), aggregate() |
| Sort rows | arrange(), order() |
| Join two tables | left_join(), inner_join(), merge() |
| Reshape wide ↔ long | pivot_longer(), pivot_wider() |
| Build a plot | ggplot() + aes() + geom_*() |
| Fit a linear model | lm(), glm() |
| Hypothesis test | t.test(), wilcox.test(), chisq.test() |
| Parse a date | as.Date(), lubridate::ymd() |
| Extract a substring | substr(), stringr::str_sub() |
| Apply a function over a collection | sapply(), purrr::map_dbl() |
| Read a CSV | read.csv(), readr::read_csv() |
| Handle an error safely | tryCatch() |
Two hundred functions, six categories, one page. Come back often.