Write Better R Functions: Arguments, Defaults, Scope & When to Vectorise
A function in R bundles a block of code so you can reuse it. You define it once with function(), give it a name, and call it whenever you need it, with different inputs each time.
Once your R scripts grow beyond 50 lines, you'll start copying and pasting code blocks. That's a sign you need functions. Functions make your code shorter, easier to test, and much easier to change — fix a bug in the function and every call benefits.
Introduction
A function takes inputs (arguments), does something with them, and returns a result. You've been using functions since your first R script — mean(), sum(), cat() are all functions. Now you'll learn to write your own.
Here's the anatomy of an R function:
# Define a function
greet <- function(name) {
  # paste0() adds no separator, so there is no stray space before "!"
  greeting <- paste0("Hello, ", name, "! Welcome to R.")
  return(greeting)
}
# Call it
cat(greet("Alice"), "\n")
cat(greet("Bob"), "\n")
greet is the name. name is the argument. return(greeting) is the output. Everything between {} is the body.
Your First Function
Let's start with a simple function and build from there:
# A function that converts Fahrenheit to Celsius
f_to_c <- function(temp_f) {
  temp_c <- (temp_f - 32) * 5/9
  return(round(temp_c, 1))
}
# Use it
cat("72°F =", f_to_c(72), "°C\n")
cat("32°F =", f_to_c(32), "°C\n")
cat("212°F =", f_to_c(212), "°C\n")
# It works on vectors too!
temps <- c(32, 50, 72, 100, 212)
cat("Batch conversion:", f_to_c(temps), "\n")
Because R math is vectorized, f_to_c() automatically works on single values AND vectors. You don't need to add a loop.
Arguments and Defaults
Arguments without defaults are required: omitting one causes an error as soon as the function tries to use it. Arguments with defaults are optional: the default is used if you don't provide a value.
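To make the required/optional distinction concrete, here is a minimal sketch. The helper power() is hypothetical and used only for illustration; tryCatch() is there simply so the error message can be shown without stopping the script.

```r
# Hypothetical helper: `base` is required, `exponent` defaults to 2
power <- function(base, exponent = 2) {
  base^exponent
}

cat("Default exponent:", power(3), "\n")      # exponent falls back to 2
cat("Custom exponent:", power(3, 3), "\n")    # default overridden

# Omitting the required argument fails once the body tries to use it;
# tryCatch() captures the error message instead of halting
err <- tryCatch(power(), error = function(e) conditionMessage(e))
cat("Error message:", err, "\n")
```

Giving the less important argument a default keeps the common call short while still allowing it to be overridden.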
Positional vs named arguments
# Define once; the three calls below all produce the same result
calculate_total <- function(price, quantity, tax_rate = 0.08) {
  return(round(price * quantity * (1 + tax_rate), 2))
}
# Positional (order matters)
cat("Positional:", calculate_total(25, 3, 0.10), "\n")
# Named (order doesn't matter)
cat("Named:", calculate_total(tax_rate = 0.10, quantity = 3, price = 25), "\n")
# Mixed (positional first, then named)
cat("Mixed:", calculate_total(25, 3, tax_rate = 0.10), "\n")
Best practice: Use positional arguments for the first 1-2 obvious arguments, then named arguments for everything else. mean(x) is clear; substring(x, 3, 7) is unclear — better as substring(x, first = 3, last = 7).
The ... (dot-dot-dot) argument
... passes extra arguments to other functions. It's extremely common in R:
# A wrapper function that passes ... to paste()
shout <- function(..., sep = " ") {
  text <- paste(..., sep = sep)
  return(toupper(text))
}
cat(shout("hello", "world"), "\n")
cat(shout("r", "is", "great", sep = "-"), "\n")
You'll see ... in functions like cat(), paste(), c(), and most plotting functions. It makes functions flexible without needing to list every possible argument.
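Another common use of ... is forwarding optional arguments to a function you wrap. The sketch below assumes a hypothetical helper, rounded_mean(), that passes anything extra straight through to mean().

```r
# Hypothetical wrapper: rounds a mean and forwards extra arguments to mean()
rounded_mean <- function(x, digits = 1, ...) {
  round(mean(x, ...), digits)
}

values <- c(2.5, 3.7, NA, 10.2)

# Without extra arguments, mean() sees the NA and returns NA
cat("Default:", rounded_mean(values), "\n")

# na.rm is not an argument of rounded_mean(); it travels through ... to mean()
cat("With na.rm:", rounded_mean(values, na.rm = TRUE), "\n")
```

The wrapper never has to list na.rm (or any other option of mean()) itself, which is what keeps it flexible.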
Return Values
Explicit return
# Explicit return — recommended for clarity
divide_safe <- function(a, b) {
  if (b == 0) {
    return(NA) # Early return for edge case
  }
  return(a / b)
}
cat("10 / 3 =", divide_safe(10, 3), "\n")
cat("10 / 0 =", divide_safe(10, 0), "\n")
Implicit return
R returns the last evaluated expression automatically. Many R programmers omit return():
# Implicit return — the last expression is returned
add <- function(x, y) {
  x + y # This value is returned
}
cat("3 + 4 =", add(3, 4), "\n")
Both styles are valid. Use explicit return() when you have early exits or the function is long. Use implicit return for short, simple functions.
Returning multiple values
R functions can only return one object — but that object can be a list:
# Return multiple values as a named list
describe <- function(x) {
  list(
    mean = round(mean(x), 2),
    sd = round(sd(x), 2),
    min = min(x),
    max = max(x),
    n = length(x)
  )
}
data <- c(23, 45, 12, 67, 34, 89, 56)
result <- describe(data)
cat("Mean:", result$mean, "\n")
cat("SD:", result$sd, "\n")
cat("Range:", result$min, "to", result$max, "\n")
Scope: Where Variables Live
Scope determines where a variable is visible. Functions create their own scope — variables inside a function don't leak out:
x <- 100 # Global variable
my_func <- function() {
  x <- 999 # Local variable, different from the global x
  y <- 42 # Also local
  cat("Inside function: x =", x, ", y =", y, "\n")
}
my_func()
cat("Outside function: x =", x, "\n")
# cat("y =", y) # Would error — y doesn't exist outside the function
The function has its own x (999) that doesn't affect the global x (100). And y only exists inside the function.
Lexical scoping: looking up the chain
If a variable isn't found inside the function, R looks in the enclosing environment (where the function was defined, not where it was called):
tax_rate <- 0.08 # Global variable
calculate_tax <- function(price) {
  # tax_rate not defined here, so R looks in the enclosing (here, global) environment
  return(price * tax_rate)
}
cat("Tax on $100:", calculate_tax(100), "\n")
# Change the global variable
tax_rate <- 0.10
cat("Tax on $100 (new rate):", calculate_tax(100), "\n")
This works, but it's fragile: the function's result depends on a global variable that can change between calls. It is better to pass tax_rate as an argument.
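A minimal sketch of that change follows. The default of 0.08 is an assumption carried over from the global value used above; you could just as well make tax_rate a required argument.

```r
# tax_rate is now an argument (default assumed to be 0.08), so the result
# no longer depends on anything outside the function
calculate_tax <- function(price, tax_rate = 0.08) {
  price * tax_rate
}

cat("Tax on $100:", calculate_tax(100), "\n")
cat("Tax on $100 (10% rate):", calculate_tax(100, tax_rate = 0.10), "\n")
```

Now a different rate is an explicit part of the call rather than a hidden change to global state.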
Best practice: Functions should get all their data from arguments, not global variables. This makes them predictable, testable, and reusable.
Error Handling: stop(), warning(), message()
Good functions validate their inputs and give clear error messages:
# A function with input validation
bmi <- function(weight_kg, height_m) {
  # Validate inputs
  if (!is.numeric(weight_kg) || !is.numeric(height_m)) {
    stop("Both weight and height must be numeric")
  }
  if (weight_kg <= 0 || height_m <= 0) {
    stop("Weight and height must be positive")
  }
  if (height_m > 3) {
    warning("Height > 3m is unusual. Did you pass height in cm instead of m?")
  }
  result <- weight_kg / height_m^2
  return(round(result, 1))
}
# Normal use
cat("BMI:", bmi(70, 1.75), "\n")
# Suspicious input (triggers warning)
cat("BMI:", bmi(70, 175), "\n") # Probably cm, not meters
| Function | Behavior | Use when |
| --- | --- | --- |
| stop("msg") | Stops execution, throws error | Input is invalid, can't continue |
| warning("msg") | Continues but shows warning | Something suspicious but not fatal |
| message("msg") | Shows info message | Progress updates, FYI messages |
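To see all three side by side, here is a small sketch built around a hypothetical helper, safe_sqrt(). The calls to tryCatch(), suppressWarnings(), and suppressMessages() are base R tools, used here only to show how a caller can react to each kind of condition.

```r
# Hypothetical helper used only to illustrate the three condition functions
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")               # cannot continue
  }
  if (any(x < 0, na.rm = TRUE)) {
    warning("negative values set to NA")    # suspicious, but recoverable
    x[which(x < 0)] <- NA
  }
  message("Taking the square root of ", length(x), " value(s)")  # FYI only
  sqrt(x)
}

# tryCatch() turns the error into a value the caller can inspect
err <- tryCatch(safe_sqrt("nine"), error = function(e) conditionMessage(e))
cat("Caught error:", err, "\n")

# suppressWarnings() and suppressMessages() silence the other two conditions
res <- suppressMessages(suppressWarnings(safe_sqrt(c(4, -1, 9))))
cat("Result:", res, "\n")
```

Only stop() prevents the function from returning a value; after a warning or a message, execution continues to the last line.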
Common Function Patterns
Pattern 1: Data summarizer
# Summarize any numeric vector
quick_stats <- function(x, digits = 2) {
  x <- x[!is.na(x)] # Remove NAs
  data.frame(
    n = length(x),
    mean = round(mean(x), digits),
    median = round(median(x), digits),
    sd = round(sd(x), digits),
    min = round(min(x), digits),
    max = round(max(x), digits)
  )
}
quick_stats(mtcars$mpg)
Pattern 2: Pipe-friendly function
library(dplyr)
# A function that works in a pipe chain: the data frame is the first argument
add_grade_column <- function(df, score_col = "score") {
  df |>
    mutate(
      grade = case_when(
        .data[[score_col]] >= 90 ~ "A",
        .data[[score_col]] >= 80 ~ "B",
        .data[[score_col]] >= 70 ~ "C",
        .data[[score_col]] >= 60 ~ "D",
        TRUE ~ "F"
      )
    )
}
# Use in a pipe
data.frame(student = c("Alice", "Bob", "Carol"), score = c(92, 78, 85)) |>
  add_grade_column() |>
  print()
When to Vectorize (and When Not To)
R functions are automatically vectorized if the operations inside them are vectorized:
# This function is automatically vectorized
celsius_to_fahr <- function(temp_c) {
  temp_c * 9/5 + 32
}
# Works on single values AND vectors — no changes needed
cat("Single:", celsius_to_fahr(100), "\n")
cat("Vector:", celsius_to_fahr(c(0, 20, 37, 100)), "\n")
If your function uses if/else (not ifelse()), it won't work on vectors:
# NOT vectorized — uses if/else
grade_single <- function(score) {
  if (score >= 90) return("A")
  if (score >= 80) return("B")
  if (score >= 70) return("C")
  return("F")
}
# Works for one value
cat("Single:", grade_single(85), "\n")
# Fails for a vector; uncomment to see:
# grade_single(c(85, 92, 68)) # Error in R >= 4.2.0: the condition has length > 1 (older versions warned and used only the first element)
# Fix: use Vectorize() to make it work on vectors
grade <- Vectorize(grade_single)
cat("Vectorized:", grade(c(85, 92, 68)), "\n")
Or better yet, write it with ifelse() or case_when() from the start.
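As a sketch of the ifelse() route, the version below reuses the cutoffs from grade_single(). The name grade_vec() is hypothetical and chosen only to keep it apart from the earlier functions.

```r
# Vectorized grading with nested ifelse(); cutoffs mirror grade_single()
grade_vec <- function(score) {
  ifelse(score >= 90, "A",
    ifelse(score >= 80, "B",
      ifelse(score >= 70, "C", "F")))
}

cat("One value:", grade_vec(85), "\n")
cat("Vector:", grade_vec(c(85, 92, 68)), "\n")
```

Because ifelse() evaluates the whole vector at once, no Vectorize() wrapper is needed. For more than three or four cutoffs, case_when() tends to be easier to read than deeply nested ifelse() calls.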
Practice Exercises
Exercise 1: Temperature Converter
# Exercise: Write a function temp_convert() that:
# - Takes a temperature value and a "from" unit ("C", "F", or "K")
# - Converts to all three units
# - Returns a named vector with C, F, K values
# Test: temp_convert(100, "C") should give C=100, F=212, K=373.15
# Write your code below:
Solution
# Solution
temp_convert <- function(temp, from = "C") {
  if (from == "C") {
    c_val <- temp
  } else if (from == "F") {
    c_val <- (temp - 32) * 5/9
  } else if (from == "K") {
    c_val <- temp - 273.15
  } else {
    stop("'from' must be 'C', 'F', or 'K'")
  }
  c(C = round(c_val, 2),
    F = round(c_val * 9/5 + 32, 2),
    K = round(c_val + 273.15, 2))
}
cat("100°C =", temp_convert(100, "C"), "\n")
cat("32°F =", temp_convert(32, "F"), "\n")
cat("0K =", temp_convert(0, "K"), "\n")
Explanation: The function first converts any input to Celsius (the base unit), then computes all three outputs from Celsius. Using stop() for invalid input gives a clear error message.
Exercise 2: Statistical Outlier Detector
# Exercise: Write a function find_outliers() that:
# - Takes a numeric vector
# - Identifies outliers using the IQR method (< Q1-1.5*IQR or > Q3+1.5*IQR)
# - Returns a list with: outlier_values, outlier_positions, bounds (lower, upper)
# Test with: c(1, 2, 3, 4, 5, 100, -50, 3, 4, 5)
# Write your code below:
Explanation: The IQR method defines outliers as values more than 1.5 x IQR below Q1 or above Q3. The function returns a list so the caller gets the outlier values, their positions, and the bounds — all in one call.
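One way to meet the exercise's requirements is sketched below; treat it as a possible answer rather than the reference solution. It assumes quantile()'s default method for Q1 and Q3 and ignores NA values, neither of which the exercise specifies.

```r
# Possible solution sketch for find_outliers()
find_outliers <- function(x) {
  stopifnot(is.numeric(x))

  # Q1 and Q3 using quantile()'s default method (an assumption of this sketch)
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr

  # which() drops NA comparisons, so missing values are never flagged
  positions <- which(x < lower | x > upper)

  list(
    outlier_values = x[positions],
    outlier_positions = positions,
    bounds = c(lower = lower, upper = upper)
  )
}

result <- find_outliers(c(1, 2, 3, 4, 5, 100, -50, 3, 4, 5))
cat("Outliers:", result$outlier_values, "\n")
cat("Positions:", result$outlier_positions, "\n")
cat("Bounds:", result$bounds, "\n")
```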
Exercise 3: Flexible Summary Function
# Exercise: Write a function column_report() that:
# - Takes a data frame
# - For numeric columns: prints mean, sd, % missing
# - For character columns: prints unique count, most common value, % missing
# - Returns invisible(NULL) (it's a printing function)
# Test with: data.frame(x = c(1,2,NA,4), y = c("a","b","a","a"))
# Write your code below:
Explanation: The function loops over column names, checks each column's type with is.numeric(), and prints the appropriate summary. invisible(NULL) means it's called for its side effect (printing), not its return value.
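A possible implementation along the lines of this explanation is sketched below; it is one answer among many, not the reference solution. The output layout and the decision to treat factor columns like character columns are assumptions of the sketch.

```r
# Possible solution sketch for column_report()
column_report <- function(df) {
  stopifnot(is.data.frame(df))

  for (col in names(df)) {
    values <- df[[col]]
    pct_missing <- round(100 * mean(is.na(values)), 1)

    if (is.numeric(values)) {
      cat(col, "(numeric): mean =", round(mean(values, na.rm = TRUE), 2),
          "| sd =", round(sd(values, na.rm = TRUE), 2),
          "| missing =", pct_missing, "%\n")
    } else if (is.character(values) || is.factor(values)) {
      counts <- table(values)   # table() ignores NA by default
      most_common <- names(counts)[which.max(counts)]
      cat(col, "(character): unique =", length(counts),
          "| most common =", most_common,
          "| missing =", pct_missing, "%\n")
    }
  }

  invisible(NULL)   # called for its printed output, not its return value
}

column_report(data.frame(x = c(1, 2, NA, 4), y = c("a", "b", "a", "a")))
```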
Summary
| Concept | Syntax | Example |
| --- | --- | --- |
| Define | function(args) { body } | add <- function(x, y) x + y |
| Default arg | arg = default | f <- function(x, n = 10) |
| Return | return(value) | return(result) |
| Multiple returns | list(a = x, b = y) | Return a named list |
| Early exit | return() inside if | Guard clause pattern |
| Validation | stop("msg") | if (!is.numeric(x)) stop(...) |
| Vectorize | Vectorize(f) | Or write with ifelse()/case_when() |
| Pipe-friendly | First arg = data | my_func <- function(df, ...) |
FAQ
Should I use return() explicitly or rely on implicit return?
Both are acceptable. Use explicit return() for functions longer than ~5 lines, functions with early exits (if (error) return(NA)), or when returning in the middle of the function. Use implicit return for short, one-expression functions.
How many arguments should a function have?
As few as possible. Functions with 1-3 arguments are easy to understand. If you need more than 5, consider grouping related arguments into a list or creating multiple smaller functions.
Can functions modify their arguments?
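Not for ordinary objects. R uses copy-on-modify semantics: when you pass a vector, list, or data frame to a function and modify it inside, R works on a copy and the original is unchanged. This is a feature, not a bug, because it prevents unexpected side effects. The main exception is environments (and objects built on them, such as R6 objects), which are modified in place.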
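A quick way to convince yourself is the sketch below. The helper add_one_to_first() is hypothetical and exists only to show that the caller's vector is untouched.

```r
# Hypothetical example: the function changes its local copy only
add_one_to_first <- function(v) {
  v[1] <- v[1] + 1
  v
}

original <- c(10, 20, 30)
modified <- add_one_to_first(original)

cat("original:", original, "\n")   # unchanged by the call
cat("modified:", modified, "\n")   # the returned copy carries the change
```

To keep the change, assign the function's return value, as in modified above.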
What does invisible() do?
invisible(x) returns x but doesn't print it. Use it when your function is called for a side effect (printing, plotting, writing files) and you don't want the return value cluttering the console.
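As an illustration, the sketch below uses a hypothetical helper, announce(), that prints a message as its side effect and hands its input back invisibly.

```r
# Hypothetical helper: prints a message, then returns x without auto-printing
announce <- function(x) {
  cat("Got", length(x), "value(s)\n")
  invisible(x)
}

announce(1:3)            # prints the message only, not the vector
kept <- announce(1:3)    # the return value can still be captured
(announce(1:3))          # wrapping the call in parentheses forces printing
```

Returning the input invisibly, rather than NULL, also lets the function sit in the middle of a pipe chain.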
When should I write a function vs use existing ones?
Write a function when you're copying and pasting the same code block more than twice. Before writing your own, search CRAN — there's probably a package that does what you need. The tidyverse, in particular, has functions for most common data manipulation tasks.
What's Next?
You can now write reusable R functions. Next:
R Special Values — handle NA, NULL, NaN, and Inf in your functions
Getting Help in R — find and understand R documentation
Functional Programming — use functions as arguments to other functions
Functions are the building blocks of all serious R programming.