R Vectors: The Foundation of Everything in R (Master This First)
A vector is R's most fundamental data structure — an ordered collection of values of the same type. In R, even a single number is a vector of length 1. Master vectors first, and everything else in R clicks into place.
Almost every R operation works on vectors. When you compute mean(x), x is a vector. When you filter a data frame column, you're working with a vector. When you plot data, you're passing vectors to the plotting function. Vectors are everywhere.
This tutorial covers everything you need: creating vectors, accessing elements, modifying them, filtering, vectorized operations (R's secret superpower), and named vectors.
Introduction
A vector is an ordered sequence of values where every element has the same type — all numeric, all character, or all logical. This single-type constraint is what makes vectors fast and predictable.
Think of a vector as a row of mailboxes: each has a number (its position), and they all hold the same kind of mail (the type). You can look at one mailbox, a range of mailboxes, or all of them at once.
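As a rough sketch of how vectors are usually built, the block below uses c(), the colon operator, seq(), and rep(). All values and variable names are illustrative assumptions, not taken from the tutorial's own examples.

```r
# Common ways to create vectors (all values here are illustrative)
nums  <- c(1.5, 2, 3.25)            # numeric
words <- c("a", "b", "c")           # character
flags <- c(TRUE, FALSE, TRUE)       # logical

one_to_five <- 1:5                  # integer sequence 1 2 3 4 5
by_half     <- seq(0, 2, by = 0.5)  # 0 0.5 1 1.5 2

# rep(): times repeats the whole vector, each repeats element by element
rep_times <- rep(c(1, 2), times = 3)  # 1 2 1 2 1 2
rep_each  <- rep(c(1, 2), each = 3)   # 1 1 1 2 2 2
cat("times:", rep_times, "\n")
cat("each: ", rep_each, "\n")

# Mixing types does not fail: R coerces everything to one common type
mixed <- c(1, "two", TRUE)
cat("Type of mixed:", typeof(mixed), "\n")  # character
```

The last two lines show the single-type rule in action: rather than raising an error, R silently converts the number and the logical value to character strings.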
Notice the difference between times and each in rep(): times repeats the whole vector; each repeats each element before moving to the next.
Combining vectors
You can combine existing vectors into larger ones using c():
first_half <- c(1, 2, 3)
second_half <- c(4, 5, 6)
# Combine into one vector
combined <- c(first_half, second_half)
cat("Combined:", combined, "\n")
# Add single elements
extended <- c(0, first_half, 3.5, second_half, 7)
cat("Extended:", extended, "\n")
Accessing Elements (Subsetting)
R uses square brackets [] to access vector elements. R uses 1-based indexing: the first element is at position 1, not 0.
By position
fruits <- c("apple", "banana", "cherry", "date", "elderberry")
# Single element
cat("1st:", fruits[1], "\n")
cat("3rd:", fruits[3], "\n")
cat("Last:", fruits[length(fruits)], "\n")
# Multiple elements
cat("1st and 3rd:", fruits[c(1, 3)], "\n")
cat("First three:", fruits[1:3], "\n")
# Negative indexing — exclude elements
cat("All except 2nd:", fruits[-2], "\n")
cat("Except 1st and 5th:", fruits[c(-1, -5)], "\n")
Key point: Negative indices exclude elements. fruits[-2] means "everything except the 2nd element." You cannot mix positive and negative indices in the same bracket.
By logical vector
This is the most powerful form of subsetting — and the foundation of data filtering in R:
scores <- c(88, 72, 95, 61, 83, 77, 90)
# Create a logical condition
above_80 <- scores > 80
cat("Above 80?", above_80, "\n")
# Use the logical vector to filter
cat("Scores above 80:", scores[above_80], "\n")
# Do it in one step (most common pattern)
cat("Scores above 80:", scores[scores > 80], "\n")
cat("Scores 70-85:", scores[scores >= 70 & scores <= 85], "\n")
# Count how many pass
cat("Count above 80:", sum(scores > 80), "\n")
cat("Percentage above 80:", mean(scores > 80) * 100, "%\n")
The pattern vector[condition] is one of the most important patterns in R. You'll use it constantly in data analysis.
By name
Vectors can have named elements, which you can use for access:
# Named vector
ages <- c(Alice = 25, Bob = 32, Carol = 28, David = 45)
cat("Ages:", ages, "\n")
# Access by name
cat("Alice's age:", ages["Alice"], "\n")
cat("Bob and Carol:", ages[c("Bob", "Carol")], "\n")
# See all names
cat("Names:", names(ages), "\n")
# Add names to an existing vector
scores <- c(88, 72, 95)
names(scores) <- c("Math", "English", "Science")
cat("Scores:", scores, "\n")
cat("Science:", scores["Science"], "\n")
Named vectors are like simple lookup tables. They're especially useful when you need to map codes to labels or store configuration values.
The conditional replacement scores[scores < 50] <- 50 is a powerful one-liner. It says: "find all elements below 50, and set them to 50."
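The replacement described above might look like the following sketch. The score values are made up for illustration, and the cap at 92 is an extra assumption added only to show the same pattern with a different condition.

```r
# Illustrative scores; the values are assumptions, not from the tutorial
scores <- c(88, 42, 95, 37, 83)

# Replace one element by position
scores[1] <- 90

# Conditional replacement: raise every score below 50 up to 50
scores[scores < 50] <- 50
cat("After flooring at 50:", scores, "\n")  # 90 50 95 50 83

# Same pattern with a different condition: cap scores at 92
scores[scores > 92] <- 92
cat("After capping at 92:", scores, "\n")   # 90 50 92 50 83
```

The logical vector inside the brackets selects the positions to overwrite, and the single value on the right is recycled to fill all of them.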
Add and remove elements
x <- c(10, 20, 30)
# Append to the end
x <- c(x, 40, 50)
cat("After append:", x, "\n")
# Prepend to the beginning
x <- c(0, x)
cat("After prepend:", x, "\n")
# Insert in the middle (at position 3)
x <- c(x[1:2], 15, x[3:length(x)])
cat("After insert:", x, "\n")
# Remove by position
x <- x[-3] # Remove the element at position 3
cat("After remove:", x, "\n")
# Remove by value (keep everything that's not 30)
x <- x[x != 30]
cat("After removing 30:", x, "\n")
Note: R vectors have no in-place insert or delete methods like Python lists do. Base R does provide append(x, values, after = n) for inserting at a position, but like the c() idiom above it returns a new vector built from the pieces you want. This isn't a problem in practice, because R is designed for working with whole vectors, not individual element operations.
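For the insert case specifically, base R's append() with its after argument gives the same result as the c() idiom. The sketch below uses made-up values and assumes you want the new element placed after position 2; it still returns a new vector rather than changing x in place.

```r
x <- c(0, 10, 20, 30)  # illustrative values

# Insert 15 after position 2 using append()
with_append <- append(x, 15, after = 2)

# The equivalent rebuild with c()
with_c <- c(x[1:2], 15, x[3:length(x)])

cat("append():", with_append, "\n")  # 0 10 15 20 30
cat("c():     ", with_c, "\n")       # 0 10 15 20 30
cat("x is unchanged:", x, "\n")      # 0 10 20 30
```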
Vectorized Operations: R's Superpower
Vectorization is a major reason R code for data analysis is fast and compact. When you apply an operation to a vector, R applies it to every element in a single call, with no explicit loop in your code.
In plain Python, applying an operation to every element of a list takes a for loop or a list comprehension. In R, it's one line. This makes R code more concise, more readable, and usually much faster than an explicit R loop, because vectorized operations run in optimized compiled code (mostly C) under the hood.
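A minimal sketch of what that one line looks like, next to the loop it replaces. The prices and the 8% rate are illustrative assumptions.

```r
prices <- c(10, 20, 30, 40)  # illustrative values

# Vectorized: one expression handles every element
doubled  <- prices * 2       # 20 40 60 80
with_tax <- prices * 1.08    # made-up 8% rate applied to each price
roots    <- sqrt(prices)     # square root of each price
is_cheap <- prices < 25      # TRUE TRUE FALSE FALSE

# The explicit loop that the first line above replaces
doubled_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  doubled_loop[i] <- prices[i] * 2
}

cat("Doubled:", doubled, "\n")
cat("Cheap?", is_cheap, "\n")
cat("Same result as loop:", identical(doubled, doubled_loop), "\n")  # TRUE
```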
Element-wise operations on two vectors
When you combine two vectors with an operator, R matches them element by element:
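As a small illustration of element-by-element matching, the sketch below multiplies and compares two vectors of equal length. The price and quantity values are invented for the example.

```r
# Illustrative data: price and quantity for three items
price    <- c(2.5, 4, 1.25)
quantity <- c(4, 2, 8)

# First with first, second with second, and so on
revenue <- price * quantity
cat("Revenue per item:", revenue, "\n")               # 10 8 10
cat("Total revenue:", sum(revenue), "\n")             # 28

# Comparison operators work element by element too
cat("Price above quantity?", price > quantity, "\n")  # FALSE TRUE FALSE
```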
Recycling: what happens when vectors are different lengths
If two vectors have different lengths, R recycles the shorter one — repeating it until it matches the longer one:
# Same length — straightforward
x <- c(1, 2, 3)
y <- c(10, 20, 30)
cat("Same length:", x + y, "\n")
# Scalar + vector — scalar gets recycled
cat("x + 100:", x + 100, "\n") # 100 is recycled: c(100, 100, 100)
# Different lengths — shorter recycled
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)
cat("Recycled:", a + b, "\n") # b becomes c(10, 20, 10, 20, 10, 20)
# Warning when lengths don't divide evenly
a2 <- c(1, 2, 3, 4, 5)
b2 <- c(10, 20)
cat("Uneven recycle:", a2 + b2, "\n") # Works but warns
Scalar recycling (adding a single number to a vector) is useful and common. Multi-element recycling can be a source of bugs — R will warn you when the lengths don't divide evenly.
Useful Vector Functions
Here are the functions you'll use most often with vectors:
order() is more useful than sort() in practice because it tells you where each sorted element came from — essential for sorting data frames (you sort by one column and need the other columns to follow along).
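The selection below is one reasonable reading of "most often used" and not a definitive list; the values and the labels vector are illustrative assumptions. The last lines show the order() behaviour described above.

```r
x <- c(42, 17, 88, 17, 63)  # illustrative values

cat("Length:", length(x), "\n")              # 5
cat("Sum:", sum(x), "\n")                    # 227
cat("Range:", range(x), "\n")                # 17 88
cat("Unique:", unique(x), "\n")              # 42 17 88 63
cat("Sorted:", sort(x), "\n")                # 17 17 42 63 88
cat("Position of max:", which.max(x), "\n")  # 3

# order() returns the positions that would sort x
ord <- order(x)
cat("Order:", ord, "\n")                     # 2 4 1 5 3

# Those positions let a second vector follow the same sort
labels <- c("a", "b", "c", "d", "e")
sorted_labels <- labels[ord]
cat("Labels sorted by x:", sorted_labels, "\n")  # b d a e c
```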
Set operations
a <- c(1, 2, 3, 4, 5)
b <- c(3, 4, 5, 6, 7)
cat("Union:", union(a, b), "\n") # All unique values from both
cat("Intersect:", intersect(a, b), "\n") # Values in both
cat("Setdiff(a, b):", setdiff(a, b), "\n") # In a but not b
cat("Setdiff(b, a):", setdiff(b, a), "\n") # In b but not a
cat("Is 3 in a?", 3 %in% a, "\n") # Membership test
cat("Which of b are in a?", b %in% a, "\n")
The %in% operator is extremely useful for filtering: "show me all rows where the category is in this list."
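A sketch of that row-filtering pattern on a small, hypothetical data frame. The orders table, its column names, and the wanted categories are all made up for illustration.

```r
# Hypothetical data frame; column names and values are made up
orders <- data.frame(
  id       = 1:5,
  category = c("books", "toys", "games", "books", "food")
)
wanted <- c("books", "games")

# %in% returns one TRUE/FALSE per row of the column
keep <- orders$category %in% wanted
cat("Keep row?", keep, "\n")  # TRUE FALSE TRUE TRUE FALSE

# Use the logical vector to select rows
filtered <- orders[keep, ]
print(filtered)
```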
Handling Missing Values (NA)
Real data almost always has missing values. R represents these as NA. Here's how to work with vectors that contain NAs:
# Vector with missing values
temps <- c(72, NA, 75, 80, NA, 65, 71)
cat("Temperatures:", temps, "\n")
# Most functions return NA if any value is NA
cat("Mean with NA:", mean(temps), "\n")
# Use na.rm = TRUE to ignore NAs
cat("Mean without NA:", mean(temps, na.rm = TRUE), "\n")
cat("Sum without NA:", sum(temps, na.rm = TRUE), "\n")
# Find and count NAs
cat("Is NA:", is.na(temps), "\n")
cat("NA count:", sum(is.na(temps)), "\n")
cat("Non-NA count:", sum(!is.na(temps)), "\n")
# Remove NAs from a vector
clean_temps <- temps[!is.na(temps)]
cat("Clean:", clean_temps, "\n")
# Or use na.omit()
cat("na.omit:", na.omit(temps), "\n")
The na.rm = TRUE pattern appears in almost every R script that handles real data. Memorize it.
Practice Exercises
Exercise 1: Vector Basics
Create a vector of daily step counts and analyze it:
# Exercise: A fitness tracker recorded daily steps for two weeks
# Week 1: 8200, 10500, 7800, 12000, 9500, 15000, 6000
# Week 2: 9000, 11200, 8500, 10000, 13500, 14000, 7500
#
# 1. Combine both weeks into one vector
# 2. Find the total steps, daily average, best and worst day
# 3. How many days exceeded 10,000 steps?
# 4. What percentage of days exceeded 10,000 steps?
# Write your code below:
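One possible solution, sketched here because the exercise leaves the code to the reader. The variable names week1, week2, and all_steps are assumptions chosen to match the explanation that follows.

```r
# One possible solution
week1 <- c(8200, 10500, 7800, 12000, 9500, 15000, 6000)
week2 <- c(9000, 11200, 8500, 10000, 13500, 14000, 7500)

# 1. Combine both weeks into one vector
all_steps <- c(week1, week2)

# 2. Total, daily average, best and worst day
cat("Total steps:", sum(all_steps), "\n")   # 142700
cat("Daily average:", round(mean(all_steps)), "\n")
cat("Best day:", max(all_steps), "\n")      # 15000
cat("Worst day:", min(all_steps), "\n")     # 6000

# 3. Days strictly above 10,000 steps
days_over <- sum(all_steps > 10000)
cat("Days over 10,000:", days_over, "\n")   # 6

# 4. Percentage of days above 10,000 steps
pct_over <- mean(all_steps > 10000) * 100
cat("Percentage over 10,000:", round(pct_over, 1), "%\n")
```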
Explanation: sum(all_steps > 10000) counts how many TRUE values there are in the logical vector. mean(all_steps > 10000) gives the proportion, which we multiply by 100 for the percentage.
Exercise 2: Filtering and Replacing
Clean a vector of test scores:
# Exercise: A teacher has test scores with some data entry errors
# scores <- c(88, 102, 75, -5, 91, 200, 83, 67, 95, -10)
#
# 1. Scores must be between 0 and 100
# 2. Find and report which positions have invalid scores
# 3. Replace invalid scores with NA
# 4. Calculate the mean of valid scores only
# Write your code below:
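One possible solution, sketched under the assumption that "invalid" means strictly below 0 or strictly above 100. The names invalid and valid_mean are illustrative.

```r
# One possible solution
scores <- c(88, 102, 75, -5, 91, 200, 83, 67, 95, -10)

# 1-2. Flag scores outside the 0 to 100 range and report their positions
invalid <- scores < 0 | scores > 100
cat("Invalid positions:", which(invalid), "\n")  # 2 4 6 10
cat("Invalid values:", scores[invalid], "\n")    # 102 -5 200 -10

# 3. Replace invalid scores with NA
scores[invalid] <- NA
cat("Cleaned scores:", scores, "\n")

# 4. Mean of the valid scores only
valid_mean <- mean(scores, na.rm = TRUE)
cat("Mean of valid scores:", round(valid_mean, 2), "\n")
```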
Explanation: The | (OR) operator combines two conditions. which() returns the positions where the condition is TRUE. Setting those positions to NA and using na.rm = TRUE cleanly handles the invalid data.
Exercise 3: Named Vectors as Lookup Tables
Create a grading system using named vectors:
# Exercise: Create a named vector that maps letter grades to GPA values
# A = 4.0, B = 3.0, C = 2.0, D = 1.0, F = 0.0
#
# A student's grades: c("A", "B", "A", "C", "B", "A")
# 1. Use the lookup vector to convert letters to GPA values
# 2. Calculate the student's GPA (mean of all grades)
# Hint: You can use a named vector as a lookup: lookup[keys]
# Write your code below:
# Solution
# Create the lookup table
gpa_lookup <- c(A = 4.0, B = 3.0, C = 2.0, D = 1.0, F = 0.0)
# Student's grades
grades <- c("A", "B", "A", "C", "B", "A")
# Convert using the lookup vector
gpa_values <- gpa_lookup[grades]
cat("Grades:", grades, "\n")
cat("GPA values:", gpa_values, "\n")
cat("GPA:", round(mean(gpa_values), 2), "\n")
Explanation: gpa_lookup[grades] uses each element of grades as a name to look up the corresponding value in gpa_lookup. This is one of the most elegant patterns in R: using named vectors as dictionaries/hash maps.
Summary
Operation           | Code          | Example
--------------------|---------------|----------------------------
Create              | c()           | c(1, 2, 3)
Sequence            | :, seq()      | 1:10, seq(0, 1, by = 0.1)
Repeat              | rep()         | rep("x", 3)
Access by position  | [n]           | x[3]
Access by condition | [condition]   | x[x > 5]
Access by name      | ["name"]      | ages["Alice"]
Exclude             | [-n]          | x[-1]
Modify              | [n] <- value  | x[2] <- 99
Length              | length()      | length(x)
Sort                | sort()        | sort(x, decreasing = TRUE)
Find position       | which()       | which(x > 5)
Membership          | %in%          | 3 %in% x
Missing values      | na.rm = TRUE  | mean(x, na.rm = TRUE)
R's superpower: Vectorized operations apply to all elements at once — no loops needed. x * 2 doubles every element. x[x > 0] filters in one step.
FAQ
What's the difference between a vector and a list?
A vector requires all elements to have the same type (all numeric, all character, etc.). A list can hold elements of different types — even other vectors, data frames, or models. Vectors are simpler and faster; lists are more flexible.
Can a vector have zero elements?
Yes. c() with no arguments returns NULL, which has length 0 and behaves like an empty vector when you combine it with other values. To get a typed empty vector, use character(0), numeric(0), or logical(0). Empty vectors are useful when building up results in a loop.
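A quick check of how empty vectors behave, followed by the loop-building use mentioned above. Note that c() with no arguments returns NULL, which has length 0. The squares example is an illustrative assumption.

```r
# c() with no arguments returns NULL, which has length 0
cat("c() is NULL:", is.null(c()), "\n")          # TRUE
cat("length(c()):", length(c()), "\n")           # 0

# Typed empty vectors
empty_num <- numeric(0)
empty_chr <- character(0)
cat("Type of numeric(0):", typeof(empty_num), "\n")    # double
cat("Type of character(0):", typeof(empty_chr), "\n")  # character

# Building up a result in a loop, starting from an empty vector
squares <- numeric(0)
for (i in 1:4) {
  squares <- c(squares, i^2)
}
cat("Squares:", squares, "\n")  # 1 4 9 16
```

For a case this simple, the vectorized (1:4)^2 gives the same values without a loop; growing a vector is mainly useful when the number of results is not known in advance.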
Why is R 1-indexed instead of 0-indexed?
R inherited 1-based indexing from the S language, which was designed by statisticians, and statisticians count starting at 1 (the first observation, the first row). Many general-purpose languages (Python, Java, C) use 0-based indexing instead. This is a common source of confusion when switching between R and Python.
How do I add an element to a specific position?
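R has no insert() function, but base R's append(x, new_value, after = 2) does the job. It is equivalent to rebuilding the vector yourself: c(x[1:2], new_value, x[3:length(x)]). Either way you get a new vector back. This seems awkward, but in practice you rarely insert single elements, because R is designed for whole-vector operations.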
What's the maximum size of a vector?
R vectors were historically limited to 2^31 - 1 elements (~2.1 billion). Since R 3.0.0, 64-bit builds support long vectors of up to 2^52 elements, with no special setting needed. In practice, you're limited by RAM. A vector of 100 million doubles uses about 800 MB of memory.
What's Next?
Now that you've mastered vectors, you're ready for more complex data structures:
R Data Frames — tabular data with rows and columns, where each column is a vector
R Lists — the flexible container that holds any type of object
R Control Flow — if/else, for loops, and while loops
Each tutorial builds on your vector knowledge — data frames are collections of vectors, and most loop operations can be replaced by vectorized operations.