R Vectors: The Foundation of Everything in R (Master This First)

A vector is R's most fundamental data structure — an ordered collection of values of the same type. In R, even a single number is a vector of length 1. Master vectors first, and everything else in R clicks into place.

Almost every R operation works on vectors. When you compute mean(x), x is a vector. When you filter a data frame column, you're working with a vector. When you plot data, you're passing vectors to the plotting function. Vectors are everywhere.

This tutorial covers everything you need: creating vectors, accessing elements, modifying them, filtering, vectorized operations (R's secret superpower), and named vectors.

Introduction

A vector is an ordered sequence of values where every element has the same type — all numeric, all character, or all logical. This single-type constraint is what makes vectors fast and predictable.

Think of a vector as a row of mailboxes: each has a number (its position), and they all hold the same kind of mail (the type). You can look at one mailbox, a range of mailboxes, or all of them at once.

    # A vector in action
    temperatures <- c(72, 68, 75, 80, 77, 65, 71)
    days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

    cat("Temperatures:", temperatures, "\n")
    cat("Days:", days, "\n")
    cat("Length:", length(temperatures), "\n")
    cat("Type:", class(temperatures), "\n")
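
The single-type rule has a visible consequence: if you pass mixed types to c(), R does not raise an error but converts every element to one common type. A minimal sketch of that coercion (the variable names mixed and nums are illustrative, not part of the tutorial's examples):

```r
# Mixing types in c() coerces everything to one common type
mixed <- c(1, "a", TRUE)
print(mixed)          # every element is now a character string
print(class(mixed))   # "character"

# Logical values become 1 and 0 when combined with numbers
nums <- c(10, TRUE, FALSE)
print(nums)           # 10 1 0
print(class(nums))    # "numeric"
```

If a numeric column unexpectedly shows up as character, a stray text value in the input is a likely cause.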

Creating Vectors

The c() function

The primary way to create a vector is c() (combine):

    # Numeric vector
    prices <- c(9.99, 14.50, 3.25, 22.00)
    cat("Prices:", prices, "\n")

    # Character vector
    colors <- c("red", "blue", "green", "yellow")
    cat("Colors:", colors, "\n")

    # Logical vector
    passed <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
    cat("Passed:", passed, "\n")

    # Single value: still a vector (length 1)
    x <- 42
    cat("x is a vector of length", length(x), "\n")
    cat("is.vector(x):", is.vector(x), "\n")

Sequence shortcuts

R provides powerful shortcuts for creating sequences:

    # Colon operator: integer sequence
    one_to_ten <- 1:10
    cat("1:10 →", one_to_ten, "\n")
    ten_to_one <- 10:1
    cat("10:1 →", ten_to_one, "\n")

    # seq(): flexible sequences
    by_twos <- seq(2, 20, by = 2)
    cat("Even numbers:", by_twos, "\n")
    five_points <- seq(0, 1, length.out = 5)
    cat("5 evenly spaced:", five_points, "\n")

    # rep(): repeat values
    threes <- rep(3, times = 5)
    cat("rep(3, 5):", threes, "\n")
    pattern <- rep(c(1, 2, 3), times = 3)
    cat("Pattern repeated:", pattern, "\n")
    each_repeated <- rep(c("A", "B", "C"), each = 2)
    cat("Each repeated:", each_repeated, "\n")

Notice the difference between times and each in rep(): times repeats the whole vector; each repeats each element before moving to the next.
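
The two arguments can also be combined, and times accepts a vector of per-element counts. The sketch below puts the variants side by side; the names both and per_element are made up for this illustration:

```r
# times: repeat the whole vector
whole <- rep(c("A", "B"), times = 2)
print(whole)         # A B A B

# each: repeat every element in place
in_place <- rep(c("A", "B"), each = 2)
print(in_place)      # A A B B

# both together: 'each' is applied first, then 'times'
both <- rep(c("A", "B"), each = 2, times = 2)
print(both)          # A A B B A A B B

# a vector for 'times' gives one count per element
per_element <- rep(c("A", "B"), times = c(3, 1))
print(per_element)   # A A A B
```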

Combining vectors

You can combine existing vectors into larger ones using c():

    first_half <- c(1, 2, 3)
    second_half <- c(4, 5, 6)

    # Combine into one vector
    combined <- c(first_half, second_half)
    cat("Combined:", combined, "\n")

    # Add single elements
    extended <- c(0, first_half, 3.5, second_half, 7)
    cat("Extended:", extended, "\n")

Accessing Elements (Subsetting)

R uses square brackets [] to access vector elements. R uses 1-based indexing — the first element is at position 1, not 0.

By position

    fruits <- c("apple", "banana", "cherry", "date", "elderberry")

    # Single element
    cat("1st:", fruits[1], "\n")
    cat("3rd:", fruits[3], "\n")
    cat("Last:", fruits[length(fruits)], "\n")

    # Multiple elements
    cat("1st and 3rd:", fruits[c(1, 3)], "\n")
    cat("First three:", fruits[1:3], "\n")

    # Negative indexing: exclude elements
    cat("All except 2nd:", fruits[-2], "\n")
    cat("Except 1st and 5th:", fruits[c(-1, -5)], "\n")

Key point: Negative indices exclude elements. fruits[-2] means "everything except the 2nd element." You cannot mix positive and negative indices in the same bracket.
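
A few edge cases of positional indexing are worth seeing once. The sketch below reuses the fruits vector and wraps the mixed-sign case in tryCatch() so the script keeps running; the exact wording of the error message depends on your R version, so it is printed rather than assumed:

```r
fruits <- c("apple", "banana", "cherry", "date", "elderberry")

# Mixing positive and negative indices is an error; catch it to see the message
mix_result <- tryCatch(
  fruits[c(-1, 2)],
  error = function(e) paste("Error:", conditionMessage(e))
)
print(mix_result)

# An index past the end returns NA rather than failing
past_end <- fruits[10]
print(past_end)

# Index 0 selects nothing, giving an empty vector
zero_idx <- fruits[0]
print(length(zero_idx))   # 0
```

The out-of-range and zero-index cases fail silently, which is why off-by-one mistakes carried over from 0-based languages can go unnoticed.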

By logical vector

This is the most powerful form of subsetting — and the foundation of data filtering in R:

    scores <- c(88, 72, 95, 61, 83, 77, 90)

    # Create a logical condition
    above_80 <- scores > 80
    cat("Above 80?", above_80, "\n")

    # Use the logical vector to filter
    cat("Scores above 80:", scores[above_80], "\n")

    # Do it in one step (most common pattern)
    cat("Scores above 80:", scores[scores > 80], "\n")
    cat("Scores 70-85:", scores[scores >= 70 & scores <= 85], "\n")

    # Count how many pass
    cat("Count above 80:", sum(scores > 80), "\n")
    cat("Percentage above 80:", mean(scores > 80) * 100, "%\n")

The pattern vector[condition] is one of the most important patterns in R. You'll use it constantly in data analysis.
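
One caveat with this pattern is that missing values (NA, covered later in this tutorial) pass through a logical subset as NA. A small sketch with a hypothetical readings vector shows the behavior and how which() avoids it:

```r
readings <- c(5, NA, 12, 3, 9)   # hypothetical data with one missing value

# A comparison with NA yields NA, and subsetting by NA yields an NA element
with_na <- readings[readings > 6]
print(with_na)      # NA 12 9

# which() returns only the positions where the condition is TRUE
positions <- which(readings > 6)
print(positions)    # 3 5

no_na <- readings[positions]
print(no_na)        # 12 9
```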

By name

Vectors can have named elements, which you can use for access:

    # Named vector
    ages <- c(Alice = 25, Bob = 32, Carol = 28, David = 45)
    cat("Ages:", ages, "\n")

    # Access by name
    cat("Alice's age:", ages["Alice"], "\n")
    cat("Bob and Carol:", ages[c("Bob", "Carol")], "\n")

    # See all names
    cat("Names:", names(ages), "\n")

    # Add names to an existing vector
    scores <- c(88, 72, 95)
    names(scores) <- c("Math", "English", "Science")
    cat("Scores:", scores, "\n")
    cat("Science:", scores["Science"], "\n")

Named vectors are like simple lookup tables. They're especially useful when you need to map codes to labels or store configuration values.
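
As a sketch of the code-to-label use case, the example below translates a vector of status codes in one step. The codes and labels are invented for illustration:

```r
# Hypothetical lookup table: status code -> label
status_labels <- c(A = "Active", I = "Inactive", P = "Pending")
codes <- c("P", "A", "A", "I")

# Index by a character vector to translate every code at once;
# unname() drops the names that come along with the result
labels <- unname(status_labels[codes])
print(labels)    # "Pending" "Active" "Active" "Inactive"

# A code that is not in the table comes back as NA
unknown <- unname(status_labels["X"])
print(unknown)   # NA
```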

Modifying Vectors

Change individual elements

    colors <- c("red", "blue", "green", "yellow")
    cat("Original:", colors, "\n")

    # Change one element
    colors[2] <- "purple"
    cat("After change:", colors, "\n")

    # Change multiple elements
    colors[c(1, 4)] <- c("orange", "pink")
    cat("After multiple changes:", colors, "\n")

    # Conditional replacement
    scores <- c(88, 45, 92, 38, 77, 55, 91)
    cat("Original scores:", scores, "\n")

    # Replace all failing scores with 50 (minimum grade policy)
    scores[scores < 50] <- 50
    cat("After minimum grade:", scores, "\n")

The conditional replacement scores[scores < 50] <- 50 is a powerful one-liner. It says: "find all elements below 50, and set them to 50."
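
The same idea extends to clamping values at both ends. The sketch below uses a hypothetical raw vector and compares two conditional replacements with base R's pmax() and pmin(), which compute element-wise maxima and minima:

```r
raw <- c(88, 45, 120, 38, 77)   # hypothetical scores, one above 100

# Two conditional replacements: floor at 50, cap at 100
clamped <- raw
clamped[clamped < 50] <- 50
clamped[clamped > 100] <- 100
print(clamped)                        # 88 50 100 50 77

# pmax()/pmin() express the same clamp in one expression
clamped2 <- pmin(pmax(raw, 50), 100)
print(identical(clamped, clamped2))   # TRUE
```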

Add and remove elements

    x <- c(10, 20, 30)

    # Append to the end
    x <- c(x, 40, 50)
    cat("After append:", x, "\n")

    # Prepend to the beginning
    x <- c(0, x)
    cat("After prepend:", x, "\n")

    # Insert in the middle (at position 3)
    x <- c(x[1:2], 15, x[3:length(x)])
    cat("After insert:", x, "\n")

    # Remove by position
    x <- x[-3]  # Remove the element at position 3
    cat("After remove:", x, "\n")

    # Remove by value (keep everything that's not 30)
    x <- x[x != 30]
    cat("After removing 30:", x, "\n")

Note: R vectors don't have insert or delete methods like Python lists do. You create a new vector by combining the pieces you want; for insertion, base R's append() function does this for you through its after argument. This isn't a problem in practice, because R is designed for working with whole vectors, not individual element operations.
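
For the insertion case specifically, base R's append() takes an after argument. A short sketch comparing it with the manual rebuild (the variable names are illustrative):

```r
x <- c(0, 10, 20, 30)

# append() inserts the new value after the given position
inserted <- append(x, 15, after = 2)
print(inserted)                      # 0 10 15 20 30

# The manual rebuild gives the same result
manual <- c(x[1:2], 15, x[3:length(x)])
print(identical(inserted, manual))   # TRUE

# after = 0 prepends
prepended <- append(x, -10, after = 0)
print(prepended)                     # -10 0 10 20 30
```

Like c(), append() returns a new vector, so the result has to be assigned back if you want to keep it.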

Vectorized Operations: R's Superpower

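Vectorization is a major reason R code for data analysis is both concise and fast. When you apply an operation to a vector, R applies it to every element in a single call, with the loop running in compiled code rather than in your script, so you don't need to write a loop yourself.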

    # Math on entire vectors: no loop required!
    prices <- c(10, 25, 8, 42, 15)

    # Add tax to every price (8%)
    with_tax <- prices * 1.08
    cat("Original: ", prices, "\n")
    cat("With tax: ", round(with_tax, 2), "\n")

    # Convert Fahrenheit to Celsius
    temps_f <- c(72, 68, 75, 80, 65)
    temps_c <- round((temps_f - 32) * 5/9, 1)
    cat("\nFahrenheit:", temps_f, "\n")
    cat("Celsius: ", temps_c, "\n")

With plain Python lists, you'd need a for loop or list comprehension for this (NumPy arrays behave more like R vectors). In R, it's one line. This makes R code more concise and more readable, and usually much faster than an explicit R loop, because vectorized operations run in compiled C code under the hood.
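
To make the comparison concrete, here is the tax calculation written both ways in R itself. This is only a sketch for contrast; the loop version is not something you would normally write:

```r
prices <- c(10, 25, 8, 42, 15)

# Loop version: fill a preallocated result one element at a time
with_tax_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  with_tax_loop[i] <- prices[i] * 1.08
}

# Vectorized version: one expression over the whole vector
with_tax_vec <- prices * 1.08

# Both approaches produce the same values
print(all.equal(with_tax_loop, with_tax_vec))   # TRUE
```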

Element-wise operations on two vectors

When you combine two vectors with an operator, R matches them element by element:

    # Element-wise operations
    quantities <- c(2, 1, 3, 4, 2)
    unit_prices <- c(10, 25, 8, 42, 15)
    totals <- quantities * unit_prices

    cat("Quantities: ", quantities, "\n")
    cat("Unit prices:", unit_prices, "\n")
    cat("Totals: ", totals, "\n")
    cat("Grand total:", sum(totals), "\n")

Recycling: what happens when vectors are different lengths

If two vectors have different lengths, R recycles the shorter one — repeating it until it matches the longer one:

    # Same length: straightforward
    x <- c(1, 2, 3)
    y <- c(10, 20, 30)
    cat("Same length:", x + y, "\n")

    # Scalar + vector: scalar gets recycled
    cat("x + 100:", x + 100, "\n")  # 100 is recycled: c(100, 100, 100)

    # Different lengths: shorter recycled
    a <- c(1, 2, 3, 4, 5, 6)
    b <- c(10, 20)
    cat("Recycled:", a + b, "\n")  # b becomes c(10, 20, 10, 20, 10, 20)

    # Warning when lengths don't divide evenly
    a2 <- c(1, 2, 3, 4, 5)
    b2 <- c(10, 20)
    cat("Uneven recycle:", a2 + b2, "\n")  # Works but warns

Scalar recycling (adding a single number to a vector) is useful and common. Multi-element recycling can be a source of bugs — R will warn you when the lengths don't divide evenly.
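
Because the uneven case only warns, the result is still computed and can flow into later steps unnoticed. The sketch below shows one way to inspect the warning with tryCatch(), plus a simple length check you could run first; this is an illustration, not a required pattern:

```r
a2 <- c(1, 2, 3, 4, 5)
b2 <- c(10, 20)

# The result is still computed: b2 is recycled as 10, 20, 10, 20, 10
uneven <- suppressWarnings(a2 + b2)
print(uneven)        # 11 22 13 24 15

# tryCatch() turns the warning into a value you can inspect or act on
msg <- tryCatch(
  {
    a2 + b2
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# A defensive check before combining two multi-element vectors
same_length <- length(a2) == length(b2)
print(same_length)   # FALSE
```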

Useful Vector Functions

Here are the functions you'll use most often with vectors:

    x <- c(23, 45, 12, 67, 34, 89, 56, 8, 71, 42)

    # Summary statistics
    cat("Length:", length(x), "\n")
    cat("Sum: ", sum(x), "\n")
    cat("Mean: ", mean(x), "\n")
    cat("Median:", median(x), "\n")
    cat("Min: ", min(x), "\n")
    cat("Max: ", max(x), "\n")
    cat("Range: ", range(x), "\n")
    cat("SD: ", round(sd(x), 2), "\n")

    # Position functions
    cat("\nPosition of max:", which.max(x), "(value:", x[which.max(x)], ")\n")
    cat("Position of min:", which.min(x), "(value:", x[which.min(x)], ")\n")

Sorting and ordering

    x <- c(23, 45, 12, 67, 34)
    names(x) <- c("A", "B", "C", "D", "E")

    # sort(): returns sorted values
    cat("Ascending:", sort(x), "\n")
    cat("Descending:", sort(x, decreasing = TRUE), "\n")

    # order(): returns the indices that would sort the vector
    idx <- order(x)
    cat("Order indices:", idx, "\n")
    cat("Sorted via order:", x[idx], "\n")

    # rev(): reverse a vector
    cat("Reversed:", rev(x), "\n")

    # unique(): remove duplicates
    dupes <- c(1, 2, 2, 3, 3, 3, 4)
    cat("Unique values:", unique(dupes), "\n")
    cat("Duplicate table:\n")
    print(table(dupes))

order() is more useful than sort() in practice because it tells you where each sorted element came from — essential for sorting data frames (you sort by one column and need the other columns to follow along).
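
A minimal sketch of that data frame use case, with a made-up students table: order() is computed on one column, and the resulting row positions reorder the whole frame.

```r
# Hypothetical data frame: each column is a vector
students <- data.frame(
  name  = c("Ana", "Ben", "Cy", "Dee"),
  score = c(72, 95, 61, 88)
)

# order() gives the row positions sorted by score, highest first
idx <- order(students$score, decreasing = TRUE)
print(idx)      # 2 4 1 3

# Using those positions as row indices keeps name and score together
ranked <- students[idx, ]
print(ranked)
```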

Set operations

    a <- c(1, 2, 3, 4, 5)
    b <- c(3, 4, 5, 6, 7)

    cat("Union:", union(a, b), "\n")            # All unique values from both
    cat("Intersect:", intersect(a, b), "\n")    # Values in both
    cat("Setdiff(a, b):", setdiff(a, b), "\n")  # In a but not b
    cat("Setdiff(b, a):", setdiff(b, a), "\n")  # In b but not a
    cat("Is 3 in a?", 3 %in% a, "\n")           # Membership test
    cat("Which of b are in a?", b %in% a, "\n")

The %in% operator is extremely useful for filtering: "show me all rows where the category is in this list."
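
Here is what that row filter could look like, using an invented orders data frame. The %in% test produces a logical vector, which is then used as the row index:

```r
# Hypothetical orders table
orders <- data.frame(
  id       = 1:5,
  category = c("books", "toys", "games", "books", "food")
)
wanted <- c("books", "games")

# TRUE for rows whose category appears in 'wanted'
keep <- orders$category %in% wanted
filtered <- orders[keep, ]
print(filtered)

# Negate the logical vector to exclude those categories instead
excluded <- orders[!keep, ]
print(excluded)
```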

Handling Missing Values (NA)

Real data almost always has missing values. R represents these as NA. Here's how to work with vectors that contain NAs:

    # Vector with missing values
    temps <- c(72, NA, 75, 80, NA, 65, 71)
    cat("Temperatures:", temps, "\n")

    # Most functions return NA if any value is NA
    cat("Mean with NA:", mean(temps), "\n")

    # Use na.rm = TRUE to ignore NAs
    cat("Mean without NA:", mean(temps, na.rm = TRUE), "\n")
    cat("Sum without NA:", sum(temps, na.rm = TRUE), "\n")

    # Find and count NAs
    cat("Is NA:", is.na(temps), "\n")
    cat("NA count:", sum(is.na(temps)), "\n")
    cat("Non-NA count:", sum(!is.na(temps)), "\n")

    # Remove NAs from a vector
    clean_temps <- temps[!is.na(temps)]
    cat("Clean:", clean_temps, "\n")

    # Or use na.omit()
    cat("na.omit:", na.omit(temps), "\n")

The na.rm = TRUE pattern appears in almost every R script that handles real data. Memorize it.
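
The same argument works across the common summary functions, and is.na() can also drive a replacement. The sketch below fills the gaps with the mean purely to illustrate the mechanics; whether imputing is appropriate depends on your data:

```r
temps <- c(72, NA, 75, 80, NA, 65, 71)

# na.rm works the same way in other summary functions
print(max(temps, na.rm = TRUE))      # 80
print(median(temps, na.rm = TRUE))   # 72

# Replace each NA with the mean of the observed values
filled <- temps
filled[is.na(filled)] <- mean(temps, na.rm = TRUE)
print(filled)                        # 72 72.6 75 80 72.6 65 71
```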

Practice Exercises

Exercise 1: Vector Basics

Create a vector of daily step counts and analyze it:

    # Exercise: A fitness tracker recorded daily steps for two weeks
    # Week 1: 8200, 10500, 7800, 12000, 9500, 15000, 6000
    # Week 2: 9000, 11200, 8500, 10000, 13500, 14000, 7500
    #
    # 1. Combine both weeks into one vector
    # 2. Find the total steps, daily average, best and worst day
    # 3. How many days exceeded 10,000 steps?
    # 4. What percentage of days exceeded 10,000 steps?

    # Write your code below:


    # Solution
    week1 <- c(8200, 10500, 7800, 12000, 9500, 15000, 6000)
    week2 <- c(9000, 11200, 8500, 10000, 13500, 14000, 7500)
    all_steps <- c(week1, week2)

    cat("Total steps:", sum(all_steps), "\n")
    cat("Daily average:", round(mean(all_steps), 0), "\n")
    cat("Best day:", max(all_steps), "(day", which.max(all_steps), ")\n")
    cat("Worst day:", min(all_steps), "(day", which.min(all_steps), ")\n")
    cat("Days over 10k:", sum(all_steps > 10000), "\n")
    cat("Percent over 10k:", round(mean(all_steps > 10000) * 100, 1), "%\n")

Explanation: sum(all_steps > 10000) counts how many TRUE values there are in the logical vector. mean(all_steps > 10000) gives the proportion, which we multiply by 100 for the percentage.

Exercise 2: Filtering and Replacing

Clean a vector of test scores:

    # Exercise: A teacher has test scores with some data entry errors
    # scores <- c(88, 102, 75, -5, 91, 200, 83, 67, 95, -10)
    #
    # 1. Scores must be between 0 and 100
    # 2. Find and report which positions have invalid scores
    # 3. Replace invalid scores with NA
    # 4. Calculate the mean of valid scores only

    # Write your code below:


    # Solution
    scores <- c(88, 102, 75, -5, 91, 200, 83, 67, 95, -10)
    cat("Original:", scores, "\n")

    # Find invalid positions
    invalid <- scores < 0 | scores > 100
    cat("Invalid positions:", which(invalid), "\n")
    cat("Invalid values:", scores[invalid], "\n")

    # Replace invalid with NA
    scores[invalid] <- NA
    cat("Cleaned:", scores, "\n")

    # Calculate mean of valid scores
    cat("Mean (valid only):", round(mean(scores, na.rm = TRUE), 1), "\n")
    cat("Valid count:", sum(!is.na(scores)), "of", length(scores), "\n")

Explanation: The | (OR) operator combines two conditions. which() returns the positions where the condition is TRUE. Setting those positions to NA and using na.rm = TRUE cleanly handles the invalid data.

Exercise 3: Named Vectors as Lookup Tables

Create a grading system using named vectors:

    # Exercise: Create a named vector that maps letter grades to GPA values
    # A = 4.0, B = 3.0, C = 2.0, D = 1.0, F = 0.0
    #
    # A student's grades: c("A", "B", "A", "C", "B", "A")
    # 1. Use the lookup vector to convert letters to GPA values
    # 2. Calculate the student's GPA (mean of all grades)
    # Hint: You can use a named vector as a lookup: lookup[keys]

    # Write your code below:


    # Solution
    # Create the lookup table
    gpa_lookup <- c(A = 4.0, B = 3.0, C = 2.0, D = 1.0, F = 0.0)

    # Student's grades
    grades <- c("A", "B", "A", "C", "B", "A")

    # Convert using the lookup vector
    gpa_values <- gpa_lookup[grades]
    cat("Grades:", grades, "\n")
    cat("GPA values:", gpa_values, "\n")
    cat("GPA:", round(mean(gpa_values), 2), "\n")

Explanation: gpa_lookup[grades] uses each element of grades as a name to look up the corresponding value in gpa_lookup. This is one of the most elegant patterns in R — using named vectors as dictionaries/hash maps.

Summary

Operation	Code	Example
Create	`c()`	`c(1, 2, 3)`
Sequence	`:`, `seq()`	`1:10`, `seq(0, 1, by = 0.1)`
Repeat	`rep()`	`rep("x", 3)`
Access by position	`[n]`	`x[3]`
Access by condition	`[condition]`	`x[x > 5]`
Access by name	`["name"]`	`ages["Alice"]`
Exclude	`[-n]`	`x[-1]`
Modify	`[n] <- value`	`x[2] <- 99`
Length	`length()`	`length(x)`
Sort	`sort()`	`sort(x, decreasing = TRUE)`
Find position	`which()`	`which(x > 5)`
Membership	`%in%`	`3 %in% x`
Missing values	`na.rm = TRUE`	`mean(x, na.rm = TRUE)`

R's superpower: Vectorized operations apply to all elements at once — no loops needed. x * 2 doubles every element. x[x > 0] filters in one step.

FAQ

What's the difference between a vector and a list?

An atomic vector, the kind covered in this tutorial, requires all elements to have the same type (all numeric, all character, etc.). A list can hold elements of different types, including other vectors, data frames, or models. Vectors are simpler and faster; lists are more flexible.
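
A short sketch of the contrast, using throwaway values: c() forces one type, while list() keeps each element as it was.

```r
# c() coerces everything to one type
as_vector <- c(2, "two")
print(class(as_vector))        # "character"

# list() keeps each element's own type
as_list <- list(2, "two", TRUE)
element_types <- sapply(as_list, class)
print(element_types)           # "numeric" "character" "logical"

# [ ] on a list returns a list; [[ ]] extracts the element itself
print(class(as_list[2]))       # "list"
print(class(as_list[[2]]))     # "character"
```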

Can a vector have zero elements?

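Yes. You can create typed empty vectors with character(0), numeric(0), or logical(0). Note that c() with no arguments returns NULL rather than an empty numeric vector, although NULL also has length 0 and can be extended with c(). Empty vectors are useful as a starting point when building up results in a loop.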
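
The sketch below checks what c() returns with no arguments and shows two ways to collect loop results: growing from an empty vector, and preallocating when the final length is known. The squares example is invented for illustration:

```r
# c() with no arguments is NULL, not a typed vector
print(is.null(c()))       # TRUE

# Typed empty vectors have length 0 but a definite type
empty_num <- numeric(0)
print(length(empty_num))  # 0
print(class(empty_num))   # "numeric"

# Growing a result inside a loop, starting from an empty vector
squares <- numeric(0)
for (i in 1:5) {
  squares <- c(squares, i^2)
}
print(squares)            # 1 4 9 16 25

# When the final length is known, preallocating avoids repeated copying
squares2 <- numeric(5)
for (i in 1:5) {
  squares2[i] <- i^2
}
print(identical(squares, squares2))   # TRUE
```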

Why is R 1-indexed instead of 0-indexed?

R inherits 1-based indexing from S, the language it is modeled on. S was designed by statisticians, and statisticians count starting at 1 (the first observation, the first row). Many other languages (Python, Java, C) use 0-based indexing, which is a common source of confusion when switching between R and Python.

How do I add an element to a specific position?

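R doesn't have an insert() function, but base R's append() takes an after argument: append(x, new_value, after = 2). You can also rebuild the vector by hand: c(x[1:2], new_value, x[3:length(x)]). Either way you get a new vector. This seems awkward, but in practice you rarely insert single elements, because R is designed for whole-vector operations.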

What's the maximum size of a vector?

Theoretically, up to 2^31 - 1 elements (~2.1 billion) on 32-bit builds of R. On 64-bit builds, R 3.0.0 and later support long vectors of up to 2^52 elements without any special setting. In practice, you're limited by RAM. A vector of 100 million doubles uses about 800 MB of memory.
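
You can check these figures on your own machine. The sketch below measures one million doubles and scales the arithmetic up; the exact byte count includes a small object header and may differ slightly between builds, so only the estimate is annotated:

```r
# Largest value of R's 32-bit integer type
print(.Machine$integer.max == 2^31 - 1)   # TRUE

# One million doubles at 8 bytes each, plus a small header
one_million <- numeric(1e6)
bytes <- as.numeric(object.size(one_million))
print(bytes)

# Scale up to estimate 100 million doubles, in megabytes
est_mb <- 100e6 * 8 / 1e6
print(est_mb)                             # 800
```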

What's Next?

Now that you've mastered vectors, you're ready for more complex data structures:

R Data Frames — tabular data with rows and columns, where each column is a vector
R Lists — the flexible container that holds any type of object
R Control Flow — if/else, for loops, and while loops

Each tutorial builds on your vector knowledge — data frames are collections of vectors, and most loop operations can be replaced by vectorized operations.

r-statistics.co by Selva Prabhakaran
