R Names and Values: How R Actually Stores Data (Copy-on-Modify Explained)
In R, variables are names bound to values, not boxes containing data. When you write y <- x, you don't copy the data -- you attach a second name tag to the same value. R only copies when you modify a shared value. Understanding this "copy-on-modify" model is the key to writing fast, memory-efficient R code.
Most R users never think about this -- until they run out of memory or wonder why their code is slow. This tutorial explains the mental model, shows you how to observe it, and teaches you how to take advantage of it.
Names Are Tags, Not Boxes
Think of R variables as sticky notes attached to objects in memory, not as containers that hold data.
This is why assignment in R is fast regardless of object size. y <- x for a 1 GB data frame is effectively instant -- it just adds another name tag to the existing value.
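The name-tag idea can be checked directly. The following is a minimal sketch, assuming an R build with memory profiling support (`capabilities("profmem")` returns `TRUE`, which is typical for CRAN binaries). `tracemem()` is used here only because it returns an object's address as a string; the variable names are illustrative.

```r
if (capabilities("profmem")) {
  # Two names, one value: assignment binds a name, it does not copy data.
  x <- c(1, 2, 3)
  y <- x

  # tracemem() returns the address of the object a name points to.
  addr_x <- tracemem(x)
  addr_y <- tracemem(y)

  # TRUE when both names are attached to the same vector in memory.
  same_object <- identical(addr_x, addr_y)
  print(same_object)

  untracemem(x)  # stop tracing the shared vector
}
```

If both addresses match, no data was duplicated by `y <- x`; only a second name was attached.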
Copy-on-Modify
When two names point to the same value and you modify through one of them, R makes a copy first. This ensures the other name still sees the original data.
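A small sketch of the behavior described above (the variable names are illustrative):

```r
x <- c(1, 2, 3)
y <- x          # x and y now share one vector

y[2] <- 100     # R copies the vector for y, then changes the copy

print(x)        # x still holds 1, 2, 3
print(y)        # y holds 1, 100, 3
```

From the programmer's point of view the result looks like `y` was an independent copy all along; the copy was simply postponed until it was needed.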
When does the copy happen?
The copy happens at the exact moment of modification -- not before, not after. This is lazy (deferred) copying.
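To see the timing, one option is to compare addresses after a read and after a write. This is a sketch under the assumption that `tracemem()` is available (`capabilities("profmem")` is `TRUE`); it relies on `tracemem()` returning the object's address.

```r
if (capabilities("profmem")) {
  x <- c(1, 2, 3)
  y <- x
  addr_x <- tracemem(x)

  # Reading through y does not copy: y still points at x's vector.
  total <- sum(y)
  shared_after_read <- identical(tracemem(y), addr_x)

  # The first write through y is the moment R duplicates the vector.
  # Because the vector is traced, R also prints a tracemem message here.
  y[2] <- 100
  shared_after_write <- identical(tracemem(y), addr_x)

  print(c(after_read = shared_after_read, after_write = shared_after_write))

  untracemem(x)
  untracemem(y)
}
```

The sharing survives assignment and reads, and ends only at the write.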
Modify-in-Place Optimization
If only one name points to a value, R can modify in place without copying. This is a critical optimization.
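A hedged sketch of the in-place case, assuming a plain R session (R 4.0.0 or later, run with `Rscript` or the console) and a build where `tracemem()` is available. IDEs such as RStudio may hold extra references to your objects, so the result can differ there.

```r
if (capabilities("profmem")) {
  x <- c(1, 2, 3)               # only one name refers to this vector
  addr_before <- tracemem(x)

  x[2] <- 100                   # no other reference exists, so no copy is needed
  addr_after <- tracemem(x)

  # Same address before and after means the vector was modified in place.
  print(identical(addr_before, addr_after))

  untracemem(x)
}
```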
Functions and reference counts
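When you pass a variable to a function, the function's argument becomes another reference to the same value, so R increments the reference count. Inside the function, modifying the argument triggers a copy, which is why the caller's variable is left untouched.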
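For illustration, here is a sketch with a hypothetical helper, `zero_first()`, that modifies its argument:

```r
# Hypothetical helper: sets the first element of its argument to zero.
zero_first <- function(v) {
  v[1] <- 0   # v shares memory with the caller's vector until this write
  v
}

x <- c(1, 2, 3)
z <- zero_first(x)

print(x)  # the caller's vector is unchanged: 1, 2, 3
print(z)  # the returned copy holds 0, 2, 3
```

The function never sees a private copy up front; the copy is made only because the argument is modified while `x` still refers to the same value.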
Observing with tracemem()
tracemem() marks an object so R prints a message every time it gets copied.
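A minimal usage sketch, assuming memory profiling support is compiled in (`capabilities("profmem")` is `TRUE`). The exact addresses in the messages differ from run to run, so none are shown here.

```r
if (capabilities("profmem")) {
  x <- c(1, 2, 3)
  cat(tracemem(x), "\n")  # start tracing; tracemem() returns the address

  y <- x                  # no message: nothing is copied on assignment
  y[1] <- 10              # prints a "tracemem[<old> -> <new>]" line: a copy was made for y
  y[2] <- 20              # typically no message: y now owns an unshared copy

  untracemem(x)           # stop tracing when you are done
  untracemem(y)
}
```

Each message line corresponds to one duplication, which makes `tracemem()` a quick way to find unexpected copies in a loop or function.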
Data frame column behavior
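A data frame is a list of column vectors, so replacing one column in a copied data frame should leave the other columns shared with the original. The sketch below checks this by comparing column addresses; it assumes `tracemem()` is available and uses made-up column names `a` and `b`.

```r
if (capabilities("profmem")) {
  df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
  df2 <- df1

  df2$a <- df2$a * 2   # replace one column in the second data frame

  # Compare column addresses: b should still be shared, a should not.
  b_shared <- identical(tracemem(df1$b), tracemem(df2$b))
  a_shared <- identical(tracemem(df1$a), tracemem(df2$a))

  print(c(a_shared = a_shared, b_shared = b_shared))
}
```

Modifying a row, by contrast, touches every column, so every column has to be copied.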
Reference Counting in Practice
R uses reference counting to decide whether to copy or modify in place. Here are practical implications:
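One common implication is the cost of growing a vector inside a loop. The sketch below contrasts two hypothetical helpers, `grow()` and `prealloc()`: the first builds a new, longer vector on every iteration, while the second allocates once and fills the vector by index.

```r
# Grows the result one element at a time: every c() call allocates a new
# vector and copies all existing elements into it.
grow <- function(n) {
  result <- numeric(0)
  for (i in seq_len(n)) {
    result <- c(result, i^2)
  }
  result
}

# Allocates the full vector once; each assignment can modify it in place
# because only one name refers to it.
prealloc <- function(n) {
  result <- numeric(n)
  for (i in seq_len(n)) {
    result[i] <- i^2
  }
  result
}

n <- 5000
same_result <- identical(grow(n), prealloc(n))
print(same_result)

# Timings vary by machine; compare them yourself:
print(system.time(grow(n)))
print(system.time(prealloc(n)))
```

Both functions return the same values; the difference is only in how much copying happens along the way.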
Lists and Environments: Different Rules
Lists use copy-on-modify like vectors, but environments use reference semantics (like R6 objects).
This is why R6 classes (which are built on environments) have reference semantics, while S3 objects (which are typically built on lists or atomic vectors) have copy-on-modify semantics.
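The contrast can be sketched side by side (names are illustrative):

```r
# Lists: copy-on-modify. Changing l2 leaves l1 alone.
l1 <- list(x = 1)
l2 <- l1
l2$x <- 2
print(l1$x)   # 1

# Environments: reference semantics. e1 and e2 are the same environment.
e1 <- new.env()
e1$x <- 1
e2 <- e1
e2$x <- 2
print(e1$x)   # 2

print(identical(e1, e2))   # TRUE: two names, one environment
```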
The lobstr Package
The lobstr package provides tools to inspect R's memory model. While it may not be available in WebR, here's what it does:
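The following sketch runs only if lobstr is installed, so it is safe to paste into a session without it. It uses `lobstr::obj_addr()`, `lobstr::obj_size()`, and `lobstr::ref()`; the variable names are illustrative.

```r
if (requireNamespace("lobstr", quietly = TRUE)) {
  x <- c(1, 2, 3)
  y <- x

  # obj_addr() reports where a value lives; equal addresses mean shared data.
  shared_before <- lobstr::obj_addr(x) == lobstr::obj_addr(y)

  y[1] <- 10   # copy-on-modify gives y its own vector
  shared_after <- lobstr::obj_addr(x) == lobstr::obj_addr(y)

  print(c(before = shared_before, after = shared_after))

  # obj_size() counts shared components once, so a list holding the same
  # vector twice is only slightly larger than the vector itself.
  big <- runif(1e5)
  pair <- list(big, big)
  print(lobstr::obj_size(big))
  print(lobstr::obj_size(pair))

  # ref() prints a tree of addresses, making shared references visible.
  print(lobstr::ref(pair))
}
```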
Summary Table
| Concept | Description | Performance Implication |
|---|---|---|
| Name binding | Variables are names pointing to values | Assignment is O(1), regardless of size |
| Copy-on-modify | Copy happens only when shared data is modified | Reads are free; writes may copy |
| Modify-in-place | If only one name exists, no copy needed | Removing unused variables helps |
| Reference counting | R tracks how many names point to each value | Functions increment the count |
| Shallow copy | Lists/data frames copy only the modified element | Column-wise ops are cheaper than row-wise ops |
| Environments | Always reference semantics (no copy) | R6/environments are mutable |
Practice Exercises
Exercise 1: Predict the output: x <- 1:5; y <- x; y[3] <- 0; cat(x[3]). Will it print 3 or 0? Why?
Solution:
It prints **3**. When `y[3] <- 0` executes, R sees that `x` and `y` share the same data. Copy-on-modify kicks in: R copies the data for `y`, then modifies the copy. `x` is unchanged.
Exercise 2: Why is result <- c(result, new_value) inside a loop slow for large vectors? How would you fix it?
Solution:
Each `c(result, new_value)` creates a new, longer vector and copies all existing elements into it. For n iterations, this adds up to O(n^2) element copies in total. Fix: pre-allocate the vector to its final length (for example with `numeric(n)`) and fill it by index with `result[i] <- new_value`.
Exercise 3: Given that environments have reference semantics, what happens here: e <- new.env(); e$x <- 1; f <- e; f$x <- 2; cat(e$x)?
Solution:
It prints **2**. Environments use reference semantics (not copy-on-modify). `f <- e` does not copy the environment. Both `e` and `f` point to the same environment, so modifying `f$x` also changes `e$x`.
FAQ
Q: Is copy-on-modify the same as "pass by value"? Not exactly. True pass-by-value copies immediately on assignment. R's copy-on-modify defers the copy until modification. It's more accurately called "pass by value with copy-on-modify optimization." The end result is the same (no surprise mutations) but with better memory efficiency.
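Q: Can I force R to modify in place? You can't directly force it, but you can arrange your code so that only one reference exists when you modify. For example, use rm() to remove extra references before modifying (R 4.0.0 and later track reference counts, so dropping a name can make in-place modification possible again), or create and modify the object inside a single function so that no other name ever points to it. Passing an existing object into a function does not help, because the argument itself counts as an additional reference.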
Q: Do all R objects use copy-on-modify? Most do: vectors, lists, data frames, matrices. The exceptions are environments (always reference) and certain internal objects. R6 classes use environments internally, which is why they have reference semantics.
What's Next
- R Assignment Deep Dive -- All assignment operators and their scoping rules
- Understanding R Memory with lobstr -- Inspect object sizes and references
- R Copy-on-Modify -- More examples of copy-on-modify in action