Memoization in R: memoise Package — Cache Expensive Functions
Memoization caches a function's results so that calling it again with the same arguments returns the cached result instantly. The memoise package makes this a one-line change in R.
If your function is slow and you call it repeatedly with the same inputs, memoization eliminates the redundant computation. The first call computes and stores the result; every later call with the same arguments skips the computation and pays only for hashing the arguments and looking up the stored answer.
The Problem: Redundant Computation
# Simulate a slow function (e.g., API call, complex model)
slow_square <- function(x) {
  Sys.sleep(0.5) # Pretend this takes time
  x^2
}
# Calling it twice with the same input repeats the full wait
t1 <- system.time(slow_square(5))
t2 <- system.time(slow_square(5)) # Same input, same wait
cat("First call:", t1["elapsed"], "sec\n")
cat("Second call:", t2["elapsed"], "sec (redundant)\n")
cat("Total:", t1["elapsed"] + t2["elapsed"], "sec\n")
Basic Memoization with memoise
library(memoise)
slow_square <- function(x) {
  Sys.sleep(0.5)
  x^2
}
# Wrap with memoise — one line change
fast_square <- memoise(slow_square)
# First call: computes and caches
t1 <- system.time(result1 <- fast_square(5))
cat("First call:", t1["elapsed"], "sec, result:", result1, "\n")
# Second call: instant from cache
t2 <- system.time(result2 <- fast_square(5))
cat("Second call:", t2["elapsed"], "sec, result:", result2, "\n")
# Different argument: computes fresh
t3 <- system.time(result3 <- fast_square(10))
cat("New input:", t3["elapsed"], "sec, result:", result3, "\n")
memoise(f) returns a new function that behaves like f but caches its results. The cache key is derived from the arguments (memoise hashes them), so a call with the same arguments returns the same cached result.
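When a function takes several arguments, the whole combination acts as the key. The sketch below uses a hypothetical two-argument function, `power_of`, and a call counter (both invented for illustration) to show that swapping the argument values produces a separate cache entry.

```r
library(memoise)

calls <- 0
# Hypothetical function used only for illustration
power_of <- function(base, exponent) {
  calls <<- calls + 1   # count how often the real function runs
  base^exponent
}

memo_power <- memoise(power_of)

memo_power(2, 3)  # new combination: computed
memo_power(2, 3)  # same combination: served from the cache
memo_power(3, 2)  # same values in a different order: computed

cat("Underlying function ran", calls, "times\n")
```

Three calls to `memo_power` lead to two runs of `power_of`, because only two distinct argument combinations were seen.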
How Memoization Works
graph LR
A["Call f(x)"] --> B{x in cache?}
B -->|Yes| C[Return cached result]
B -->|No| D["Compute f(x)"]
D --> E[Store result in cache]
E --> C
The cache is a key-value store:
Key: The function's arguments (hashed)
Value: The computed result
library(memoise)
# Simple example showing cache behavior
counter <- 0
tracked_fn <- function(x) {
  counter <<- counter + 1
  x * 10
}
memo_fn <- memoise(tracked_fn)
memo_fn(5) # Computes: counter = 1
memo_fn(5) # Cached: counter still 1
memo_fn(10) # Computes: counter = 2
memo_fn(5) # Cached: counter still 2
cat("Function was actually called", counter, "times\n")
cat("But memo_fn was called 4 times\n")
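To make the key-value idea concrete, here is a minimal hand-rolled memoizer in base R. It is a sketch for illustration, not how memoise is implemented: the name `simple_memoize` is made up, and building the key with `deparse()` is a simplifying assumption that would be too fragile for real use.

```r
# Illustration only: memoise hashes arguments more robustly than this
simple_memoize <- function(f) {
  cache <- new.env(parent = emptyenv())  # the key-value store
  function(...) {
    # Key: a text representation of the arguments
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      # Cache miss: compute the value and store it under the key
      assign(key, f(...), envir = cache)
    }
    # Hit (or just-stored value): return what the cache holds
    get(key, envir = cache, inherits = FALSE)
  }
}

runs <- 0
times_ten <- function(x) {
  runs <<- runs + 1
  x * 10
}

memo_times_ten <- simple_memoize(times_ten)
memo_times_ten(5)   # computed
memo_times_ten(5)   # cached
memo_times_ten(10)  # computed
cat("Underlying function ran", runs, "times\n")
```

The environment plays the role of the cache in the diagram: look up the key, return the stored value on a hit, otherwise compute and store.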
When to Memoize
Memoization works best when a function is:
| Condition | Why it matters |
|---|---|
| Pure (same input → same output) | Cached results must be correct next time |
| Expensive (slow to compute) | Otherwise caching overhead isn't worth it |
| Called repeatedly with same args | No reuse = no benefit |
| Results fit in memory | Large results bloat the cache |
Functions that should NOT be memoized:
Functions with side effects (printing, writing files, database updates)
Functions that depend on external state (current time, random numbers)
Functions with very large return values that would exhaust memory
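The external-state case is easy to demonstrate. In the sketch below, `draw` is a hypothetical wrapper around `runif()`; once memoized, it stops producing new random numbers for a given argument, which is almost never what the caller wants.

```r
library(memoise)

# Hypothetical impure function: its result depends on the random number generator
draw <- function(n) runif(n)
memo_draw <- memoise(draw)

first  <- memo_draw(3)
second <- memo_draw(3)  # served from the cache, so no new numbers are drawn

identical(first, second)     # TRUE: the memoized version repeats itself
identical(draw(3), draw(3))  # the original draws fresh values on every call
```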
Create a function that simulates an expensive calculation and memoize it.
library(memoise)
# Create a function that: takes a number, sleeps 0.2 sec, returns its factorial
# Memoize it and verify the cache works
Solution:
```r
library(memoise)
slow_factorial <- function(n) {
  Sys.sleep(0.2)
  factorial(n)
}
fast_factorial <- memoise(slow_factorial)
# Each new n is computed; repeated values come from the cache
for (n in c(5, 10, 5, 10, 15)) {
  timing <- system.time(r <- fast_factorial(n))
  status <- if (timing["elapsed"] < 0.1) "CACHED" else "COMPUTED"
  cat(sprintf("n=%2d: %s (%.3fs) = %s\n", n, status, timing["elapsed"],
              format(r, big.mark = ",", scientific = FALSE)))
}
```
**Explanation:** The first call for each unique `n` computes and caches. Subsequent calls with the same `n` return instantly from cache.
Summary
| Feature | Function | Description |
|---|---|---|
| Memoize | `memoise(f)` | Wrap function with cache |
| Clear cache | `forget(f)` | Remove all cached results |
| Check cache | `has_cache(f)(args)` | Test if specific args are cached |
| Check memoized | `is.memoised(f)` | Test if function is memoized |
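The four functions in the summary can be exercised together. This is a small sketch using a made-up `slow_double` function; the variable names are illustrative only.

```r
library(memoise)

slow_double <- function(x) {
  Sys.sleep(0.1)
  x * 2
}
memo_double <- memoise(slow_double)

is.memoised(slow_double)  # FALSE: the original is untouched
is.memoised(memo_double)  # TRUE

before  <- has_cache(memo_double)(21)  # FALSE: nothing cached yet
memo_double(21)
after   <- has_cache(memo_double)(21)  # TRUE: the result for 21 is stored

forget(memo_double)                    # clear every cached result
cleared <- has_cache(memo_double)(21)  # FALSE again

c(before = before, after = after, cleared = cleared)
```

Note that `has_cache(f)` returns a function, which is then called with the arguments you want to check.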
FAQ
Does memoization use a lot of memory?
It depends on the size of results being cached. Each unique set of arguments stores one result. For small return values (numbers, short strings), memory is minimal. For large objects (data frames, model objects), the cache can grow quickly. Use forget() to clear when done.
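If clearing the cache by hand is not enough, the cache itself can be bounded. The sketch below assumes memoise 2.0 or later, where the `cache` argument accepts a cachem object; the limit of two entries is an arbitrary value chosen for illustration, and the exact pruning behavior is worth checking against the cachem documentation for your installed version.

```r
library(memoise)

# Assumption: memoise >= 2.0 with the cachem package installed.
# max_n = 2 is an arbitrary illustrative limit; evict = "lru" asks the
# cache to drop the least recently used entry first.
small_cache <- cachem::cache_mem(max_n = 2, evict = "lru")

square <- function(x) x^2
memo_square <- memoise(square, cache = small_cache)

for (x in 1:5) memo_square(x)

# Number of entries currently held; with max_n = 2 this should stay small
small_cache$size()
```

`cachem::cache_mem()` also accepts `max_size` (in bytes) and `max_age` (in seconds), which may suit large results or time-sensitive data better than a fixed entry count.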
Can I memoize functions from packages?
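Yes. memoise(stats::lm) would cache linear model fits. Be careful, though: the cache key covers only the arguments passed in, so a model-fitting function that reads mutable data from the surrounding environment (for example, a formula whose variables are not supplied through the data argument) can return a stale fit after that data changes.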
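The same staleness problem can be shown with a much smaller function. In this sketch, `with_tax` and `rate` are hypothetical names: the function reads `rate` from its environment instead of taking it as an argument, so the cache cannot tell when it changes. Calling `forget()` after the change is one way to invalidate the old result.

```r
library(memoise)

# Hypothetical mutable state that the function reads but does not take as an argument
rate <- 0.10
with_tax <- function(price) price * (1 + rate)
memo_with_tax <- memoise(with_tax)

before <- memo_with_tax(100)

rate <- 0.20
stale <- memo_with_tax(100)  # the key is unchanged, so the old result comes back

forget(memo_with_tax)        # invalidate the cache after the state changes
fresh <- memo_with_tax(100)  # recomputed with the new rate

c(before = before, stale = stale, fresh = fresh)
```

Passing the changing value as an argument (here, making `rate` a parameter of `with_tax`) avoids the problem altogether, because the value then becomes part of the cache key.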
Is memoization the same as caching?
Memoization is a specific type of caching: caching function results based on their arguments. General caching can store anything anywhere. Memoization is automatic, transparent, and tied to a specific function.