dplyr pull() in R: Extract a Column as a Vector
The pull() function in dplyr extracts a single column from a data frame and returns it as a plain vector. It is the pipe-friendly equivalent of base R's $ and [[ operators.
pull(df, mpg)                        # extract mpg as vector
pull(df, "mpg")                      # quoted name also works
pull(df, 1)                          # by position (column 1)
pull(df, last_col())                 # last column
pull(df, mpg, name = car_name)       # named vector
df |> filter(cyl == 4) |> pull(mpg)  # in pipeline
mean(pull(df, mpg))                  # feed to a function
Need explanation? Read on for examples and pitfalls.
What pull() does in one sentence
pull() extracts ONE column from a data frame and returns a plain VECTOR. Unlike select(df, x), which returns a one-column data frame, pull(df, x) strips the data frame wrapper and gives you the column's values directly.
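This matters when downstream code expects a vector (e.g., mean(), length(), sum()). A one-column data frame is NOT the same as a vector, and passing one where the other is expected can fail or mislead: mean() on a one-column data frame returns NA with a warning, and length() returns 1 (the number of columns) instead of the number of rows.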
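A minimal sketch of the difference, using the built-in mtcars data set (the variable names mpg_vec and mpg_df are illustrative):

```r
library(dplyr)

# The same column, extracted two different ways
mpg_vec <- pull(mtcars, mpg)    # plain numeric vector of 32 values
mpg_df  <- select(mtcars, mpg)  # data frame with 1 column and 32 rows

is.vector(mpg_vec)     # TRUE
is.data.frame(mpg_df)  # TRUE

# length() counts elements for a vector, but COLUMNS for a data frame
length(mpg_vec)  # 32
length(mpg_df)   # 1
```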
Syntax
pull() takes a data frame and a single column reference. The reference can be a bare name, a quoted string, or an integer position. An optional name argument turns the output into a named vector.
The full signature:
pull(.data, var = -1, name = NULL, ...)
.data is the data frame. var is the column to extract; default is -1 (the LAST column). name is an optional column whose values become names of the output vector.
Note the default: var = -1 means the LAST column, so pull(df) with no second argument returns the last column. This is useful in pipelines where the last column was just created or selected: df |> mutate(z = ...) |> pull() extracts z.
Six common patterns
1. Extract a column by name
The result is a numeric vector, NOT a data frame.
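A minimal sketch of pattern 1, assuming df is simply a copy of the built-in mtcars data set:

```r
library(dplyr)

df <- mtcars

# Bare column name -> plain numeric vector
mpg_values <- pull(df, mpg)

class(mpg_values)       # "numeric"
head(mpg_values, 3)     # 21.0 21.0 22.8
```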
2. Extract by position
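Integer positions are 1-indexed from the left, and negative integers count from the right, so pull(df, -1) is the last column. pull(df, last_col()) also extracts the last column, using the last_col() selection helper.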
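A sketch of position-based extraction on mtcars, whose 11 columns run from mpg to carb. The variable names are illustrative:

```r
library(dplyr)

df <- mtcars  # columns: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb

first_col   <- pull(df, 1)   # 1st column from the left: mpg
third_col   <- pull(df, 3)   # 3rd column: disp
last_column <- pull(df, -1)  # negative positions count from the right: carb
second_last <- pull(df, -2)  # second from the right: gear
```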
3. Inside a pipeline
pull() is the bridge from "data frame world" to "vector world". After pull, you can use base R functions that expect vectors: mean(), sum(), range(), etc.
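A short pipeline sketch on mtcars: filter rows while still in a data frame, pull one column, then hand the vector to base R functions (four_cyl_mpg is an illustrative name):

```r
library(dplyr)

# Data frame world: keep only four-cylinder cars, then switch to vector world
four_cyl_mpg <- mtcars |>
  filter(cyl == 4) |>
  pull(mpg)

length(four_cyl_mpg)  # 11 four-cylinder cars
mean(four_cyl_mpg)    # average mpg of those cars
range(four_cyl_mpg)   # min and max mpg
```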
4. Named vector with the name argument
Setting name = car turns the output into a named vector. Useful for lookup tables and direct subsetting like named_mpg["Mazda RX4"].
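A sketch of the named-vector pattern. mtcars keeps car names in its row names rather than in a column, so this example first copies them into a column called car (an assumption made for illustration):

```r
library(dplyr)

# Copy the row names into a regular column so pull() can use them as names
df <- mtcars
df$car <- rownames(mtcars)

named_mpg <- pull(df, mpg, name = car)

named_mpg["Mazda RX4"]     # lookup by name: 21, labelled "Mazda RX4"
head(names(named_mpg), 2)  # "Mazda RX4" "Mazda RX4 Wag"
```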
5. Pull last column (default)
pull() with no argument returns the LAST column. Convenient after a mutate() where the new column is at the end.
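A sketch of the default in action; the new column name hp_per_wt is hypothetical:

```r
library(dplyr)

# mutate() appends a NEW column at the end, so pull() with no column picks it up
power_to_weight <- mtcars |>
  mutate(hp_per_wt = hp / wt) |>
  pull()

identical(power_to_weight, mtcars$hp / mtcars$wt)  # TRUE
```

This relies on the column being new: if mutate() overwrites an existing column, that column keeps its original position and may not be last.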
6. Pull with quoted column name
Quoted strings work the same as bare names. Useful inside functions where the column name is stored in a variable.
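A sketch comparing quoted and bare names, plus a hypothetical helper column_mean() whose column name arrives as a string. It uses !! (rlang quasiquotation, which pull() supports) so that the value of col is used even if the data happened to contain a column literally named col:

```r
library(dplyr)

df <- mtcars

# A quoted string selects the same column as a bare name
identical(pull(df, "mpg"), pull(df, mpg))  # TRUE

# Hypothetical helper: `col` holds a column name as a string.
# `!!col` unquotes its value (e.g. "mpg") before pull() looks it up.
column_mean <- function(data, col) {
  mean(pull(data, !!col))
}

column_mean(df, "mpg")
column_mean(df, "hp")
```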
pull(df, x) is to dplyr what df$x and df[["x"]] are to base R. All three return the same vector. The difference is that pull() is pipe-friendly and works inside chains, while $ requires the data frame on the left-hand side and breaks the pipe flow.
pull() vs base R column extraction
Three ways to extract a column as a vector. Choose based on context.
| Task | dplyr | Base R |
|---|---|---|
| Extract by name | `pull(df, mpg)` | `df$mpg` or `df[["mpg"]]` |
| In a pipe | `df \|> filter(...) \|> pull(mpg)` | awkward; needs `with()` or a temporary variable |
| Quoted name | `pull(df, "mpg")` | `df[["mpg"]]` |
| By position | `pull(df, 1)` | `df[[1]]` |
| Named vector output | `pull(df, mpg, name = id)` | `setNames(df$mpg, df$id)` |
| Default (last col) | `pull(df)` | `df[[ncol(df)]]` |
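A quick check that each dplyr form in the table matches its base R counterpart, assuming df is mtcars with a hypothetical id column built from the row names:

```r
library(dplyr)

df <- mtcars
df$id <- rownames(mtcars)  # hypothetical id column; it is now the last column

identical(pull(df, mpg), df$mpg)         # TRUE
identical(pull(df, "mpg"), df[["mpg"]])  # TRUE
identical(pull(df, 1), df[[1]])          # TRUE
identical(pull(df), df[[ncol(df)]])      # TRUE (both return id)
all.equal(pull(df, mpg, name = id), setNames(df$mpg, df$id))  # TRUE
```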
When to use which:
- Use pull() inside any dplyr pipeline.
- Use df$x or df[["x"]] for one-line scripts without other dplyr code.
- Use pull(df, "x") inside functions where the column name is a variable.
Common pitfalls
Pitfall 1: confusing pull() and select(). pull(df, x) returns a VECTOR. select(df, x) returns a one-column DATA FRAME. They behave differently when fed to other functions: mean(select(df, mpg)) returns NA with a warning, but mean(pull(df, mpg)) returns the mean.
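A minimal sketch of Pitfall 1 on mtcars. suppressWarnings() only keeps the output quiet; without it, mean() warns "argument is not numeric or logical: returning NA":

```r
library(dplyr)

# mean() on a one-column data frame: warning + NA
bad <- suppressWarnings(mean(select(mtcars, mpg)))
is.na(bad)  # TRUE

# mean() on the pulled vector works as expected
good <- mean(pull(mtcars, mpg))
good
```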
Pitfall 2: pull() returns ONLY the last column by default. Calling pull(df) without specifying a column returns the LAST column. New users sometimes expect it to return all columns. To get a specific column, always specify: pull(df, mpg).
pull() strips data frame metadata, including grouping. If df is grouped via group_by(), the pull() result is just a vector with no group information. To keep per-group results for downstream operations, summarise within the groups first, then pull.
Pitfall 3: expecting pulled values to follow group order. group_by() does not reorder rows, so group_by(df, g) |> pull(x) returns values in the frame's existing row order, not clustered by group. If you need values sorted by group, call arrange(g) before pull().
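A sketch of both points on mtcars grouped by cyl: the pulled vector keeps the original row order and carries no grouping, while summarising before pulling gives one value per group (the variable names are illustrative):

```r
library(dplyr)

grouped <- group_by(mtcars, cyl)

# Plain vector: no groups, original row order (group_by() does not sort rows)
mpg_all <- pull(grouped, mpg)
identical(mpg_all, mtcars$mpg)  # TRUE

# One value per group: summarise first, then pull
mean_by_cyl <- grouped |>
  summarise(mean_mpg = mean(mpg)) |>
  pull(mean_mpg)
mean_by_cyl  # three means, for cyl = 4, 6 and 8
```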
Try it yourself
Try it: From mtcars, extract the hp column as a vector and compute its mean. Save the mean to ex_mean_hp.
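One possible solution, consistent with the explanation that follows:

```r
library(dplyr)

# Extract hp as a numeric vector, then average it
ex_mean_hp <- mtcars |>
  pull(hp) |>
  mean()

ex_mean_hp  # 146.6875
```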
Explanation: pull(hp) extracts the hp column as a numeric vector. mean() then computes the average. This works because mean() expects a vector, which is exactly what pull() returns.
Related dplyr functions
After mastering pull(), look at:
- select() (with one column): returns a data frame, not a vector
- $ and [[ in base R: equivalent to pull() but not pipe-friendly
- tibble::deframe(): convert a two-column tibble to a named vector
- tibble::enframe(): the inverse, convert a named vector to a tibble
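A sketch showing how deframe() and enframe() relate to pull() with a name argument, again assuming a car column copied from the mtcars row names:

```r
library(dplyr)
library(tibble)

df <- mtcars
df$car <- rownames(mtcars)

# deframe(): first column becomes the names, second column the values
via_deframe <- deframe(df[, c("car", "mpg")])
via_pull    <- pull(df, mpg, name = car)
all.equal(via_deframe, via_pull)  # TRUE

# enframe(): back from a named vector to a two-column tibble
back <- enframe(via_pull, name = "car", value = "mpg")
back
```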
For extracting multiple columns, use select() and keep the data frame structure. pull() is specifically for the single-column case.
FAQ
What is the difference between pull and $ in dplyr and base R?
pull(df, x) and df$x both return the same vector. The difference is composability: pull() works inside dplyr pipes (df |> filter(...) |> pull(x)), while $ requires the data frame on the left side and breaks the pipe flow.
How do I extract a single column as a vector in dplyr?
Use pull(df, column_name). The result is a vector you can pass to mean(), sum(), length(), etc. Use select(df, column_name) instead if you want a one-column data frame.
Can I use pull with a quoted string for the column name?
Yes. pull(df, "mpg") and pull(df, mpg) produce the same result. Use quoted strings inside functions where the column name is stored in a variable.
What does pull return for a grouped data frame?
A plain vector with no grouping. The vector contains values for all rows, in their existing row order, but the group structure is gone. If grouping matters for the downstream operation, summarise within each group first and then pull the summary column.
How do I get a named vector from pull?
Pass the name argument: pull(df, value_col, name = name_col). The result is a named vector where each name comes from name_col and each value from value_col. Useful for lookup tables.