dplyr pull vs select in R: Vector vs Data Frame Output
In dplyr, pull() extracts ONE column as a VECTOR; select() keeps ONE OR MORE columns as a DATA FRAME. They serve different purposes despite both "selecting" columns.
df |> pull(mpg)          # numeric vector (length nrow(df))
df |> select(mpg)        # 1-column tibble
df |> pull(1)            # by position
df |> pull(mpg, name)    # named vector
df |> select(mpg, hp)    # 2-column tibble (pull can't do this)
df$mpg                   # base R equivalent of pull(mpg)
Need explanation? Read on for examples and pitfalls.
What pull() vs select() does in one sentence
pull(df, col) returns the values of one column as a VECTOR; select(df, ...) returns a DATA FRAME with only the specified columns. Both pick columns; they differ in output shape.
Side-by-side comparison
Use pull() when downstream code expects a VECTOR (e.g., mean(), length()). Use select() when downstream code expects a DATA FRAME (e.g., another dplyr verb).
Five common patterns
1. Extract one column
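A minimal sketch of this pattern, assuming the built-in mtcars dataset as a stand-in for df (the variable names are illustrative only):

```r
library(dplyr)

# pull() returns the column's values as a plain vector
mpg_vec <- mtcars |> pull(mpg)

# select() keeps the data-frame shape, even for a single column
mpg_tbl <- mtcars |> select(mpg)

is.numeric(mpg_vec)      # the pulled column is a numeric vector
is.data.frame(mpg_tbl)   # the selected column is still a data frame
```

Because mtcars is a plain data.frame, select() returns a data.frame here; with a tibble input it would return a tibble.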
2. Compute a stat on a column
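One way this might look, again assuming mtcars as example data:

```r
library(dplyr)

# pull() hands mean() the numeric vector it expects
avg_mpg <- mtcars |> pull(mpg) |> mean()

# Base R equivalent, for comparison
avg_mpg_base <- mean(mtcars$mpg)

all.equal(avg_mpg, avg_mpg_base)
```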
select would return a tibble, and mean() on a tibble returns NA with a warning instead of the average.
3. Use a column as iteration target
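A small sketch of iterating over pulled values, assuming mtcars and a hypothetical loop variable n_cyl:

```r
library(dplyr)

# Distinct cylinder counts as a plain vector, suitable for a for loop
cyl_values <- mtcars |> distinct(cyl) |> pull(cyl)

for (n_cyl in cyl_values) {
  n_rows <- mtcars |> filter(cyl == n_cyl) |> nrow()
  message(n_cyl, " cylinders: ", n_rows, " cars")
}
```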
select would return a 1-column tibble, so the loop would run once over the whole column instead of once per value.
4. Multiple columns: select only
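A brief illustration, assuming mtcars; the second call uses a column range, which is standard tidyselect syntax:

```r
library(dplyr)

# Two named columns, result stays a data frame
two_cols <- mtcars |> select(mpg, hp)

# A range of adjacent columns (mpg, cyl, disp, hp in mtcars)
range_cols <- mtcars |> select(mpg:hp)

names(two_cols)
names(range_cols)
```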
5. Named vector via pull
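A sketch of the named-vector form. It assumes mtcars, whose car names live in row names, so they are first moved into a column called name (an illustrative choice) with tibble::rownames_to_column():

```r
library(dplyr)

# Move the row names into a regular column so pull() can use them
cars <- tibble::rownames_to_column(mtcars, var = "name")

# Values come from mpg, names come from the name column
mpg_by_car <- cars |> pull(mpg, name)

head(names(mpg_by_car), 3)
mpg_by_car[["Mazda RX4"]]   # look up a value by its name
```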
pull() vs select() vs $ vs [[
Four ways to extract column data in R.
| Approach | Returns | Pipe-friendly |
|---|---|---|
| `df \|> pull(col)` | Vector | Yes |
| `df \|> select(col)` | 1-column tibble | Yes |
| `df$col` | Vector | No (LHS not pipeable) |
| `df[["col"]]` | Vector | No |
| `df[, "col"]` | Vector for a data.frame, 1-column tibble for a tibble | No |
When to use which:
- pull for a vector inside a pipe.
- select for a data frame inside a pipe.
- $ for interactive scripting (no pipe).
- [[ for programmatic access with a string column name.
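To make the comparison concrete, here is a short sketch assuming mtcars; col_name is a hypothetical variable holding a column name as a string:

```r
library(dplyr)

v_pull    <- mtcars |> pull(mpg)
v_dollar  <- mtcars$mpg
v_bracket <- mtcars[["mpg"]]

# All three give the same numeric vector
identical(v_pull, v_dollar)
identical(v_pull, v_bracket)

# [[ accepts a column name stored in a variable
col_name <- "hp"
v_prog <- mtcars[[col_name]]

# [, "col"] depends on the input class
class(mtcars[, "mpg"])              # data.frame input drops to a vector
class(as_tibble(mtcars)[, "mpg"])   # tibble input stays a tibble
```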
A practical workflow
The common "pull at the end of a filter chain" pattern.
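A sketch of what such a chain could look like. The dataset (mtcars) and the filter criteria are assumptions chosen for illustration:

```r
library(dplyr)

# Filter and sort as a data frame, then break out to a vector at the end
top_mpg <- mtcars |>
  filter(cyl == 4) |>
  arrange(desc(mpg)) |>
  slice_head(n = 5) |>
  pull(mpg)

mean(top_mpg)
summary(top_mpg)
```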
The vector top_mpg can now be fed to mean, summary, or any function expecting a numeric vector.
The "select for downstream verb" pattern:
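For instance, under the same assumption that mtcars stands in for the data (the derived column hp_per_wt is hypothetical):

```r
library(dplyr)

# select() narrows the columns; later verbs still receive a data frame
efficient <- mtcars |>
  select(mpg, hp, wt) |>
  filter(mpg > 25) |>
  mutate(hp_per_wt = hp / wt) |>
  arrange(desc(hp_per_wt))

names(efficient)
```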
select keeps the data-frame shape for further dplyr operations.
Common pitfalls
Pitfall 1: pull with multiple columns. pull(mpg, hp) is wrong (the second arg is for naming, not selecting). Use select(mpg, hp) for multiple columns.
Pitfall 2: forgetting that select returns a data frame. select(mpg) |> mean() does not return the average: mean expects a vector, so on a data frame it returns NA with a warning. Use pull instead.
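Both pitfalls can be reproduced in a few lines. This is a sketch assuming mtcars, with car names moved into an illustrative name column:

```r
library(dplyr)

cars <- tibble::rownames_to_column(mtcars, var = "name")

# Pitfall 1: the second argument supplies names, it does not add a column
one_vec <- cars |> pull(mpg, name)
is.null(dim(one_vec))    # still a single vector

# For two columns, select() is the right tool
two_cols <- cars |> select(mpg, hp)

# Pitfall 2: mean() on a data frame does not average the column
bad  <- suppressWarnings(cars |> select(mpg) |> mean())
good <- cars |> pull(mpg) |> mean()
is.na(bad)
```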
Note: pull(df, mpg, name) treats the argument after the column as name, which is what creates a named vector. If you intended a second selection column, use select instead.
Try it yourself
Try it: Get the mean mpg of 4-cylinder cars in two ways: with pull and with $. Save to ex_avg.
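One possible solution, assuming the exercise refers to the built-in mtcars dataset (ex_avg_base and four_cyl are illustrative names for the $ variant):

```r
library(dplyr)

# With pull(): filter inside the pipe, then extract the vector
ex_avg <- mtcars |>
  filter(cyl == 4) |>
  pull(mpg) |>
  mean()

# With $: subset first, then extract the column
four_cyl <- mtcars[mtcars$cyl == 4, ]
ex_avg_base <- mean(four_cyl$mpg)

all.equal(ex_avg, ex_avg_base)
```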
Explanation: pull extracts mpg as a vector after filtering; mean computes the average.
Related dplyr / base functions
After mastering pull vs select, look at:
- select(): pick multiple columns
- pull(): extract one column as a vector
- $ / [[: base R extraction
- tibble::deframe(): convert a 2-column data frame to a named vector
- purrr::pluck(): deeper extraction from nested structures
For named vector creation, pull(df, value, name) is a one-liner that replaces the two-step select(name, value) followed by deframe().
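The two routes can be compared side by side. This sketch assumes mtcars with car names moved into an illustrative name column; note that deframe() takes names from the first column and values from the second:

```r
library(dplyr)

cars <- tibble::rownames_to_column(mtcars, var = "name")

# One step: values from mpg, names from name
via_pull <- cars |> pull(mpg, name)

# Two steps: order the columns as (names, values), then deframe
via_deframe <- cars |>
  select(name, mpg) |>
  tibble::deframe()

all.equal(via_pull, via_deframe)
```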
Why both functions exist
dplyr separates "narrow column selection" from "data extraction" deliberately. Most dplyr verbs work on data frames, so select fits naturally in a pipeline of verbs. But sometimes you need to BREAK OUT of the data-frame world (compute a scalar, build a named vector, feed a function that expects a vector); that's pull's job. Having two verbs with clear purposes is cleaner than overloading select to sometimes return a vector.
FAQ
What is the difference between pull and select in dplyr?
pull returns a VECTOR; select returns a DATA FRAME. pull is for one column; select can take multiple.
When should I use pull vs $ in R?
pull is pipe-friendly: df |> filter(...) |> pull(col). $ requires the data frame to be on the LHS of $, so it doesn't fit naturally in pipes.
Can pull extract multiple columns?
No. The second arg in pull is name, not a second column. For multiple columns, use select.
How do I create a named vector with pull?
pull(df, value, name). The values become the vector; the name column becomes vector names.
Is pull faster than $?
Negligibly different. pull does a tiny bit more work for tidyselect support but the difference doesn't matter for any practical use case.