R Pipe Operator: %>% vs |>, The Complete Guide to Both Pipes
The pipe operator takes the output of one function and feeds it as the first argument to the next, turning nested calls into a readable left-to-right sequence. R has two pipes, %>% from magrittr and |> built into base R, and this guide shows you exactly when to use each.
What problem does the pipe solve?
Without a pipe, multi-step transformations nest inside each other, and you read them inside-out. With a pipe, they read top-to-bottom like a recipe. Let's see both versions of the same computation.
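As a minimal sketch, assume a small numeric vector `x` (the values are made up for illustration). The nested form and the piped form compute the same thing:

```r
# Hypothetical input vector, chosen only for illustration
x <- c(1.5, 2.7, 3.1, 4.8, 10.2)

# Nested: read from the innermost call outward
nested <- round(mean(log(x)), 2)

# Piped: read top to bottom
piped <- x |>
  log() |>
  mean() |>
  round(2)

identical(nested, piped)
```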
Same result. But the piped version says plainly: "take this vector, take its log, take the mean, round to 2 decimals." No mental gymnastics, no parenthesis-counting. For a chain of 4-5 steps the difference is transformative.
Try it: Rewrite sum(sqrt(1:10)) as a pipeline with |>.
Solution:
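One way to write it (the name `result` is just an illustrative choice):

```r
result <- 1:10 |>
  sqrt() |>
  sum()

result
```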
The pipeline reads left to right: start with 1..10, take the square root of each element, then sum the resulting vector. Same answer as the nested sum(sqrt(1:10)) but each step is explicit and reorderable.
What's the difference between %>% and |>?
The magrittr pipe %>% has been around since 2014 and is used throughout the tidyverse: dplyr, tidyr, and related packages re-export it. The native pipe |> was added to base R in version 4.1 (May 2021). They're almost identical for day-to-day work, but there are three real differences.
Difference 1, Parentheses required. The native pipe requires () on the right-hand side: x |> mean() works, x |> mean does not. The magrittr pipe allows both.
Difference 2, No anonymous dot. magrittr lets you use . as a placeholder for the piped value anywhere: x %>% lm(y ~ z, data = .). The native pipe's placeholder is _ (added in R 4.2); it only works with named arguments and only once per call.
Difference 3, No dependency. |> is in base R, no packages needed. %>% requires magrittr or any package that re-exports it (dplyr, tidyr, etc.).
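The first two differences can be seen side by side in a short sketch. It assumes magrittr is installed and R is 4.2 or later (for `_`); the sample values are made up.

```r
library(magrittr)  # provides %>%; assumed to be installed

x <- c(4, 9, 16)

# Difference 1: magrittr accepts a bare function name, the native pipe needs a call
m_magrittr <- x %>% mean
m_native   <- x |> mean()
# x |> mean          # not valid: the native pipe requires mean()

# Difference 2: placeholders
# magrittr's dot can sit in any position, named or not
s_dot <- "a-b-c" %>% gsub("-", "", .)
# the native underscore must be passed to a named argument (here x =)
s_underscore <- "a-b-c" |> gsub("-", "", x = _)

identical(s_dot, s_underscore)
```

Difference 3 needs no code: remove the `library(magrittr)` line and only the `|>` lines keep working.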
For new code, default to |>. It's faster (no function call), has no dependency, and works in any R 4.1+ session. Only fall back to %>% when you need its dot-placeholder flexibility or you're in a codebase that already uses it.
Try it: Use the native pipe to fit lm(mpg ~ hp, data = mtcars) without writing mtcars inside lm().
Solution:
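A sketch of the solution, assuming R 4.2 or later for the `_` placeholder:

```r
fit <- mtcars |>
  lm(mpg ~ hp, data = _)

coef(fit)
```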
The native pipe's _ placeholder (R 4.2 and later) plugs the left-hand side into a named argument on the right. Here data = _ puts mtcars into lm()'s data argument so the formula can stay first. The placeholder only works with named arguments and only once per call.
How does the pipe decide where to insert the value?
Both pipes insert the left-hand side as the first argument of the function on the right-hand side. If the function expects the data somewhere else, you need a placeholder or an anonymous function.
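The four insertion styles can be sketched as follows. The values are illustrative, magrittr is assumed to be installed, and the `_` line assumes R 4.2 or later.

```r
library(magrittr)  # only needed for the %>% line; assumed to be installed

x <- c(10, 20, 30)

# Default: the left-hand side becomes the first argument, i.e. mean(x)
first_arg <- x |> mean()

# magrittr placeholder: the dot marks where the value goes
with_dot <- 2 %>% round(3.14159, digits = .)

# Native placeholder: underscore, passed to a named argument
with_underscore <- 2 |> round(3.14159, digits = _)

# Anonymous function: works for any expression involving the value
shares <- x |> (\(v) v / sum(v))()
```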
The anonymous-function form is the native pipe's universal escape hatch: (\(x) ...)() wraps the rest of the expression in an inline function and calls it. Ugly, but it works when nothing else fits.
Try it: Pipe 1:5 into a custom operation that returns x^2 + 1 using an anonymous function.
Solution:
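One way to write it:

```r
result <- 1:5 |>
  (\(x) x^2 + 1)()

result
# 2 5 10 17 26
```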
The lambda \(x) x^2 + 1 is created inline and then immediately called with the piped value. The parentheses around the lambda and the trailing () are what make the pipe call the function instead of just referencing it. This is the universal escape hatch when the operation doesn't already exist as a named function.
When is a pipeline worth using?
Pipes shine on chains of 3+ steps where each step transforms the previous result. Below that threshold, nested calls are fine. Above it, pipes become a quality-of-life upgrade.
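To illustrate, here is a four-step chain written both ways, using only base R and the built-in mtcars data. The particular steps (mean mpg per cylinder count, rounded, sorted, top two) are an assumption chosen for the example.

```r
# Nested: the first operation is buried in the middle
nested <- head(sort(round(tapply(mtcars$mpg, mtcars$cyl, mean), 1), decreasing = TRUE), 2)

# Piped: one step per line, in the order it happens
piped <- mtcars$mpg |>
  tapply(mtcars$cyl, mean) |>   # mean mpg per cylinder count
  round(1) |>                   # one decimal place
  sort(decreasing = TRUE) |>    # best mileage first
  head(2)                       # keep the top two groups

identical(nested, piped)
```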
The piped version isn't shorter, it's linear. You can drop in a print() or View() anywhere in the chain to debug. You can comment out a line to skip a step. That flexibility is the real win.
Pipelines are everywhere in modern R: dplyr, ggplot2 (which chains layers with + instead of |>), and most of the tidyverse. Learning to read them fluently is half the battle when picking up modern R.
Try it: Write a pipeline on mtcars that filters gear == 4, then returns the mean mpg.
Solution:
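A sketch of the solution, assuming dplyr is installed:

```r
library(dplyr)  # assumed to be installed

result <- mtcars |>
  filter(gear == 4) |>
  summarise(mean_mpg = mean(mpg))

result
```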
filter(gear == 4) keeps only the 12 four-gear cars, and summarise(mean_mpg = mean(mpg)) collapses that subset to a single-row data frame with their average mpg. Because the data flows in via the pipe, neither call needs to repeat mtcars.
What are common pipe pitfalls?
Three traps catch new pipe users most often. Knowing them saves hours of debugging.
Pitfall 1, Forgetting () on the right side (native pipe only):
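A small sketch of the failure and the fix. Because the broken form does not even parse, it is wrapped in `parse(text = ...)` here so the rest of the script can still run.

```r
x <- c(1, 2, 3)

# x |> mean          # rejected: the native pipe requires a function call as RHS
ok <- x |> mean()    # works: equivalent to mean(x)

# Capture the error message safely by parsing the broken code from a string
err <- tryCatch(
  parse(text = "x |> mean"),
  error = function(e) conditionMessage(e)
)
err
```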
Pitfall 2, Using . without thinking. If magrittr's dot appears only inside a nested call, the value is still inserted as the first argument, so it ends up in the call twice:
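A minimal sketch of the double insertion, assuming magrittr is installed (the vector is made up for illustration):

```r
library(magrittr)  # assumed to be installed

x <- c("a", "b")

# The dot sits only inside length(), so x is ALSO inserted as the first
# argument: this runs paste(x, "n =", length(x))
doubled <- x %>% paste("n =", length(.))

# Braces switch off first-argument insertion: this runs paste("n =", length(x))
single <- x %>% { paste("n =", length(.)) }

doubled  # "a n = 2" "b n = 2"
single   # "n = 2"
```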
Pitfall 3, Mixing pipe and + in ggplot2. ggplot2 uses +, not |>, to add layers. Beginners routinely try ggplot(df) |> geom_point(...) and get confusing errors. Use |> before ggplot() and + between layers:
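A sketch of the correct pattern, assuming ggplot2 is installed (the choice of hp and mpg is illustrative):

```r
library(ggplot2)  # assumed to be installed

p <- mtcars |>
  ggplot(aes(x = hp, y = mpg)) +   # |> feeds the data into ggplot()
  geom_point() +                   # from here on, + adds layers
  labs(x = "Horsepower", y = "Miles per gallon")

p
```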
Also watch out when piping into functions called for their side effects (such as print() or write.csv()) and expecting the return value to propagate. print(x) returns x invisibly, which does propagate. But many I/O functions, write.csv() among them, return NULL, breaking the chain.
Try it: Spot the bug: c(1,2,3) %>% mean works with magrittr, so why does c(1,2,3) |> mean fail?
Solution:
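The fix is a pair of parentheses:

```r
x <- c(1, 2, 3)

result <- x |> mean()   # the parentheses turn `mean` into a call

result
# 2
```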
The native pipe strictly requires a function call on the right-hand side. mean alone is just a reference to the function object, so c(1,2,3) |> mean errors with "The pipe operator requires a function call as RHS". Adding () makes it a call, and the pipe can insert the LHS as its first argument. magrittr's %>% is looser and accepts the bare name, which is what catches people moving between the two pipes.
When should you NOT use the pipe?
Pipes are a tool, not a religion. Here are three cases where they hurt readability rather than help.
Don't pipe a single call. sort(x) is clearer than x |> sort(). The pipe adds visual noise with no benefit.
Don't pipe when the intermediate variable has meaning. If you'd describe a step as "the filtered customers" or "the standardized scores," save it to a named variable. A chain of 10 anonymous intermediate results is harder to debug than 3 named ones.
Don't pipe when you need the value twice. The pipe discards the original after one use. If step 3 needs both the current result and the original input, use a variable:
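A sketch of the variable-based approach, using made-up scores and hypothetical names:

```r
# Hypothetical raw scores, for illustration only
scores <- c(52, 67, 71, 88, 94)

# The pipeline produces the standardized values...
z <- scores |>
  scale() |>
  as.vector()

# ...and the named original is still available for the step that needs both
comparison <- data.frame(raw = scores, standardized = z)

comparison
```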
Try it: Decide which of these is clearer, single call or pipe: sqrt(16) vs 16 |> sqrt().
Solution:
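Both forms return the same value, so the choice is purely about readability:

```r
direct <- sqrt(16)
piped  <- 16 |> sqrt()

identical(direct, piped)  # the direct call is simply easier to read
```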
For a single function call, the plain form is already as readable as it gets; there's nothing to flatten. 16 |> sqrt() adds characters and an extra reading step for zero gain. Pipes earn their keep at three or more chained steps, not one.
Practice Exercises
Exercise 1: Refactor nested into pipe
Rewrite this nested call using |>:
Solution:
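As a stand-in, assume the nested call is `round(mean(abs(c(-3, 1, -4, 1, -5))), 1)`; this expression is an assumption chosen for illustration. The same unnesting approach applies to any nested call: start from the innermost value and move outward, one step per line.

```r
# Assumed nested call (illustrative)
nested <- round(mean(abs(c(-3, 1, -4, 1, -5))), 1)

# Piped equivalent: innermost value first, outermost function last
piped <- c(-3, 1, -4, 1, -5) |>
  abs() |>
  mean() |>
  round(1)

identical(nested, piped)
```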
Exercise 2: Placeholder practice
Use the native pipe's _ placeholder to fit a linear model of mpg ~ wt + hp on mtcars without naming mtcars inside lm().
Solution:
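A sketch of the solution, assuming R 4.2 or later for the `_` placeholder:

```r
fit <- mtcars |>
  lm(mpg ~ wt + hp, data = _)

coef(fit)
```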
Exercise 3: Dplyr pipeline
Using |> and dplyr, start from iris, filter to Species == "versicolor", then compute the mean of every numeric column.
Solution:
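One possible solution, assuming dplyr 1.0 or later (for `across()`) is installed:

```r
library(dplyr)  # assumed to be installed

result <- iris |>
  filter(Species == "versicolor") |>
  summarise(across(where(is.numeric), mean))

result
```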
Putting It All Together
A complete one-pipeline analysis on mtcars: load, clean, transform, summarize, and visualize.
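A sketch of what such a pipeline could look like, assuming dplyr and ggplot2 are installed. The specific stages (the selected columns, the km-per-litre conversion, the bar chart) are assumptions chosen for illustration.

```r
library(dplyr)    # assumed to be installed
library(ggplot2)  # assumed to be installed

p <- mtcars |>                                   # load the built-in data
  select(mpg, cyl, hp, wt) |>                    # 1. keep the columns of interest
  filter(!is.na(mpg), !is.na(cyl)) |>            # 2. clean: drop incomplete rows
  mutate(cyl = factor(cyl),
         kpl = mpg * 0.425144) |>                # 3. transform: mpg to km per litre
  group_by(cyl) |>                               # 4. group by cylinder count
  summarise(mean_kpl = mean(kpl), n = n()) |>    # 5. summarize each group
  arrange(desc(mean_kpl)) |>                     # 6. order by efficiency
  ggplot(aes(x = cyl, y = mean_kpl)) +           # 7. hand the summary to ggplot()
  geom_col()                                     # 8. draw one bar per group

p
```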
Eight pipeline stages, one result. Every step reads in natural order, and any line can be commented out for quick debugging.
Summary
| Aspect | %>% (magrittr) | `\|>` (base) |
|---|---|---|
| Available since | 2014 | R 4.1 (2021) |
| Package needed | magrittr (or tidyverse) | none |
| () on RHS required | No | Yes |
| Placeholder | . anywhere | _ in named args only |
| Speed | Slower (function call) | Faster (syntax) |
| Recommended for | Legacy code, . placeholder use | New code, default choice |