arrow read_feather() in R: Read Feather Files Fast
The arrow read_feather() function loads a Feather file into an R data frame in one call. Feather is the Arrow IPC format, a binary layout tuned for raw read speed, so read_feather() imports are among the fastest ways to move a table on or off disk.
read_feather("data.feather") # read a whole file
read_feather("data.feather", col_select = c(mpg, hp)) # read only some columns
read_feather("data.arrow") # .arrow extension works too
read_feather("data.feather", as_data_frame = FALSE) # return an Arrow Table
read_feather("data.feather", col_select = starts_with("S")) # tidyselect columns
nrow(read_feather("data.feather")) # check row count
Need explanation? Read on for examples and pitfalls.
What read_feather() does
read_feather() reads one Feather file into R. A Feather file stores a table in the Arrow IPC format, a binary layout that R, Python, and Julia all read with no conversion. You hand read_feather() a path and it returns a tibble with column types already set, so nothing is guessed the way a CSV reader has to guess.
The function comes from the arrow package, which bundles the Apache Arrow C++ engine. Feather was designed for one job: move a data frame between processes and languages as fast as the disk allows. Because the on-disk layout mirrors Arrow's in-memory layout, read_feather() does almost no decoding work, which is what makes it quick.
Syntax and key arguments
The signature is short, and col_select does most of the tuning. You usually pass just file, and reach for col_select when the file is wide and you only need a few columns.
The col_select argument accepts the same tidyselect helpers as dplyr::select(), such as starts_with() and c(). Setting as_data_frame = FALSE returns a lazy Arrow Table instead of pulling everything into an R data frame. The mmap argument memory-maps the file and rarely needs changing.
In Python, the same file opens with pandas.read_feather(). A Feather file written by pandas or Julia reads into R untouched, because all three share the Arrow IPC format on disk.
read_feather() examples
Start with a round trip. Write a built-in dataset to a Feather file with write_feather(), then read it back so every example has a real file to work with.
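A minimal round trip, assuming the arrow package is installed. The tempfile() path is just for illustration; any writable path works:

```r
library(arrow)

path <- tempfile(fileext = ".feather")
write_feather(mtcars, path)   # write the built-in mtcars dataset

cars <- read_feather(path)    # read it back as a tibble
dim(cars)                     # 32 rows, 11 columns
```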
The data returns as a 32-row, 11-column tibble. No delimiter was parsed and no column type was inferred, because the Feather file stored the schema alongside the data.
Read only the columns you need. Pass col_select and read_feather() decodes just those columns, leaving the rest on disk. Feather is columnar, so naming columns up front is far faster than reading everything and dropping columns later.
Only three columns came back. The other eight were never read off disk.
Feather preserves column types. A CSV round trip flattens factors and dates into text, so you rebuild those types on every reload. A Feather round trip keeps them, because the file carries a typed schema.
The factor returns with its levels intact and the date column returns as a Date. This type fidelity makes Feather a reliable cache for cleaned data between sessions.
Return an Arrow Table instead of a data frame. Set as_data_frame = FALSE when a file is too wide to pull fully into memory. The Table is a lazy handle you can filter before collecting.
The Table reports 32 rows without materializing them as an R data frame. You can run dplyr verbs on it and call collect() only when you need the result in memory.
read_feather() vs read_parquet() and alternatives
read_feather() is the fast IPC reader; read_parquet() is the compact archival reader. Both return a tibble, so the choice comes down to whether you are optimizing for read speed or for file size.
| Function | Reads | Format | Optimized for | Best for |
|---|---|---|---|---|
| read_feather() | one file | Arrow IPC binary | raw speed | caching, cross-language handoff |
| read_parquet() | one file | columnar binary | small file size | long-term analytic storage |
| open_dataset() | many files | partitioned files | lazy queries | data larger than memory |
| read_csv() | one file | text | portability | sharing with any tool |
Use read_feather() for short-lived data: a cache between R sessions, or a handoff to a Python or Julia process. Use read_parquet() when the file is archived for months and disk size matters. Reach for read_csv() only when a person or tool that cannot read Arrow needs the data.
Common pitfalls
Confusing Feather V1 and V2. Modern write_feather() produces Feather V2, which is the Arrow IPC format. Files written years ago by the old feather package are V1. read_feather() reads both, but only V2 supports compression and the full Arrow type set. Always write V2, which is the default.
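A quick sketch of writing V2 explicitly. The version argument defaults to 2 in current arrow releases, so spelling it out is only for clarity:

```r
library(arrow)

path <- tempfile(fileext = ".feather")
write_feather(mtcars, path, version = 2)  # V2 (Arrow IPC) is the default

df <- read_feather(path)   # read_feather() handles V1 and V2 files alike
dim(df)                    # 32 rows, 11 columns
```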
Pointing read_feather() at a folder. read_feather() reads exactly one file. A directory of Feather files will not load with a single call. Use open_dataset("folder/", format = "feather") instead, which treats the whole folder as one queryable dataset.
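A sketch of the folder workflow, assuming arrow and dplyr are installed. Two Feather files are written into a temporary directory and queried as one dataset:

```r
library(arrow)
library(dplyr)

dir <- tempfile()
dir.create(dir)
write_feather(mtcars[1:16, ],  file.path(dir, "part-1.feather"))
write_feather(mtcars[17:32, ], file.path(dir, "part-2.feather"))

ds <- open_dataset(dir, format = "feather")   # the folder as one dataset

# Filter before any rows reach memory, then collect the result
n_fast <- ds |> filter(mpg > 25) |> collect() |> nrow()
n_fast   # 6 cars in mtcars have mpg above 25
```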
Treating Feather as a long-term archive. Feather trades file size for speed, so a Feather file is usually larger than the same data as Parquet. For data you keep for months, write Parquet. Keep Feather for caches and process-to-process transfer where speed matters more than bytes.
Running an arrow build without compression support. A stripped-down arrow build can fail to read a compressed Feather file. Run arrow::arrow_info() to check capabilities, then reinstall arrow with install.packages("arrow") to pull a complete build.
Try it yourself
Try it: Write the iris data frame to iris.feather, then read back only the Species and Sepal.Length columns into ex_iris. Save the column names to ex_names.
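One possible solution, assuming the arrow package is installed:

```r
library(arrow)

write_feather(iris, "iris.feather")

# col_select takes bare column names, just like dplyr::select()
ex_iris  <- read_feather("iris.feather", col_select = c(Species, Sepal.Length))
ex_names <- names(ex_iris)
ex_names   # the two selected column names
```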
Explanation: col_select takes bare column names through tidyselect, so read_feather() decodes only those two columns from the file and leaves the other three on disk.
Related arrow functions
read_feather() sits among arrow's import and export helpers. Pick the one that matches the file you have.
- write_feather(): write a data frame to a .feather file, the inverse of this function.
- read_parquet(): read a compressed Parquet file, better for long-term storage.
- read_ipc_stream(): read the streaming Arrow IPC format rather than the file format.
- open_dataset(): query a folder of Feather or Parquet files as one dataset.
- read_rds(): restore a single R object from an R-only .rds file.
For the full argument list, see the arrow read_feather reference on arrow.apache.org.
FAQ
What is a Feather file in R?
A Feather file is a binary table stored in the Apache Arrow IPC format. In R it is read with read_feather() from the arrow package. Because the file layout mirrors Arrow's in-memory layout, reading it involves almost no parsing, which makes Feather one of the fastest formats for moving a data frame on or off disk.
What is the difference between Feather and Parquet in R?
Both are columnar binary formats read by the arrow package, but they optimize for different goals. Feather, read with read_feather(), is tuned for raw speed and cross-language handoff. Parquet, read with read_parquet(), is more heavily compressed and tuned for small file size and long-term storage. Use Feather for caches, Parquet for archives.
Is Feather faster than Parquet in R?
Usually yes for reading, because Feather does little decoding while Parquet must decompress and decode columns. The gap is widest on local disks. Parquet, however, produces smaller files, so it can win when input and output are bound by network or disk size rather than CPU. For repeated reads of cached data, Feather is the faster choice.
Can read_feather() read a folder of Feather files?
No. read_feather() reads exactly one file. To treat a directory of Feather files as a single table, use open_dataset("folder/", format = "feather"). It scans the folder, combines the files, and lets you filter with dplyr verbs before pulling any rows into memory.
How do I read only some columns from a Feather file?
Pass the col_select argument: read_feather("data.feather", col_select = c(a, b)). It accepts bare column names and tidyselect helpers like starts_with(). Because Feather is columnar, unselected columns are never read off disk, so this is much faster than reading everything and dropping columns afterward.