read_fst() in R: Read .fst Data Files Fast
read_fst() in the fst package reads a .fst binary file into an R data frame fast, with optional column selection and row ranges so you load only the data you need.
read_fst("data.fst") # read the whole file
read_fst("data.fst", columns = c("a", "b")) # selected columns only
read_fst("data.fst", from = 1, to = 1000) # first 1000 rows
read_fst("data.fst", as.data.table = TRUE) # return a data.table
metadata_fst("data.fst") # inspect without loading
write_fst(df, "data.fst", compress = 50) # create a .fst file
Need explanation? Read on for examples and pitfalls.
What read_fst() does
read_fst() loads a .fst binary file into an R data frame. The fst package stores data frames in a compressed, columnar format on disk. Because the layout is columnar and indexed, read_fst() can pull back a few columns or a slice of rows without scanning the whole file. That makes it one of the fastest practical ways to reload a large data frame between R sessions.
A .fst file is produced by write_fst(). Once written, the file is self-describing: its header records column names, column types, and the row count. read_fst() reads that header first, then decompresses only the blocks you ask for.
read_fst() syntax and arguments
read_fst() takes a file path plus four optional arguments. The basic call is read_fst(path, columns, from, to, as.data.table). Only path is required; the rest default to reading the entire file as a plain data frame.
| Argument | Default | What it does |
|---|---|---|
| path | (required) | Path to the .fst file to read |
| columns | NULL | Character vector of columns to read; NULL reads all |
| from | 1 | First row to read, 1-indexed and inclusive |
| to | NULL | Last row to read; NULL reads to the final row |
| as.data.table | FALSE | If TRUE, return a data.table instead of a data.frame |
The columns, from, and to arguments are the reason to choose .fst over .rds. They let you read a subset of a large file straight from disk, instead of loading everything into memory and then subsetting.
Reading .fst files: four common cases
Start by writing a sample .fst file so the reads have something to load. The block below builds a 10,000-row data frame and saves it with write_fst(). Every later example reads this file back.
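The sample block itself is not reproduced in this text, so here is a minimal sketch that fits the description. The file name sales.fst and the units column come from the exercise later in this article; the id, region, and price columns are placeholders chosen for illustration.

```r
library(fst)

n <- 10000

# Deterministic sample data; id, region, and price are placeholder columns
sales <- data.frame(
  id     = seq_len(n),
  region = rep(c("north", "south", "east", "west"), length.out = n),
  units  = rep(1:50, length.out = n),
  price  = round(seq(5, 100, length.out = n), 2)
)

# compress runs from 0 (none) to 100 (maximum); 50 is the package default
write_fst(sales, "sales.fst", compress = 50)
```

The file is written to the current working directory, so the later reads can refer to it by its bare name.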
Case 1: read the entire file. Pass only the path to load every row and column back into a data frame.
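A short sketch of the full read, assuming the sample file is called sales.fst. The guard at the top rebuilds the file with placeholder columns (id, region, price) if the block is run on its own.

```r
library(fst)

# Rebuild the sample file if it is missing (placeholder columns except units)
if (!file.exists("sales.fst")) {
  write_fst(data.frame(
    id     = 1:10000,
    region = rep(c("north", "south", "east", "west"), 2500),
    units  = rep(1:50, 200),
    price  = round(seq(5, 100, length.out = 10000), 2)
  ), "sales.fst")
}

# No optional arguments: every row and every column comes back
full <- read_fst("sales.fst")
dim(full)
head(full, 3)
```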
Case 2: read selected columns. Pass a character vector to columns. fst decompresses only those columns, which is faster and uses less memory than loading the full frame.
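As a sketch, the call below asks for two columns of the assumed sales.fst sample file. The region column is a placeholder name; units is the column used in the exercise later on.

```r
library(fst)

# Rebuild the sample file if it is missing (placeholder columns except units)
if (!file.exists("sales.fst")) {
  write_fst(data.frame(
    id     = 1:10000,
    region = rep(c("north", "south", "east", "west"), 2500),
    units  = rep(1:50, 200),
    price  = round(seq(5, 100, length.out = 10000), 2)
  ), "sales.fst")
}

# Only the named columns are read; id and price stay on disk
two_cols <- read_fst("sales.fst", columns = c("region", "units"))
names(two_cols)
nrow(two_cols)
```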
columns = c(...) skips decompression of everything else, so the read scales with what you ask for, not with the file size.
Case 3: read a range of rows. Use from and to to pull a window of rows. Both bounds are inclusive, so from = 1, to = 500 returns exactly 500 rows.
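A minimal sketch of a row-range read against the assumed sales.fst sample file (placeholder columns apart from units):

```r
library(fst)

# Rebuild the sample file if it is missing (placeholder columns except units)
if (!file.exists("sales.fst")) {
  write_fst(data.frame(
    id     = 1:10000,
    region = rep(c("north", "south", "east", "west"), 2500),
    units  = rep(1:50, 200),
    price  = round(seq(5, 100, length.out = 10000), 2)
  ), "sales.fst")
}

# Rows 1 through 500, both ends included
first_500 <- read_fst("sales.fst", from = 1, to = 500)
nrow(first_500)  # 500
```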
Case 4: inspect metadata, or return a data.table. metadata_fst() reads the header only, with no data load, so you can check the shape of a file before committing memory to it. Set as.data.table = TRUE when you want a data.table back.
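The sketch below shows both halves of this case on the assumed sales.fst sample file. The element names nrOfRows and columnNames are what metadata_fst() returns in the fst versions I am aware of; check print(meta) if your version differs. The data.table read is wrapped in a check because data.table is an optional dependency of fst.

```r
library(fst)

# Rebuild the sample file if it is missing (placeholder columns except units)
if (!file.exists("sales.fst")) {
  write_fst(data.frame(
    id     = 1:10000,
    region = rep(c("north", "south", "east", "west"), 2500),
    units  = rep(1:50, 200),
    price  = round(seq(5, 100, length.out = 10000), 2)
  ), "sales.fst")
}

# Header only: no column data is loaded
meta <- metadata_fst("sales.fst")
print(meta)
meta$nrOfRows
meta$columnNames

# Same data, returned as a data.table when the package is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- read_fst("sales.fst", as.data.table = TRUE)
  print(class(dt))
}
```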
read_fst() vs other ways to load data
read_fst() wins when you reload large data frames repeatedly. It is typically faster than readRDS() and far faster than read_csv(), and unlike either it supports partial reads. The trade-off is that .fst is an R-focused binary format, so treat it as a session cache rather than an interchange format.
| Function | File format | Read speed | Partial read |
|---|---|---|---|
| read_fst() | .fst binary | Fastest | Yes, columns and rows |
| readRDS() | .rds binary | Fast | No, loads the whole object |
| read_csv() | .csv text | Slow | No |
| read_parquet() | .parquet binary | Fast | Yes, columns |
Use read_fst() for an R-only workflow that re-reads big tables often. Use read_parquet() when the same data must move between R, Python, and other tools.
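The speed ranking in the table depends on your data, disk, and thread count, so it is worth measuring on your own machine. The rough sketch below times one read per format on a synthetic 200,000-row frame. It is not a rigorous benchmark, the column names are placeholders, and base R's read.csv() stands in for read_csv() to avoid an extra dependency.

```r
library(fst)

n <- 200000
df <- data.frame(
  id    = seq_len(n),
  units = rep(1:50, length.out = n),
  price = seq(5, 100, length.out = n)
)

# Write the same frame in three formats to temporary files
fst_path <- tempfile(fileext = ".fst")
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")
write_fst(df, fst_path)
saveRDS(df, rds_path)
write.csv(df, csv_path, row.names = FALSE)

# Time a single read of each file
t_fst <- system.time(from_fst <- read_fst(fst_path))
t_rds <- system.time(from_rds <- readRDS(rds_path))
t_csv <- system.time(from_csv <- read.csv(csv_path))

# Elapsed seconds per format; results vary between runs and machines
c(fst = t_fst[["elapsed"]], rds = t_rds[["elapsed"]], csv = t_csv[["elapsed"]])
```

A single run is noisy, partly because the operating system caches recently written files. Repeat the reads several times before drawing conclusions.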
There is no .fst reader in mainstream pandas. The cross-language equivalent of a fast binary table is Parquet or Feather. Parquet is read in R with arrow::read_parquet() and in Python with pandas.read_parquet(); Feather is read with arrow::read_feather() and pandas.read_feather().
Common pitfalls
Column names passed to read_fst() are case-sensitive. A typo or wrong case raises an error instead of silently returning nothing. Match the names exactly as metadata_fst() reports them.
The from and to bounds are inclusive. Reading from = 101, to = 200 returns 100 rows, not 99. Treat the window like 101:200, not like a zero-based slice.
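Both pitfalls are easy to check. The sketch below assumes the sales.fst sample file with a lower-case units column, and uses tryCatch() to capture the error rather than stop the script. The exact error message depends on the fst version.

```r
library(fst)

# Rebuild the sample file if it is missing (placeholder columns except units)
if (!file.exists("sales.fst")) {
  write_fst(data.frame(
    id     = 1:10000,
    region = rep(c("north", "south", "east", "west"), 2500),
    units  = rep(1:50, 200),
    price  = round(seq(5, 100, length.out = 10000), 2)
  ), "sales.fst")
}

# Pitfall 1: "Units" does not match "units", so the read fails
res <- tryCatch(
  read_fst("sales.fst", columns = "Units"),
  error = function(e) e
)
if (inherits(res, "error")) {
  message("Read failed: ", conditionMessage(res))
}

# Pitfall 2: inclusive bounds behave like 101:200
win <- read_fst("sales.fst", from = 101, to = 200)
nrow(win) == length(101:200)
```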
Try it yourself
Try it: Read only the units column from sales.fst for rows 1 through 100. Save the result to ex_units.
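One possible solution, assuming sales.fst is the sample file written earlier and contains a units column (the guard rebuilds it with placeholder columns if it is missing):

```r
library(fst)

# Rebuild the sample file if it is missing (placeholder columns except units)
if (!file.exists("sales.fst")) {
  write_fst(data.frame(
    id     = 1:10000,
    region = rep(c("north", "south", "east", "west"), 2500),
    units  = rep(1:50, 200),
    price  = round(seq(5, 100, length.out = 10000), 2)
  ), "sales.fst")
}

# One column, rows 1 through 100 inclusive
ex_units <- read_fst("sales.fst", columns = "units", from = 1, to = 100)
str(ex_units)
```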
Explanation: Passing columns, from, and to together reads one column across a 100-row window. fst touches only that slice of the file, so the read stays fast even on a large dataset.
Related fst and data import functions
read_fst() is one of a small, focused family in the fst package. These functions cover writing, inspecting, and tuning .fst input and output.
- write_fst(): create a .fst file from a data frame, with an adjustable compression level.
- metadata_fst(): read a file's column names, types, and row count without loading data.
- threads_fst(): set how many threads fst uses for compression and decompression.
- readRDS(): read a single R object from an .rds file.
- arrow::read_parquet(): read a Parquet file when data must cross languages.
See the official fst package reference for the full argument list and benchmarks.
FAQ
What is a .fst file in R?
A .fst file is a compressed, columnar binary file that stores an R data frame on disk. It is created by write_fst() from the fst package. The format records column names, types, and row count in a header, and stores each column as a separately compressed block. That layout lets read_fst() load specific columns or row ranges without reading the whole file, which makes it well suited to caching large tables between R sessions.
Is read_fst() faster than read.csv()?
Yes, by a wide margin on large data. read.csv() parses text line by line and converts every value from a string, which is CPU-bound. read_fst() reads typed binary blocks, so the work is mostly decompression and a memory copy. For multi-million-row tables the difference is often more than ten-fold, and read_fst() can also read just the columns you need, which read.csv() cannot.
Can read_fst() read part of a file?
Yes. Pass columns to read a subset of columns and from plus to to read a range of rows. Both bounds are 1-indexed and inclusive. Because .fst is columnar and indexed, these partial reads pull only the requested blocks off disk rather than loading the full file and subsetting in memory.
Does read_fst() return a data.table or a data.frame?
By default read_fst() returns a plain data.frame. Set as.data.table = TRUE to get a data.table instead, which is useful when the rest of your pipeline uses data.table syntax. The underlying data is identical either way; only the class and the available methods change.