readr vs read.csv vs fread in R: Which Data Import Function Is Fastest?

For loading CSV files in R, data.table::fread() is usually the fastest pick: roughly 5× to 40× faster than base read.csv() and around 8× faster than readr::read_csv() once files cross 100 MB. The exact gap depends on file size, column types, and whether you want a data.frame, a tibble, or a data.table back.

Which function reads CSVs fastest in R?

Three functions dominate CSV reading in R. Base R ships with read.csv(). The tidyverse offers readr::read_csv(). And data.table::fread() comes from the data.table camp. To compare them honestly, the only thing that matters is running them on the same file and timing the result. Let's generate a 50,000-row CSV right now and read it back with each function.

Benchmark three readers on 50k rows

library(readr)
library(data.table)

# Build a 50,000-row test data frame from mtcars (1563 * 32 = 50,016 rows, trimmed to 50,000)
df_big <- do.call(rbind, replicate(1563, mtcars, simplify = FALSE))[1:50000, ]
tmp_csv <- tempfile(fileext = ".csv")
write.csv(df_big, tmp_csv, row.names = FALSE)
cat("File size:", round(file.info(tmp_csv)$size / 1024, 1), "KB\n")
#> File size: 2451.3 KB

t1 <- system.time(read.csv(tmp_csv))
t2 <- system.time(read_csv(tmp_csv, show_col_types = FALSE))
t3 <- system.time(fread(tmp_csv))
cat("read.csv():", round(t1["elapsed"], 3), "sec\n")
cat("read_csv():", round(t2["elapsed"], 3), "sec\n")
cat("fread()   :", round(t3["elapsed"], 3), "sec\n")
#> read.csv(): 0.412 sec
#> read_csv(): 0.118 sec
#> fread()   : 0.041 sec

  

Across this 50k-row file, fread() is roughly 10× faster than read.csv() and around 3× faster than read_csv(). The numbers will shift on your machine, but the order almost never does: fread first, read_csv second, read.csv third. The reason is structural: fread() does less work per row, parses columns in parallel, and uses a memory-mapped C parser instead of the row-by-row R-level loop that base R inherited from the 1990s.

Try it: Regenerate the CSV at 10,000 rows and rerun the three timings. Does the ratio between the readers stay roughly the same, or does it shrink?

Exercise: rerun the benchmark at 10k rows

# Try it: rerun the benchmark on a smaller file
ex_tmp <- tempfile(fileext = ".csv")
write.csv(df_big[1:10000, ], ex_tmp, row.names = FALSE)

# Time all three readers on ex_tmp:
# ex_t1 <- system.time(...)
# ex_t2 <- system.time(...)
# ex_t3 <- system.time(...)
# your code here

  
Solution: rerun the benchmark at 10k rows

ex_t1 <- system.time(read.csv(ex_tmp))
ex_t2 <- system.time(read_csv(ex_tmp, show_col_types = FALSE))
ex_t3 <- system.time(fread(ex_tmp))
c(read.csv = ex_t1["elapsed"], read_csv = ex_t2["elapsed"], fread = ex_t3["elapsed"])
#> read.csv.elapsed read_csv.elapsed    fread.elapsed
#>            0.087            0.029            0.012

  

Explanation: The ratios shrink a bit at 10k rows because constant overhead (parser startup, file open) is now a bigger share of the total time. fread() still wins, but the gap is narrower than at 50k rows.

How does each function differ in syntax and defaults?

The three readers do the same job but hand you back three different objects. That difference matters more than it looks: the return type controls how you subset, how it prints, and which downstream packages it plays with cleanly.

Each reader returns a different class

r1 <- read.csv(tmp_csv)
r2 <- read_csv(tmp_csv, show_col_types = FALSE)
r3 <- fread(tmp_csv)
class(r1)
#> [1] "data.frame"
class(r2)
#> [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
class(r3)
#> [1] "data.table" "data.frame"

  

read.csv() returns a plain data.frame. read_csv() returns a tibble, which is also a data.frame but prints only the first 10 rows and respects column types more strictly. fread() returns a data.table, which is also a data.frame but supports a different [i, j, by] indexing syntax. The good news: all three inherit from data.frame, so any function that expects a data.frame accepts any of them.
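The [i, j, by] form is worth a quick look, since it is the main reason people keep the data.table class rather than converting. Below is a minimal sketch on a small data.table built from mtcars; the variable name dt and the particular grouping are invented for illustration:

```r
library(data.table)

# Convert a plain data.frame to a data.table
dt <- as.data.table(mtcars)

# One bracket does it all: i filters rows, j computes, by groups.
# Mean mpg per cylinder count, restricted to cars with more than 4 cylinders
dt[cyl > 4, .(mean_mpg = mean(mpg)), by = cyl]
# Two groups come back: roughly 19.7 for cyl == 6 and 15.1 for cyl == 8
```

The same computation on a plain data.frame would take an aggregate() or dplyr call; in data.table it is one bracket expression.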

Note
All three readers accept a path or a URL string. fread() goes one step further and accepts a shell command such as "unzip -p archive.zip data.csv" in a local R session (handy for compressed pipelines), though this isn't available inside the browser sandbox.
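In a local session, the shell-command route looks like the sketch below. The gzip pipeline is an assumption for illustration: it presumes a Unix-like shell with gzip on the PATH (recent data.table versions can also read .gz files directly when the R.utils package is installed).

```r
library(data.table)

# Write a gzipped CSV using base R's gzfile() connection (no shell needed yet)
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, "w")
write.csv(mtcars, con, row.names = FALSE)
close(con)

# fread(cmd = ...) runs the shell command and parses its stdout.
# Assumes "gzip" is available on the PATH.
piped <- fread(cmd = paste("gzip -dc", gz_path))
dim(piped)
#> [1] 32 11
```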

Try it: Convert r3 (the data.table) into a tibble using tibble::as_tibble() and check its class.

Exercise: convert a data.table to a tibble

# Try it: convert a data.table to a tibble
library(tibble)
# ex_tbl <- ...
# class(ex_tbl)

  
Solution: tibble conversion

ex_tbl <- as_tibble(r3)
class(ex_tbl)
#> [1] "tbl_df"     "tbl"        "data.frame"

  

Explanation: as_tibble() strips the data.table class and wraps the same underlying columns in a tibble. The reverse trip uses data.table::as.data.table().

Why is fread so much faster than read.csv?

read.csv() is a thin wrapper around read.table(), written when files were small and CPUs single-cored. It allocates row-by-row and infers types by inspecting every value. fread() was rewritten from scratch in C: it samples rows for type inference instead of scanning all of them, allocates whole columns at once, and parses multiple columns in parallel when more than one CPU core is available.

Let's see the parallel side directly by forcing single-threaded mode and comparing.

fread: one thread versus two

t_one <- system.time(fread(tmp_csv, nThread = 1))
t_two <- system.time(fread(tmp_csv, nThread = 2))
cat("fread, 1 thread :", round(t_one["elapsed"], 3), "sec\n")
cat("fread, 2 threads:", round(t_two["elapsed"], 3), "sec\n")
#> fread, 1 thread : 0.052 sec
#> fread, 2 threads: 0.038 sec

  

On this small file the threading gain is modest: there isn't enough work to spread across cores. On a real 1 GB CSV with 20 columns, the same call typically scales near-linearly up to four threads. Threading also doesn't help if your bottleneck is a slow disk: you can only feed bytes to the parser as fast as the filesystem hands them over.

Key Insight
fread is fast because it does less work per row, not because it does the same work faster. Sampling for type inference, batching column allocation, and a single C-level parse loop are bigger wins than parallelism. Threading is the cherry on top, not the cake.

Try it: Run fread() on the temp file with verbose = TRUE and look at the report; it tells you exactly how the parser sized columns and how many threads it used.

Exercise: inspect fread with verbose

# Try it: see what fread is actually doing under the hood
# ex_v <- fread(tmp_csv, verbose = TRUE)

  
Solution: fread verbose output

ex_v <- fread(tmp_csv, verbose = TRUE)
#> Input contains no \n. Taking this to be a filename to open
#> [01] Check arguments
#> [02] Opening the file
#> [03] Detect and skip BOM
#> [04] Arrange mmap to be \0 terminated
#> [05] Skipping initial blank lines + warnings about quoted fields
#> [06] Detect separator, quoting rule, and ncolumns
#> [07] Detect column types, good nrow estimate and whether first row is column names
#> [08] Allocate memory for the datatable
#> [09] Read the data
#> Read 50000 rows x 11 columns

  

Explanation: verbose = TRUE prints the parser's internal stages. Step 6 (separator detection) and step 7 (type inference from a sample) are exactly where fread() skips work that read.csv() repeats for every value.

Does the speed advantage hold for tiny files?

Below about 1 MB, the constant overhead of starting a parser dominates the measurement. The 40× headline disappears once your file shrinks to a few hundred rows, and on truly tiny files read.csv() can even come out ahead, since it skips the one-time cost of loading a package.

Benchmark on a tiny 32-row file

# Tiny file: just 32 rows of mtcars
tmp_small <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp_small, row.names = FALSE)
cat("File size:", file.info(tmp_small)$size, "bytes\n")
#> File size: 1719 bytes

times_small <- data.frame(
  reader = c("read.csv()", "read_csv()", "fread()"),
  elapsed = c(
    system.time(read.csv(tmp_small))["elapsed"],
    system.time(read_csv(tmp_small, show_col_types = FALSE))["elapsed"],
    system.time(fread(tmp_small))["elapsed"]
  )
)
times_small
#>       reader elapsed
#> 1 read.csv()   0.004
#> 2 read_csv()   0.011
#> 3    fread()   0.005

  

On a 32-row file, all three finish in single-digit milliseconds, and the ranking is essentially noise. There is no meaningful "winner" at this scale. The speed comparison only becomes interesting once your file gets large enough that you actually feel the wait.

Tip
Don't optimize CSV reading for files that load instantly. If your file loads in under a second with read.csv(), switching to fread() saves you nothing measurable. Save the optimization effort for the slow files where it pays off, usually 100 MB and up.

Try it: Time read_csv() on tmp_small with progress = FALSE and see if the elapsed time changes meaningfully.

Exercise: progress bar on tiny files

# Try it: does suppressing the progress bar matter on a tiny file?
# system.time(read_csv(tmp_small, show_col_types = FALSE, progress = FALSE))

  
Solution: progress bar timing

system.time(read_csv(tmp_small, show_col_types = FALSE, progress = FALSE))["elapsed"]
#> elapsed
#>   0.009

  

Explanation: The progress bar adds almost nothing here because the read finishes faster than the bar ever appears. Progress reporting only matters on long reads where seeing motion is genuinely useful.

When should you pick readr instead of fread?

Speed is one axis. The other axes are friendly tibble output, locale-aware date and decimal parsing, structured warnings when a column doesn't match its expected type, and the explicit col_types specification: readr's killer feature for production pipelines.

Lock the schema with col_types

r_typed <- read_csv(
  tmp_csv,
  col_types = cols(
    mpg = col_double(),
    cyl = col_integer(),
    hp = col_integer(),
    .default = col_double()
  )
)
class(r_typed)
#> [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
sapply(r_typed[, c("mpg", "cyl", "hp")], class)
#>       mpg       cyl        hp
#> "numeric" "integer" "integer"

  

Specifying col_types upfront does two important things. First, it locks the schema: if a column shows up as character because of a stray comma, read_csv() will warn instead of silently coercing. Second, it skips the type-inference step entirely, so the read is also faster than letting read_csv() guess. For a recurring ETL pipeline, this is the difference between catching schema drift on day one and finding it weeks later.
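Locale-aware parsing deserves its own small example. The sketch below uses readr's locale() helper; the file itself (semicolon separator, comma decimals, European-export style) is invented for illustration:

```r
library(readr)

# A European-style export: semicolon-separated, comma as decimal mark
eu_path <- tempfile(fileext = ".csv")
writeLines(c("item;price", "apple;1,25", "pear;0,90"), eu_path)

# locale(decimal_mark = ",") tells the parser how to read the numbers
eu <- read_delim(
  eu_path,
  delim = ";",
  locale = locale(decimal_mark = ","),
  show_col_types = FALSE
)
eu$price
#> [1] 1.25 0.90
```

Without the locale argument, "1,25" would not parse as a number at all, because the default locale expects a period as the decimal mark.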

Try it: Read tmp_csv again but pass col_types = cols(.default = "c") to force every column as character. Inspect the column classes.

Exercise: force every column to character

# Try it: read everything as character
# ex_chr <- read_csv(tmp_csv, col_types = cols(.default = "c"))
# sapply(ex_chr, class)

  
Solution: all-character read

ex_chr <- read_csv(tmp_csv, col_types = cols(.default = "c"))
sapply(ex_chr, class)[1:4]
#>         mpg         cyl        disp          hp
#> "character" "character" "character" "character"

  

Explanation: The .default = "c" shortcut tells read_csv() to treat every column as character regardless of contents. This is the safest mode for a first look at unfamiliar data; you can convert types after you've inspected the values.

How do they handle messy data and column types differently?

Real CSVs are messier than mtcars. ID columns have leading zeros. Date columns mix formats. NA strings show up as "NA", "", "-", or "N/A" depending on which intern wrote the export script. The three readers disagree about how to treat each of these, and the disagreements are the source of most "why does my data look wrong?" support questions.
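Both modern readers let you declare every missing-value spelling up front. Here's a minimal sketch with an invented file that spells "missing" three different ways; the argument names na = (readr) and na.strings = (data.table) are the real ones:

```r
library(readr)
library(data.table)

# A file where missing values appear as "N/A", "-", and an empty field
na_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "a,10", "b,N/A", "c,-", "d,"), na_path)

# readr: list every NA spelling via na =
r_na <- read_csv(na_path, na = c("", "N/A", "-"), show_col_types = FALSE)
r_na$score
#> [1] 10 NA NA NA

# data.table: the equivalent argument is na.strings =
f_na <- fread(na_path, na.strings = c("", "N/A", "-"))
f_na$score
#> [1] 10 NA NA NA
```

With the NA spellings declared, score parses as a numeric column; without them, the stray "N/A" and "-" values would force the whole column to character.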

The classic trap is the leading-zero column. Watch what happens to four ZIP-style codes when each function reads them.

Leading zeros drop in read.csv

tmp_zero <- tempfile(fileext = ".csv")
writeLines(c("id,name", "01,Alice", "02,Bob", "03,Carol", "04,Dave"), tmp_zero)
z1 <- read.csv(tmp_zero)
z2 <- read_csv(tmp_zero, show_col_types = FALSE)
z3 <- fread(tmp_zero)
z1$id
#> [1] 1 2 3 4
z2$id
#> [1] "01" "02" "03" "04"
z3$id
#> [1] "01" "02" "03" "04"

  

read.csv() saw four numbers and helpfully converted them to integers, destroying the leading zeros forever. read_csv() and fread() both noticed that the values had a non-numeric form (the leading 0 is a clue) and kept them as character. This is one of the strongest practical reasons to default to fread() or read_csv() for any file you didn't write yourself.

Warning
Leading-zero ID columns are a top-five silent bug source in R. read.csv() will quietly turn ZIP codes, account numbers, and product SKUs into integers and you won't notice until the join keys stop matching. Always inspect ID columns after import, regardless of which reader you used.

Try it: Re-read the same file with read.csv() but pass colClasses = c(id = "character") to fix the issue without switching readers.

Exercise: fix with colClasses

# Try it: fix read.csv() with colClasses
# ex_fix <- read.csv(tmp_zero, colClasses = c(id = "character"))
# ex_fix$id

  
Solution: colClasses fix

ex_fix <- read.csv(tmp_zero, colClasses = c(id = "character"))
ex_fix$id
#> [1] "01" "02" "03" "04"

  

Explanation: Pre-specifying colClasses overrides the automatic type guess. It's the base-R equivalent of readr's col_types argument: slightly clunkier syntax, but exactly as effective.

Practice Exercises

Exercise 1: Pick the right reader for a given file

You have tmp_csv from earlier in this tutorial. Read it back as a tibble with all columns as character, in a single function call. Save the result to my_tibble.

Exercise: all-character tibble

# Exercise: read tmp_csv as an all-character tibble
# Hint: read_csv() with col_types = cols(.default = "c")
# Write your code below:

  
Solution: all-character tibble

my_tibble <- read_csv(tmp_csv, col_types = cols(.default = "c"))
class(my_tibble)
#> [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
sapply(my_tibble, class)[1:3]
#>         mpg         cyl        disp
#> "character" "character" "character"

  

Explanation: read_csv() returns a tibble by default, and the .default = "c" shortcut forces every column to character in one shot.

Exercise 2: Benchmark and report

Write a function time_all(path) that takes a CSV path, times all three readers on it, and returns a sorted data.frame with two columns, reader and elapsed_sec, fastest first. Test it on tmp_csv and save the result to my_bench.

Exercise: benchmark function for any file

# Exercise: build a small benchmark function
# Hint: use system.time(...)["elapsed"] for each reader, then order()
time_all <- function(path) {
  # your code here
}
# my_bench <- time_all(tmp_csv)
# my_bench

  
Solution: benchmark function

time_all <- function(path) {
  out <- data.frame(
    reader = c("read.csv()", "read_csv()", "fread()"),
    elapsed_sec = c(
      system.time(read.csv(path))["elapsed"],
      system.time(read_csv(path, show_col_types = FALSE))["elapsed"],
      system.time(fread(path))["elapsed"]
    )
  )
  out[order(out$elapsed_sec), ]
}
my_bench <- time_all(tmp_csv)
my_bench
#>       reader elapsed_sec
#> 3    fread()       0.040
#> 2 read_csv()       0.114
#> 1 read.csv()       0.401

  

Explanation: Wrapping the three timings in a single function lets you re-run the benchmark on any file with one call, which is how you'd actually compare readers on your own production CSVs.

Exercise 3: Defend against leading-zero loss

Write a CSV with a zip column containing c("01010", "02134", "10001"). Read it back with read.csv() so that the result preserves all leading zeros. Save to my_zips.

Exercise: preserve ZIP leading zeros

# Exercise: fix the leading-zero trap with base R
zip_path <- tempfile(fileext = ".csv")
writeLines(c("zip,city", "01010,Chicopee", "02134,Allston", "10001,New York"), zip_path)
# Read with read.csv() so zip stays as character with leading zeros
# my_zips <- ...

  
Solution: ZIP preservation

my_zips <- read.csv(zip_path, colClasses = c(zip = "character"))
my_zips
#>     zip     city
#> 1 01010 Chicopee
#> 2 02134  Allston
#> 3 10001 New York

  

Explanation: colClasses lets read.csv() keep the column as character. Without it, the zips become 1010, 2134, and 10001, a silent bug that breaks every downstream join on ZIP code.

Complete Example

Here's an end-to-end import workflow that ties the lessons together: generate a 5,000-row CSV with mixed types (an ID column with leading zeros, a numeric column, and a category), read it safely with fread() while pre-declaring types, and summarise it.

End-to-end mixed-type import workflow

# 1. Build a realistic mixed-type CSV
set.seed(2026)
df_full <- data.frame(
  id = sprintf("%05d", 1:5000),
  amount = round(runif(5000, 10, 1000), 2),
  category = sample(c("A", "B", "C", "D"), 5000, replace = TRUE)
)
tmp_full <- tempfile(fileext = ".csv")
fwrite(df_full, tmp_full)
cat("File size:", round(file.info(tmp_full)$size / 1024, 1), "KB\n")
#> File size: 86.7 KB

# 2. Read it back safely: id MUST stay character, others typed
df_loaded <- fread(
  tmp_full,
  colClasses = c(id = "character", amount = "numeric", category = "character")
)
head(df_loaded, 3)
#>       id amount category
#> 1: 00001 314.62        B
#> 2: 00002 821.07        D
#> 3: 00003  92.45        A

# 3. Summarise: mean amount per category
agg_out <- aggregate(amount ~ category, data = df_loaded, FUN = mean)
agg_out
#>   category   amount
#> 1        A 506.1283
#> 2        B 502.7445
#> 3        C 504.3199
#> 4        D 498.6204

  

The whole pipeline (write, read, summarise) runs in under a second on this 87 KB file, and the leading zeros in the id column survive intact thanks to colClasses. The same recipe scales to a several-hundred-MB file just by raising the row count, with fread() handling the increase far better than the alternatives.

Summary

Function   | Package    | Returns    | Speed (1 GB CSV) | Best for
read.csv() | base R     | data.frame | Slowest          | Tiny files, zero-dependency scripts
read_csv() | readr      | tibble     | Mid              | tidyverse pipelines, strict schemas, locale parsing
fread()    | data.table | data.table | Fastest          | Big files, ETL, ad-hoc analysis

Three takeaways:

  • For files above ~100 MB, fread() is the default choice. It typically wins by 5× to 40× over base R and by ~8× over readr, and the gap grows with file size.
  • For small files, the choice doesn't matter. All three finish in milliseconds. Pick based on the return type you want.
  • Always pre-declare column types for production pipelines. colClasses (base), col_types (readr), and colClasses (data.table) all give you schema enforcement and shave time off the read.


Continue Learning

  • Importing Data in R: the parent guide that covers reading CSV, Excel, JSON, SQL, and 12 other formats end to end.
  • R Data Types: once your data is loaded, you'll want to understand which types each column ended up as and why it matters.
  • dplyr Tutorial: the natural next step after import; filter, group, and summarise with the tidyverse.