Import Any Data Format Into R: CSV, Excel, JSON, and 12 Others
Almost every real R project starts with "read this file and give me a data frame." This guide shows you the one function to reach for in each format: CSV, Excel, JSON, TSV, fixed-width, SPSS, Stata, SAS, Parquet, RDS, and more, plus the arguments that rescue broken imports.
How do you import a CSV file into R?
CSV is the workhorse format: flat, text-based, universal. R gives you three main choices: base read.csv(), tidyverse readr::read_csv(), and data.table::fread(). All three return a data frame; they differ in speed, defaults, and output class.
read_csv() auto-detects column types, parses dates, returns a tibble, and is about 10× faster than base read.csv(). It also prints a column-spec summary so you can catch type surprises early.
Use read_csv() for most work, fread() when speed matters on huge files (100 MB+), and base read.csv() only when you can't install packages. The differences are in defaults, not capability.
Try it: Load a CSV with the readr package and check its column types with spec().
Click to reveal solution
read_csv() accepts a literal string as input when it contains a newline, so you can test parsing without writing a file. spec() returns the column specification read_csv() inferred; use it to verify types before pointing at a large file.
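A minimal sketch of that workflow (the column names and values here are made up for illustration; assumes the readr package is installed):

```r
library(readr)

# A literal string containing a newline is treated as inline data, not a path
df <- read_csv("name,score\nalice,90\nbob,85", show_col_types = FALSE)

# Inspect the column specification read_csv() inferred
spec(df)   # name = character, score = double
```

If spec() reveals a surprise (say, a numeric column read as character), fix it with col_types before rerunning on the real file.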
What arguments fix broken CSV imports?
Real-world CSVs are rarely clean. These five arguments solve 90% of import headaches.
The core arguments: delim, locale, skip, col_types, na. Memorize these and you'll handle nearly every real CSV without resorting to text editors.
col_types uses a compact string syntax: "icldDTc" means integer, character, logical, double, Date, POSIXct, character. Or pass cols(x = col_double(), y = col_character()) for named columns.
Try it: Parse a CSV where missing values are encoded as "NA", "NULL", and "-".
Click to reveal solution
Passing a character vector to na tells read_csv() to treat every one of those tokens as a missing value. Because all three sentinels resolve to NA, the score column ends up as a clean numeric; without the na argument, the column would have been read as character.
How do you read Excel files?
Excel files (.xlsx, .xls) need the readxl package; it's installed with the tidyverse, but you load it explicitly with library(readxl). It handles multi-sheet workbooks and preserves cell types better than any CSV export would.
readxl has no external dependencies (no Java, no LibreOffice): it's pure C++ under the hood, so it works the same on macOS, Linux, and Windows.
Excel stores dates as serial day numbers. read_excel() converts properly-formatted date cells automatically, but if dates arrive as numbers like 44562, convert with as.Date(44562, origin = "1899-12-30").
Try it: List the sheets in a hypothetical workbook with excel_sheets().
Click to reveal solution
excel_sheets() opens the workbook and returns a character vector of sheet names in their stored order. Call it before read_excel() whenever you don't control the file, so you can pass the right sheet = argument instead of guessing.
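A runnable sketch using the demo workbook that ships with readxl (in real work you'd substitute your own file path):

```r
library(readxl)

# readxl bundles a small example workbook; swap in your own path in practice
path <- readxl_example("datasets.xlsx")

sheets <- excel_sheets(path)   # sheet names in their stored order
sheets

# Read a sheet by name rather than guessing its position
first <- read_excel(path, sheet = sheets[1])
```
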
How do you import JSON into R?
The jsonlite package parses JSON into a data frame when the shape is tabular, or a nested list when it isn't. It's fast and handles both local files and API responses.
When the JSON is nested, fromJSON() returns a list of lists/data frames that you navigate with $. Pass flatten = TRUE to unnest embedded objects into columns automatically.
Try it: Parse a tiny JSON array of three objects and check the class of the result.
Click to reveal solution
fromJSON() auto-simplifies a JSON array of flat objects into a data.frame because every element shares the same keys. If any object had a different shape or a nested value, the result would fall back to a list, which is why you should always check the result with class() before piping it downstream.
How do you read data from other statistical software (SPSS, Stata, SAS)?
If you're migrating from other statistical software, you can keep your existing files. The haven package reads (and writes) SPSS .sav, Stata .dta, and SAS .sas7bdat files directly.
haven preserves value labels, variable labels, and missing-value markers, which is critical when you're collaborating with SPSS or Stata users. Use labelled::to_factor() to convert labelled columns to regular R factors when you need them.
For .xls, fixed-width, and MATLAB .mat files, use readxl::read_xls(), readr::read_fwf(), and R.matlab::readMat() respectively. Each format has a dedicated package; the tidyverse ecosystem keeps them consistent.
Try it: Check what class read_sav() returns (conceptually, it's a tibble with haven_labelled columns).
Click to reveal solution
read_sav() returns a tibble (tbl_df), so dplyr verbs work directly on it. Individual columns that carried SPSS value labels gain an extra haven_labelled class; check them with class(df$col) and convert with haven::as_factor() when you need plain R factors.
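A runnable sketch using the small example .sav file bundled with haven (point read_sav() at your own file in practice):

```r
library(haven)

# haven ships an example SPSS file for exactly this kind of test
path <- system.file("examples", "iris.sav", package = "haven")

df <- read_sav(path)

class(df)   # a tibble: "tbl_df" "tbl" "data.frame"

# Convert any labelled columns to plain R factors; the data.frame method
# of as_factor() leaves non-labelled columns alone
df <- as_factor(df)
```
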
How do you handle big files with data.table::fread()?
When your file is hundreds of megabytes or you're iterating many times, fread() from the data.table package is the fastest tool in R. It auto-detects delimiters, types, and headers with a single call.
fread() returns a data.table (a high-performance subclass of data frame). If you want a plain data frame or tibble, wrap the call: as.data.frame(fread(...)) or tibble::as_tibble(fread(...)).
For files over 1 GB, fread() will often be 5-10× faster than read_csv() and use less memory.
Try it: Use fread() on inline text with text = ....
Click to reveal solution
The text = argument lets fread() parse a literal string just like it would a file path, which is the fastest way to prototype a call without touching disk. The result is a data.table; the leading 1:, 2:, 3: printed before each row is the data.table row index, not a real column.
How do you save and load R-native formats (RDS, RData)?
When the source of data is another R session, use R-native formats. saveRDS()/readRDS() save one object and let the caller name it on load. save()/load() save multiple named objects and restore them under their original names.
readRDS() is the safer pattern: you control which variable the object gets bound to, so there's no risk of silently overwriting something in your workspace. Prefer it for any single-object serialization.
arrow::write_parquet() and arrow::read_parquet() are worth learning. Parquet is columnar, compressed, and readable from Python, Spark, and most data tools, making it the best "exchange format" for modern data work.
Try it: Save a small vector to an RDS file and read it back.
Click to reveal solution
Capturing the tempfile() path in a variable is the key move; otherwise you write the RDS to one random path and try to read from a different one. saveRDS() serializes exactly one object, and readRDS() returns it so the caller can bind it to any variable name they like.
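The round trip in full, using only base R:

```r
# Capture the temp path once so you write and read the same file
path <- tempfile(fileext = ".rds")

saveRDS(c(3.1, 2.7, 9.8), path)

x <- readRDS(path)   # caller chooses the variable name
x                    # 3.1 2.7 9.8
```
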
Practice Exercises
Exercise 1: Multi-sheet Excel
Given a workbook with sheets "Q1", "Q2", "Q3", "Q4", read all four sheets and combine into one data frame with a quarter column.
Show solution
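One way to solve it, demonstrated on the example workbook bundled with readxl so the pattern is runnable; with the real Q1-Q4 workbook you'd substitute its path, and since those sheets share columns you could bind them directly:

```r
library(readxl)

path <- readxl_example("datasets.xlsx")   # stand-in for your quarterly workbook

sheets <- excel_sheets(path)

# Read every sheet, tagging each row with the sheet it came from
pieces <- lapply(sheets, function(s) {
  df <- read_excel(path, sheet = s)
  df$quarter <- s   # with sheets "Q1".."Q4" this is the quarter column
  df
})

# The demo sheets have different columns; quarterly sheets with identical
# columns can be combined in one step:
# combined <- do.call(rbind, pieces)
```
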
Exercise 2: Dirty CSV rescue
A CSV has 3 junk rows, semicolon delimiters, comma decimals, and "NULL" as missing. Read it correctly.
Show solution
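A possible solution, prototyped on an inline string (the junk lines and values are invented) so every rescue argument is visible at once:

```r
library(readr)

raw <- "junk line 1\njunk line 2\njunk line 3\nid;score\n1;3,5\n2;NULL\n3;4,25\n"

df <- read_delim(
  raw,
  delim          = ";",                          # semicolon-separated
  skip           = 3,                            # drop the three junk rows
  na             = "NULL",                       # "NULL" means missing
  locale         = locale(decimal_mark = ","),   # comma decimals: 3,5 -> 3.5
  show_col_types = FALSE
)

df$score   # 3.50 NA 4.25
```

Setting decimal_mark to "," also flips readr's grouping mark to ".", so the two never collide.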
Exercise 3: JSON to data frame
Parse this JSON into a data frame with columns id, name, tags (a list-column).
Show solution
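A sketch with invented records showing how the list-column arises:

```r
library(jsonlite)

json <- '[
  {"id": 1, "name": "ada", "tags": ["r", "stats"]},
  {"id": 2, "name": "bo",  "tags": ["json"]},
  {"id": 3, "name": "cy",  "tags": []}
]'

df <- fromJSON(json)

# id and name simplify to atomic columns; tags stays a list-column
# because each row holds a different-length vector
class(df$tags)   # "list"
```
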
Putting It All Together
A realistic import pipeline: detect the file extension, route to the right reader, and return a tibble.
One function, six formats, zero manual branching at the call site. This is the kind of small utility that pays for itself the first week.
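A sketch of such a utility; the name import_any and the extension-to-reader mapping are assumptions to adapt to whichever packages you actually use:

```r
# Dispatch on the file extension and return a data frame
import_any <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = readr::read_csv(path, show_col_types = FALSE),
    tsv  = readr::read_tsv(path, show_col_types = FALSE),
    xlsx = ,                                   # fall through to the xls reader
    xls  = readxl::read_excel(path),
    json = tibble::as_tibble(jsonlite::fromJSON(path)),
    sav  = haven::read_sav(path),
    rds  = readRDS(path),
    stop("Unsupported extension: ", ext)
  )
}

# Usage: the call site never branches on format
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), path)
df <- import_any(path)
```

The empty `xlsx = ,` entry uses switch()'s fall-through, so both Excel extensions share one reader.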
Summary
| Format | Function | Package |
|---|---|---|
| CSV | read_csv() | readr |
| TSV | read_tsv() | readr |
| Custom delimiter | read_delim() | readr |
| Excel (.xlsx, .xls) | read_excel() | readxl |
| JSON | fromJSON() | jsonlite |
| SPSS | read_sav() | haven |
| Stata | read_dta() | haven |
| SAS | read_sas() | haven |
| Fixed-width | read_fwf() | readr |
| Big CSV (fast) | fread() | data.table |
| R single object | readRDS() | base |
| R multi-object | load() | base |
| Parquet | read_parquet() | arrow |
References
- readr package documentation, modern CSV/TSV/fwf reader
- readxl package documentation, Excel without Java
- jsonlite documentation, JSON parsing
- haven package documentation, SPSS/Stata/SAS
- data.table::fread, fastest CSV reader
Continue Learning
- R Data Frames: Every Operation You'll Need, what to do with the data after importing.
- dplyr filter() and select(), subset your imported data.
- R Data Types: Which Type Is Your Variable?, understand the column types readers produce.