haven read_sav() in R: Import SPSS .sav Files
The haven read_sav() function imports an SPSS .sav file into an R tibble. It needs no SPSS installation and pulls in the variable labels and value labels stored inside the file.
read_sav("data.sav") # read an SPSS .sav file
read_sav("data.zsav") # read a compressed .zsav file
read_sav("data.sav", col_select = c(id, age)) # read only some columns
read_sav("data.sav", n_max = 1000) # read the first 1000 rows
read_sav("data.sav", user_na = TRUE) # keep SPSS user missing values
read_spss("data.por") # read an older SPSS portable file
write_sav(df, "data.sav") # write a tibble back to SPSS
Need explanation? Read on for examples and pitfalls.
What read_sav() does
read_sav() turns an SPSS .sav file into a tibble. You pass it a file path and it returns a tidy data frame with one column per SPSS variable. The function reads the binary SPSS format directly through the bundled ReadStat C library, so no SPSS license or software is needed on the machine.
Unlike a CSV export, an SPSS file carries a full codebook. Variable labels, value labels such as 1 = "Male", and user-defined missing values all live inside the single .sav file. read_sav() reads that metadata in the same call and attaches it to the tibble, so the survey context travels with the data.
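As a minimal sketch of where that metadata ends up, the snippet below reads the example file that ships with haven and looks at the attributes on one column. It assumes the bundled file has a value-labelled Species column; a variable label is only present if the file stores one.

```r
library(haven)

# Example file bundled with haven; swap in your own .sav path
path <- system.file("examples", "iris.sav", package = "haven")
df <- read_sav(path)

# Value labels travel as a named vector in the "labels" attribute
attr(df$Species, "labels")

# A variable label, when the file has one, sits in the "label" attribute
# (NULL means the file did not store one for this variable)
attr(df$Species, "label")
```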
Syntax and key arguments
Most calls need only the file path; the other arguments control which rows and columns you read and how missing values arrive. SPSS survey files can be both wide and long, so these arguments let you pull a slice instead of the whole table.
The arguments you reach for most are col_select (keep only the variables you need), n_max (preview a large file fast), and user_na. The user_na argument is SPSS-specific: leave it FALSE and read_sav() converts SPSS user-defined missing values to plain NA, while TRUE reads those variables as labelled_spss columns that keep the original codes and record which of them count as missing, so you can tell them apart later.
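A call that combines the three arguments might look like the sketch below. It uses haven's bundled example file, and the starts_with("Sepal") selection is only an illustration that assumes the sepal columns share that prefix.

```r
library(haven)

# Example file bundled with haven; swap in your own .sav path
path <- system.file("examples", "iris.sav", package = "haven")

preview <- read_sav(
  path,
  col_select = starts_with("Sepal"), # tidyselect expression, no quotes around names
  n_max = 5,                         # stop after the first 5 rows
  user_na = TRUE                     # keep user-defined missing codes, if the file has any
)

dim(preview)
```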
In Python, the closest equivalent is pyreadstat.read_sav(), which, like haven, is built on the ReadStat C library, so both should return the same values and labels from a given file.
read_sav() examples
haven ships an example iris.sav file, so every example below runs without an SPSS file of your own. Build the path with system.file() and pass it to read_sav().
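A minimal version of that call, with iris_sav as an arbitrary object name:

```r
library(haven)

# Locate the example file inside the installed haven package
path <- system.file("examples", "iris.sav", package = "haven")

# Read the whole file into a tibble and print it
iris_sav <- read_sav(path)
iris_sav
```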
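The result is a tibble whose column names are the SPSS variable names. Run names() on it before selecting columns, because the spelling may not match base R's iris; the examples on this page use the underscore form, Sepal_Length.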
Read only the columns you need with col_select. It uses tidyselect syntax, so you name columns without quotes. This trims a wide SPSS export down to the variables your analysis touches.
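A short sketch of col_select on the bundled file follows. Species is named directly, and the starts_with() helper picks up the petal columns without depending on their exact spelling.

```r
library(haven)

path <- system.file("examples", "iris.sav", package = "haven")

# Bare column names and tidyselect helpers can be mixed inside c()
petals <- read_sav(path, col_select = c(Species, starts_with("Petal")))

names(petals)
```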
Preview a large file with n_max. Reading a capped number of rows returns a sample in milliseconds, even when the full file holds hundreds of thousands of cases.
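For instance, capping the read at ten rows (an arbitrary number chosen for illustration) looks like this:

```r
library(haven)

path <- system.file("examples", "iris.sav", package = "haven")

# Only the first 10 cases are read; the rest of the file is skipped
head_rows <- read_sav(path, n_max = 10)

nrow(head_rows)
```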
Confirm the object once the file is in memory. read_sav() always returns a tibble, which prints cleanly and works with every tidyverse function.
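One way to run that check, again on the bundled example file:

```r
library(haven)

path <- system.file("examples", "iris.sav", package = "haven")
iris_sav <- read_sav(path)

# A tibble carries the classes "tbl_df" and "tbl" on top of "data.frame"
class(iris_sav)

# It is still a data frame, so base R functions accept it too
is.data.frame(iris_sav)
```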
read_sav() reads SPSS's compressed .zsav format with no extra arguments, and write_sav(df, "data.zsav", compress = TRUE) produces one (recent haven releases also accept compress = "zsav"). A .zsav file is often a third the size of the same data stored as plain .sav.
read_sav() vs read_spss(), read_por() and other readers
read_sav() is one of several haven readers, each tied to a statistical file format. They share the same return type and most arguments, so picking the right one is mostly a matter of matching the file extension.
| Function | Reads | Source software |
|---|---|---|
| read_sav() | .sav, .zsav | SPSS |
| read_por() | .por | SPSS legacy portable |
| read_spss() | .sav, .zsav, .por | SPSS, auto-dispatch |
| read_sas() | .sas7bdat | SAS |
| read_dta() | .dta | Stata |
Use read_sav() for modern SPSS data files, which end in .sav or the compressed .zsav. Use read_por() for the older SPSS portable format, which ends in .por. If you would rather not pick a reader yourself, read_spss() looks at the file extension and dispatches to the correct reader for you.
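A quick way to see the dispatch at work is to read the same .sav file through both functions and compare the results. This sketch uses the bundled example file, so only the .sav branch is exercised.

```r
library(haven)

path <- system.file("examples", "iris.sav", package = "haven")

via_sav <- read_sav(path)   # explicit SPSS .sav reader
via_spss <- read_spss(path) # picks the reader from the ".sav" extension

# Both calls should give the same shape and column names
identical(dim(via_sav), dim(via_spss))
identical(names(via_sav), names(via_spss))
```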
Common pitfalls
Pointing read_sav() at a path that does not exist. The function cannot guess where the file is. A typo in the path or the wrong working directory triggers an immediate error.
Letting user-defined missing values vanish silently. SPSS lets analysts flag codes such as 99 as missing. With the default user_na = FALSE, read_sav() turns every such value into NA, and you lose the reason a value was missing. Read with user_na = TRUE whenever that distinction matters.
Coded columns arriving as labelled values. When a .sav file stores value labels, read_sav() returns those columns with class haven_labelled. They look like numbers but carry hidden labels, which breaks some functions. Convert them with as_factor() for readable factors or zap_labels() to keep the raw values.
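Dates arriving as raw numbers. read_sav() converts variables that carry an SPSS date format to Date or POSIXct values, but a date column saved without that format reads as a large raw number. SPSS stores dates as seconds since 1582-10-14, so check any suspicious column and convert it with as.Date(x / 86400, origin = "1582-10-14").
Try it yourself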
Try it: Read the bundled iris.sav file, then compute the mean of the Petal_Width column. Save the result to ex_mean.
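One possible solution is sketched below. The column name Petal_Width is taken from the exercise text; the lookup uses a pattern so the sketch also works if the bundled file spells it Petal.Width. The width_col helper variable is only for illustration.

```r
library(haven)

path <- system.file("examples", "iris.sav", package = "haven")
ex_data <- read_sav(path)

# Find the petal width column; "." in the pattern matches "_" as well as "."
width_col <- grep("^Petal.Width$", names(ex_data), value = TRUE)

# Average the column and keep the result
ex_mean <- mean(ex_data[[width_col]])
ex_mean
```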
Explanation: read_sav() returns the SPSS file as a tibble, so ex_data$Petal_Width is an ordinary numeric column that mean() summarises directly.
Related haven functions
read_sav() works alongside a small set of haven readers, writers, and label helpers. Reach for the one that matches your file or your cleanup task.
- read_por(): read a legacy SPSS portable .por file into a tibble.
- read_spss(): read any SPSS file, choosing the reader from the .sav, .zsav, or .por extension.
- write_sav(): write a tibble back out as an SPSS .sav or .zsav file.
- read_sas(): read a SAS .sas7bdat data file into a tibble.
- read_dta(): read a Stata .dta file into a tibble.
- as_factor(): turn SPSS labelled columns into readable factors.
- zap_labels(): strip value labels and keep the raw underlying values.
For the full argument reference, see the haven read_sav() documentation on tidyverse.org.
FAQ
How do I import an SPSS file into R?
Install the haven package with install.packages("haven"), load it with library(haven), then call read_sav("path/to/file.sav"). The function reads the binary SPSS format directly and returns the data as a tibble. You do not need SPSS itself, a license, or any IBM software on the machine. The path can be absolute or relative to your current working directory.
What package reads .sav files in R?
The haven package is the standard choice. It reads SPSS .sav and compressed .zsav files with read_sav(). It is installed with the tidyverse, although library(tidyverse) does not attach it, so load it with library(haven). The older foreign::read.spss() still works, but haven is faster, actively maintained, returns a tidy tibble, and preserves value labels and user-defined missing values that foreign handles less cleanly.
What is the difference between read_sav() and read_spss()?
read_sav() reads modern SPSS data files with the .sav or .zsav extension. read_spss() is a wrapper that checks the file extension and dispatches to read_sav() for .sav and .zsav files or read_por() for older .por portable files. If you know your extension, call the specific reader. If you want one function for any SPSS file, use read_spss().
Why are my SPSS columns showing as labelled in R?
The .sav file stored value labels, so read_sav() returns those columns with class haven_labelled. They display as numbers with attached labels. Call as_factor() on the data to convert labelled columns into factors, or zap_labels() to drop the labels and keep the raw values. This is common when importing coded survey responses.
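Both conversions can be sketched on the bundled example file, assuming its Species column is the labelled one:

```r
library(haven)

path <- system.file("examples", "iris.sav", package = "haven")
df <- read_sav(path)

# The coded column arrives as a labelled vector
class(df$Species)

# as_factor() on the data frame converts labelled columns to factors
df_factor <- as_factor(df)
levels(df_factor$Species)

# zap_labels() drops the labels and leaves the raw codes
df_raw <- zap_labels(df)
class(df_raw$Species)
```

Factors suit tables and plots, while the zapped numeric codes are handy when a function expects plain numbers.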
How do I handle SPSS missing values in R?
SPSS supports user-defined missing values, where specific codes are flagged as missing. By default read_sav() uses user_na = FALSE and converts them all to plain NA. Set user_na = TRUE to read those variables as labelled_spss columns that keep the original codes and record which ones are missing, then inspect them later or strip them when ready. Use TRUE whenever the reason a value is missing matters to your analysis.
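The difference is easiest to see on a small file with a declared missing code. The sketch below builds a hypothetical one-column survey (the satisfaction variable and the code 99 are made up for illustration), writes it to a temporary .sav file, and reads it back both ways. It assumes write_sav() stores the missing-value declaration from labelled_spss().

```r
library(haven)

# Hypothetical survey column: 99 is declared as a user-defined missing code
survey <- tibble::tibble(
  satisfaction = labelled_spss(
    c(1, 2, 99),
    labels = c(Low = 1, High = 2, Refused = 99),
    na_values = 99
  )
)

tmp <- tempfile(fileext = ".sav")
write_sav(survey, tmp)

default_read <- read_sav(tmp)              # user_na = FALSE: 99 becomes NA
keep_read <- read_sav(tmp, user_na = TRUE) # 99 is kept as a code

is.na(default_read$satisfaction)      # which values are NA after the default read
unclass(keep_read$satisfaction)       # underlying codes, with the metadata attributes
attr(keep_read$satisfaction, "na_values") # codes declared as missing
```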