haven labelled() in R: Create Labelled Survey Vectors
The haven labelled() function builds a labelled vector in R: it stores numeric or character codes alongside the value labels that explain them, mirroring how SPSS, Stata, and SAS keep coded survey data.
    labelled(x, c(No = 0, Yes = 1))            # numeric value labels
    labelled(x, c(M = "m", F = "f"))           # character value labels
    labelled(x, labels, label = "Question 5")  # add a variable label
    labelled_spss(x, labels, na_values = 9)    # SPSS user missing values
    is.labelled(x)                             # test for the labelled class
    print_labels(x)                            # show the value-label map
    as_factor(x)                               # convert to an ordinary factor
    zap_labels(x)                              # strip labels, keep raw codes
Need explanation? Read on for examples and pitfalls.
What labelled() does
labelled() pairs raw codes with human-readable labels. Survey data is usually stored as numbers: a gender column holds 1 and 2, not Male and Female. The mapping lives in a separate codebook. labelled() attaches that codebook directly to the vector, so the codes and their meaning travel together as a single object of class haven_labelled.
This is the same structure read_sav(), read_dta(), and read_sas() produce when they import labelled files. Calling labelled() by hand lets you recreate that structure for data you build in R, or for a column you need to recode before exporting back to SPSS with write_sav() or to Stata with write_dta().
Syntax and key arguments
labelled() takes a vector and a named labels lookup. Only x and labels are needed for most calls; label adds a description of the whole variable.
The labels argument is the part that trips people up. It is a named vector where the names become the labels and the values become the data codes. So c(Male = 1, Female = 2) means code 1 displays as Male. The values must be the same type as x: numeric values for a numeric vector, character values for a character vector. The names are always text.
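The direction of the lookup can be checked before it ever reaches labelled(). The following is a minimal sketch, assuming haven is installed; the object names gender_labels, x, and gender are illustrative only.

```r
library(haven)

# Names are the labels; values are the codes stored in the data.
gender_labels <- c(Male = 1, Female = 2)
names(gender_labels)     # the human-readable labels
unname(gender_labels)    # the codes

# The label values share the type of x (both are double here).
x <- c(1, 2, 2, 1)
typeof(x) == typeof(gender_labels)

# x and labels are the only arguments needed for a basic call.
gender <- labelled(x, labels = gender_labels)
```

Inspecting names() and the bare values separately is a quick way to confirm the lookup is not reversed.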
Creating a labelled vector
Start with a numeric vector and a labels lookup. Load haven, then pass the codes and their meaning to labelled().
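A small example along these lines, assuming haven is installed; the responses and the variable label text are made up for illustration.

```r
library(haven)

# Hypothetical survey responses: 1 = Male, 2 = Female.
gender <- labelled(
  c(1, 2, 2, 1, 2),
  labels = c(Male = 1, Female = 2),
  label = "Respondent gender"
)

gender                  # prints the codes, then the label map
class(gender)           # includes "haven_labelled"
attr(gender, "label")   # the variable label
```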
The vector still prints its raw codes, with the label map shown underneath. The label argument adds a description of the variable as a whole, separate from the per-value labels.
Labelling character data
Character vectors can be labelled too. The codes are strings, and the labels in labels must be strings to match.
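A short sketch of the character case, assuming haven is installed; the region codes are illustrative.

```r
library(haven)

# Codes are strings, so the label values are strings too.
region <- labelled(
  c("N", "S", "S", "N"),
  labels = c(North = "N", South = "S")
)

is.labelled(region)    # TRUE
print_labels(region)   # shows the value-label map
```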
SPSS-style user missing values
Use labelled_spss() when codes double as missing values. It extends labelled() with na_values and na_range, the user-defined missing codes SPSS uses for answers like "Refused" or "Not applicable".
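As a sketch, assuming haven is installed and a hypothetical survey item where code 9 means "Refused":

```r
library(haven)

# Hypothetical item: 0 = No, 1 = Yes, 9 = Refused (user-defined missing).
smoker <- labelled_spss(
  c(0, 1, 9, 1, 0),
  labels = c(No = 0, Yes = 1, Refused = 9),
  na_values = 9
)

class(smoker)               # includes "haven_labelled_spss"
is.na(smoker)               # the user-missing code is reported as missing
attr(smoker, "na_values")   # the declared missing codes
```

The code 9 stays in the data, so the distinction between "Refused" and a truly empty cell is not lost.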
labelled() vs labelled_spss() vs factor()
Pick the constructor by what you need to preserve. A labelled vector keeps the original codes; a factor discards them and keeps only the categories.
| Feature | labelled() | labelled_spss() | factor() |
|---|---|---|---|
| Stores raw codes | Yes | Yes | No, stores levels |
| Keeps value labels | Yes | Yes | Labels become levels |
| User missing values | No | Yes (na_values, na_range) | No |
| Class produced | haven_labelled | haven_labelled_spss | factor |
| Ready for table(), ggplot() | No, seen as numbers | No | Yes |
| Best for | Mirroring SPSS or Stata codes | SPSS data with missing codes | Analysis-ready categories |
The rule of thumb: use labelled() or labelled_spss() while data is being imported, cleaned, or exported, and convert to a factor with as_factor() once you start tabulating or plotting.
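The hand-off from labelled vector to factor might look like this. It is a sketch assuming haven is installed; gender and gender_f are illustrative names.

```r
library(haven)

gender <- labelled(c(1, 2, 2, 1, 2), labels = c(Male = 1, Female = 2))

# Keep the labelled version for cleaning and export,
# and make a factor copy for analysis.
gender_f <- as_factor(gender)

levels(gender_f)   # "Male" "Female"
table(gender_f)    # counts by label rather than by code
```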
Common pitfalls
Watch the direction of the labels vector. Three mistakes account for most labelled() errors.
- Reversing names and values. c(Male = 1) is correct: the name is the label, the value is the code. Writing c(1 = "Male") is a syntax error because names cannot be bare numbers.
- Mismatched types. A numeric x needs numeric label values. Passing labels = c(Male = "1") against a numeric vector raises an error about incompatible types.
- Expecting labels everywhere. labelled() does not require a label for every value. Unlabelled codes stay as bare numbers, which is valid but easy to overlook.
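Two of these pitfalls can be reproduced safely in a session. The sketch below assumes haven is installed; it catches the type error with tryCatch() rather than relying on the exact message, which may differ between haven versions.

```r
library(haven)

# Mismatched types: character label values against a numeric x.
result <- tryCatch(
  labelled(c(1, 2), labels = c(Male = "1")),
  error = function(e) "error"
)
result

# Partial labelling is allowed: code 3 has no label.
partial <- labelled(c(1, 2, 3), labels = c(Male = 1, Female = 2))
levels(as_factor(partial))   # the unlabelled code appears as its own level
```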
A labelled vector is not a factor: table(gender) counts 1 and 2, and ggplot() plots the codes. Convert with as_factor() before any summary or chart, or your output shows codes instead of categories.
Try it yourself
Try it: Build a labelled vector named ex_status from the codes c(1, 0, 1, 1, 0) with labels Active = 1 and Inactive = 0, then confirm its class.
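One way to solve the exercise, assuming haven is installed:

```r
library(haven)

# Codes from the exercise, with Active = 1 and Inactive = 0.
ex_status <- labelled(
  c(1, 0, 1, 1, 0),
  labels = c(Active = 1, Inactive = 0)
)

class(ex_status)         # includes "haven_labelled"
is.labelled(ex_status)   # TRUE
```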
Explanation: labelled() returns an object of class haven_labelled, so is.labelled() reports TRUE. The named vector c(Active = 1, Inactive = 0) maps each code to its label.
Related haven functions
These functions handle the rest of the labelled-data workflow:
- as_factor() converts a labelled vector into an ordinary factor for analysis and plotting.
- zap_labels() strips the value labels and returns the raw codes.
- zap_formats() removes the SPSS or Stata display format attribute.
- read_sav() imports an SPSS file, producing labelled columns automatically.
- print_labels() prints the value-label map of any labelled vector.
The separate labelled package adds val_labels() and var_label() helpers for editing labels in place. The underlying haven_labelled class is the same one labelled() creates here. See the haven reference for the full argument list.
FAQ
What is the difference between labelled() and factor() in R?
labelled() keeps the original numeric or character codes and stores the labels as a separate attribute, so the data can round-trip back to SPSS or Stata unchanged. factor() discards the codes and keeps only the category levels. Use labelled() while importing or exporting data, and convert to a factor with as_factor() once you need to tabulate or plot, because most R functions treat a labelled vector as raw numbers.
How do I create a labelled vector from SPSS value labels?
Pass your data vector as x and a named vector as labels, where the names are the labels and the values are the SPSS codes, for example labelled(x, c(Yes = 1, No = 2)). If the SPSS file uses user-defined missing values such as 9 for "Refused", use labelled_spss() instead and supply na_values = 9 so those codes are flagged as missing.
Can labelled() be used on character vectors?
Yes. labelled() accepts both numeric and character vectors. For a character vector, the codes in x and the values in labels must both be strings, such as labelled(c("N", "S"), c(North = "N", South = "S")). The names of the labels vector remain the human-readable labels regardless of the underlying type.
How do I remove labels from a haven_labelled vector?
Use zap_labels() to drop the value labels and return the bare codes, or as_factor() to turn the labels into factor levels. zap_labels() is the right choice when you want plain numbers for arithmetic, while as_factor() is better when you want readable categories for summaries and charts.
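The two routes side by side, as a sketch assuming haven is installed; the rating vector is illustrative.

```r
library(haven)

rating <- labelled(
  c(1, 3, 2, 3),
  labels = c(Low = 1, Medium = 2, High = 3)
)

# Route 1: plain numbers for arithmetic.
codes <- zap_labels(rating)
is.labelled(codes)   # FALSE
mean(codes)          # 2.25

# Route 2: readable categories for summaries and charts.
cats <- as_factor(rating)
levels(cats)         # "Low" "Medium" "High"
```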
Why does my labelled column show numbers instead of labels?
A labelled vector stores the codes as its data and prints them as numbers by design, so the values match the source file. The labels live in an attribute and only become the visible values once you call as_factor(). Functions like table() and ggplot() see the codes, so convert the column first if you want categories in your output.