haven as_factor() in R: Convert Labelled Data to Factors
The haven as_factor() function converts a labelled vector, the kind read_sav(), read_dta(), and read_sas() produce, into an ordinary R factor. It turns the stored value labels into factor levels.
as_factor(x)                          # labels where available, else values
as_factor(x, levels = "labels")       # use value labels only
as_factor(x, levels = "values")       # keep the raw codes only
as_factor(x, levels = "both")         # combine as "[code] label"
as_factor(x, ordered = TRUE)          # return an ordered factor
as_factor(df)                         # convert every labelled column
as_factor(df, only_labelled = FALSE)  # also convert plain columns
Need explanation? Read on for examples and pitfalls.
What as_factor() does
as_factor() turns a labelled vector into a factor. When haven reads an SPSS, Stata, or SAS file, coded columns such as 1 = "Male" arrive with class haven_labelled. They print as numbers but carry the labels as a hidden attribute. as_factor() reads that attribute and builds a proper factor whose levels are the human-readable labels.
This matters because most R functions, from table() to ggplot(), treat a haven_labelled column as raw numbers. A labelled column of survey responses would tabulate as 1, 2, 3 instead of Male, Female, Other. Converting with as_factor() makes the labels the visible values, so summaries and plots read correctly.
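As a minimal sketch of what that hidden attribute looks like, the snippet below builds a small labelled vector by hand and inspects it. The vector `sex` and its codes are illustrative assumptions, not data from a real survey file.

```r
library(haven)

# Hypothetical coded column: 1 = Male, 2 = Female
sex <- labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))

# The column carries the haven_labelled class ...
inherits(sex, "haven_labelled")

# ... and stores the code-to-label mapping in the "labels" attribute
attr(sex, "labels")
```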
Syntax and key arguments
as_factor() is a generic with methods for a single vector and for a whole data frame. Most calls need only x; the levels and ordered arguments control how the factor is built.
The levels argument decides what the factor levels become. With "default", labels are used where they exist and raw values fill in otherwise; "labels" uses the value labels only; "values" keeps the raw codes; "both" combines them as [1] Male. Set ordered = TRUE when the categories have a natural rank, such as Low, Medium, High, so comparisons with < work.
When x is a data frame, as_factor() converts every labelled column at once and leaves the rest alone. The only_labelled argument controls that: it defaults to TRUE, so plain numeric and character columns pass through untouched.
Coming from pandas? The closest analogue is a Categorical with named .cat.categories. The haven equivalent of attaching those category names is as_factor(), which bakes the value labels into a base R factor.
as_factor() examples
Build a labelled vector with labelled(), then convert it. haven's labelled() constructor reproduces the kind of column a .sav file holds, so every example below runs without a data file of your own.
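A minimal sketch of that first conversion, assuming an illustrative `sex` vector with two codes (the values here are made up for the example):

```r
library(haven)

# Hypothetical coded column: 1 = Male, 2 = Female
sex <- labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))

# Convert the labelled vector to a base R factor
sex_f <- as_factor(sex)
sex_f
levels(sex_f)  # "Male" "Female"
```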
The result is a factor. Each 1 became Male and each 2 became Female, because as_factor() read the labels attribute.
Use levels = "values" or "both" to keep the codes visible. Survey codebooks often need the raw number alongside the label.
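The two modes can be sketched side by side as follows, again assuming an illustrative `sex` vector rather than a real survey column:

```r
library(haven)

# Hypothetical coded column: 1 = Male, 2 = Female
sex <- labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))

# Keep only the raw codes as levels
codes_only <- as_factor(sex, levels = "values")
levels(codes_only)  # "1" "2"

# Show the code and the label together
code_and_label <- as_factor(sex, levels = "both")
levels(code_and_label)  # "[1] Male" "[2] Female"
```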
The "values" mode discards the labels entirely, while "both" keeps an audit trail by showing the code in brackets before each label.
Pass ordered = TRUE for ranked categories. An ordered factor lets you compare and sort levels.
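A short sketch of an ordered conversion, assuming a hypothetical `satisfaction` vector coded 1 to 3:

```r
library(haven)

# Hypothetical ranked column: 1 = Low, 2 = Medium, 3 = High
satisfaction <- labelled(
  c(1, 3, 2, 3),
  labels = c(Low = 1, Medium = 2, High = 3)
)

# ordered = TRUE returns an ordered factor, with levels ranked by code
satisfaction <- as_factor(satisfaction, ordered = TRUE)
satisfaction

# Comparisons against a level name now work element by element
satisfaction > "Low"  # FALSE TRUE TRUE TRUE
```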
The < signs in the printed levels confirm the order, so satisfaction > "Low" now returns a logical result instead of an error.
Call as_factor() on a whole data frame to convert every labelled column at once. This is the common move right after reading a survey file.
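As a sketch, the tibble below stands in for a freshly imported survey file. The column names `respondent`, `sex`, and `grade` and all values are illustrative assumptions.

```r
library(haven)

# Hypothetical survey table: one plain column, two labelled columns
survey <- tibble::tibble(
  respondent = 1:4,
  sex = labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2)),
  grade = labelled(c(3, 1, 2, 3), labels = c(Low = 1, Medium = 2, High = 3))
)

# One call converts every labelled column and leaves the rest alone
converted <- as_factor(survey)
sapply(converted, class)
```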
The respondent column stays an integer because it was never labelled, while sex and grade became factors in a single call.
A common pattern is to convert immediately after import with read_sav("survey.sav") |> as_factor(), so every downstream summary and plot sees readable labels instead of raw codes.
as_factor() vs factor() and zap_labels()
as_factor() is the only one of these that reads value labels. Base factor() and haven's zap_labels() solve nearby problems, so the choice depends on whether you want the labels kept, ignored, or discarded.
| Function | Input | Output | Uses value labels? |
|---|---|---|---|
| as_factor() (haven) | labelled vector | factor | Yes, labels become levels |
| factor() (base R) | any vector | factor | No, uses the raw values |
| zap_labels() (haven) | labelled vector | plain vector | No, strips the labels off |
| as_factor() (forcats) | character or numeric | factor | No, orders by first appearance |
Reach for factor() when a column is already plain text or numbers with no label attribute. Use zap_labels() when you want the raw codes as a clean numeric vector, for example before a model that expects numbers. Use as_factor() whenever the labels are the meaning you want to keep.
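The contrast can be sketched on one small vector. The `sex` and `answers` objects are illustrative assumptions.

```r
library(haven)

# Hypothetical coded column: 1 = Male, 2 = Female
sex <- labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))

# zap_labels(): drop the labels, keep the codes as a plain numeric vector
codes <- zap_labels(sex)
codes

# as_factor(): keep the meaning, labels become the levels
sex_f <- as_factor(sex)
levels(sex_f)  # "Male" "Female"

# factor(): for a plain vector with no label attribute
answers <- factor(c("yes", "no", "yes"))
levels(answers)  # "no" "yes"
```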
Common pitfalls
The data frame method skips plain columns by default. If a column you expected to convert is still numeric or character, it was never labelled. Pass only_labelled = FALSE to convert plain columns too.
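A small sketch of the difference, assuming a hypothetical table with one plain character column (`region`) and one labelled column (`sex`):

```r
library(haven)

# Hypothetical table: region is plain text, sex is labelled
survey <- tibble::tibble(
  region = c("north", "south", "north"),
  sex = labelled(c(1, 2, 1), labels = c(Male = 1, Female = 2))
)

# Default: only the labelled column is converted
default_out <- as_factor(survey)
class(default_out$region)  # "character"

# only_labelled = FALSE: the plain column becomes a factor too
all_out <- as_factor(survey, only_labelled = FALSE)
class(all_out$region)  # "factor"
```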
Unlabelled codes fall back to their raw value. If a labelled vector holds a code with no matching label, levels = "default" uses the bare number as that level. Confirm your codebook is complete before converting, or those rows will read as digits among the words.
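The fallback can be sketched with a vector that holds one code missing from its label set. The vector `x` and the `setdiff()` check are illustrative, one possible way to spot the gap before converting.

```r
library(haven)

# Hypothetical column where code 3 has no label in the codebook
x <- labelled(c(1, 2, 3), labels = c(Male = 1, Female = 2))

# Check for codes that lack a label before converting
unlabelled_codes <- setdiff(unique(zap_labels(x)), attr(x, "labels"))
unlabelled_codes  # 3

# With the default levels, the bare number shows up as a level
f <- as_factor(x)
levels(f)
```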
forcats can mask haven. If forcats is attached after haven, as_factor() resolves to the forcats version, which cannot read value labels. Call haven::as_factor() explicitly whenever both packages are attached.
Try it yourself
Try it: Convert the labelled vector ex_party into a factor that shows the codes and labels together, using levels = "both". Save the result to ex_result.
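One possible solution is sketched below. The definition of `ex_party` is an assumption made so the snippet runs on its own; substitute your own labelled vector if you already have one.

```r
library(haven)

# Hypothetical definition of ex_party (the codes and labels are assumed)
ex_party <- labelled(
  c(1, 2, 3, 1),
  labels = c(Democrat = 1, Republican = 2, Independent = 3)
)

# Show each code together with its label
ex_result <- as_factor(ex_party, levels = "both")
levels(ex_result)  # "[1] Democrat" "[2] Republican" "[3] Independent"
```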
Explanation: Passing levels = "both" tells as_factor() to build each level from the raw code and its label, so the factor keeps the audit trail from the original codebook.
Related haven functions
as_factor() works alongside a small set of haven label helpers. Reach for the one that matches what you want to do with the labels and formats.
- zap_labels(): strip value labels and keep the raw underlying values.
- zap_formats(): drop SPSS and Stata display formats from a column.
- zap_missing(): convert tagged user-defined missing values to plain NA.
- labelled(): build a labelled vector by hand, useful for tests and examples.
- read_sav(): read an SPSS .sav file, which often returns labelled columns.
- read_dta(): read a Stata .dta file with its variable and value labels.
For the full argument reference, see the haven as_factor() documentation on tidyverse.org.
FAQ
What does haven as_factor() do?
as_factor() converts a labelled vector into a base R factor. When haven reads an SPSS, Stata, or SAS file, coded columns arrive with class haven_labelled, storing labels such as 1 = "Male" as an attribute. as_factor() reads that attribute and rebuilds the column as a factor whose levels are the labels. It also works on a whole data frame, converting every labelled column in one call while leaving plain columns alone.
What is the difference between as_factor() and factor() in R?
Base factor() builds a factor from the values it sees, with no knowledge of value labels. Calling it on a haven_labelled column produces a factor with levels 1, 2, 3. haven's as_factor() reads the label attribute first, so the same column becomes a factor with levels Male, Female, Other. Use factor() for plain vectors and as_factor() whenever the column carries labels from a statistical file.
How do I convert SPSS labelled variables to factors in R?
Read the file with read_sav(), then call as_factor() on the resulting tibble. The data frame method converts every labelled column at once and leaves untouched columns alone. A one-line pipeline such as read_sav("survey.sav") |> as_factor() gives you a tibble where coded survey responses display as readable categories instead of numbers.
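To try the pipeline without an SPSS file of your own, one option is to write a small labelled tibble to a temporary .sav file and read it back. This is a sketch: the table contents are made up, and the temporary path stands in for a real file such as survey.sav.

```r
library(haven)

# Hypothetical survey table with one labelled column
survey <- tibble::tibble(
  id = c(1, 2, 3),
  sex = labelled(c(1, 2, 1), labels = c(Male = 1, Female = 2))
)

# Write it out as .sav, then read and convert in one pipeline
path <- tempfile(fileext = ".sav")
write_sav(survey, path)

clean <- read_sav(path) |> as_factor()
clean
```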
What does levels = "both" do in as_factor()?
levels = "both" builds each factor level from the raw code and its label together, formatted as [1] Male. It is useful for survey work where you want the original numeric code visible alongside the label as an audit trail. The other options are "labels" for label text only, "values" for the raw codes only, and the "default" setting that uses labels where they exist.
Why does as_factor() behave differently after I load the tidyverse?
Both haven and forcats export a function named as_factor(). Whichever package is attached last masks the other. The forcats version converts character or numeric vectors and orders levels by first appearance; it cannot read haven value labels. If you load the tidyverse after haven, call haven::as_factor() with the explicit namespace prefix so you get the labelled-aware version.