R for SPSS Users: Translate Every SPSS Procedure to R in One Guide
Every SPSS procedure has an R equivalent that produces the same numbers, usually in one or two function calls. This guide translates the most common SPSS commands into runnable R code, so you can keep doing the analyses you already know, only faster, free, and reproducible.
How does an SPSS t-test translate to R?
You probably ran your first SPSS analysis as a t-test, so let's start there. SPSS uses the T-TEST procedure with GROUPS= or PAIRS= subcommands; R uses one function, t.test(), with a formula or two vectors. The output looks different, but the numbers (t, degrees of freedom, p-value, mean difference) are identical. Run the block below and compare it to any SPSS output you have handy.
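Here's a runnable version of both forms, using R's built-in sleep data (the Cushny–Peebles measurements):

```r
# Independent-samples form: T-TEST GROUPS=group(1 2) /VARIABLES=extra.
t.test(extra ~ group, data = sleep)

# Paired form: T-TEST PAIRS=extra1 WITH extra2 (PAIRED).
# Two vectors, because each subject appears once in each group.
t.test(sleep$extra[sleep$group == 1],
       sleep$extra[sleep$group == 2],
       paired = TRUE)
# t = -4.0621, df = 9, p-value = 0.002833, mean difference = -1.58
```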
Every value SPSS prints in its t-test output viewer appears here: t = -4.06, df = 9, p-value = 0.0028, the 95% CI for the mean difference, and the mean difference itself (-1.58). If you ran the same paired test in SPSS on the classic Cushny–Peebles sleep data, the numbers would line up to four decimals. The numbers are the same because the underlying formula is the same, only the display changes.
Try it: Run a one-sample t-test asking whether the mean of sleep$extra differs from 0 (the SPSS equivalent of T-TEST /TESTVAL=0 /VARIABLES=extra.).
Click to reveal solution
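One way to write it (mu = 0 is the default, but stating it explicitly mirrors the TESTVAL subcommand):

```r
# One-sample t-test of sleep$extra against a test value of 0
t.test(sleep$extra, mu = 0)
```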
Explanation: Pass a single vector and mu = 0 to test against zero. R returns the same t, df, p, and CI you'd get from T-TEST /TESTVAL=0 in SPSS.
How do I read SPSS .sav files into R?
Your existing data lives in .sav files. The haven package reads them directly and, critically, preserves the metadata SPSS users care about: variable labels, value labels, and user-defined missings. The older foreign::read.spss strips most of this; use haven instead.
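A self-contained round-trip sketch (the survey.sav file name and gender variable are illustrative; in practice you'd point read_sav() at your own file):

```r
library(haven)

# Build a tiny labelled data set and write it out as a .sav file
df <- data.frame(id = 1:3, gender = c(1, 2, 1))
df$gender <- labelled(df$gender,
                      labels = c(Male = 1, Female = 2),
                      label  = "Respondent gender")
write_sav(df, "survey.sav")

# Read it back: labels, value labels, and column types survive
spss_data <- read_sav("survey.sav")
spss_data                    # tibble with <dbl> and <dbl+lbl> markers
as_factor(spss_data$gender)  # convert value labels to an R factor
```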
read_sav() returns a tibble, the tidyverse data frame, with columns typed as <dbl>, <chr>, or haven_labelled for SPSS value-labelled variables. The first row of <dbl> markers tells you R has parsed the numeric columns correctly. For a labelled variable, you'd see <dbl+lbl> and could convert it to an R factor with as_factor(spss_data$gender).
Variable labels travel as attr(x, "label"), value labels travel as the haven_labelled class, and user-defined missings come through as tagged_na values. You can keep working SPSS-style or convert with as_factor() when you want native R factors. The older foreign package, by contrast, strips value labels by default and warns about unrecognized features.
Try it: Write the single line of R code that would read a file called survey.sav (sitting in your working directory) into an object called ex_survey.
Click to reveal solution
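The single line (assuming haven is already loaded):

```r
ex_survey <- read_sav("survey.sav")
```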
Explanation: read_sav() from haven takes the file path as its only required argument. The result is a tibble you can pipe into any tidyverse function or pass to base R modeling functions like lm().
How do I translate descriptive and frequency procedures?
DESCRIPTIVES, FREQUENCIES, and CROSSTABS are the three SPSS procedures you run before any model. Their R equivalents are summary(), table() plus prop.table(), and table() plus chisq.test(). The psych::describe() function adds skew, kurtosis, and standard error, closer to what SPSS prints by default in DESCRIPTIVES.
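A sketch of all three translations on mtcars (variable choices are illustrative):

```r
library(psych)

# DESCRIPTIVES VARIABLES = mpg hp wt.
describe(mtcars[, c("mpg", "hp", "wt")])

# FREQUENCIES VARIABLES = cyl.
tab <- table(mtcars$cyl)
tab
prop.table(tab) * 100        # the "Valid Percent" column

# CROSSTABS /TABLES = cyl BY am /STATISTICS = CHISQ.
xt <- table(mtcars$cyl, mtcars$am)
xt
chisq.test(xt)               # the "Chi-Square Tests" block
```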
describe() prints the same columns you'd find in the SPSS Descriptives dialog: count, mean, SD, median, min, max, range, skew, kurtosis, and SE. prop.table() * 100 matches the "Valid Percent" column from FREQUENCIES. The cross-tab and chi-square reproduce the cell counts plus the test statistic, df, and p-value SPSS prints under "Chi-Square Tests".
Try it: Build a frequency table for mtcars$gear and store it as ex_gear.
Click to reveal solution
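One line does it:

```r
ex_gear <- table(mtcars$gear)
ex_gear
```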
Explanation: table() on a single vector returns the frequency count for each unique value: the one-line equivalent of FREQUENCIES VARIABLES = gear. in SPSS.
How do I run an ANOVA like SPSS UNIANOVA?
Base R's aov() is the workhorse for analysis of variance. There's one major catch: SPSS's UNIANOVA defaults to Type III sums of squares; R's aov() and summary.aov() give you Type I (sequential) sums of squares. For balanced designs the two answers are identical. For unbalanced designs they can differ enough to flip a conclusion. To match SPSS exactly, use car::Anova() with type = "III" and switch to sum-to-zero contrasts.
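A sketch of both versions (the two-way cyl-by-am model is chosen for illustration; mtcars is unbalanced, so Type I and Type III will differ):

```r
# One-way ANOVA: ONEWAY mpg BY cyl.
fit_one <- aov(mpg ~ factor(cyl), data = mtcars)
summary(fit_one)

# Two-way with Type III sums of squares, matching UNIANOVA.
# Sum-to-zero contrasts must be set BEFORE fitting the model.
options(contrasts = c("contr.sum", "contr.poly"))
library(car)
fit_two <- aov(mpg ~ factor(cyl) * factor(am), data = mtcars)
Anova(fit_two, type = "III")
```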
The one-way result is unambiguous: F(2, 29) = 39.7, p < .001, the same numbers SPSS would print for ONEWAY mpg BY cyl. The two-way Type III table now lines up cell-for-cell with the SPSS UNIANOVA "Tests of Between-Subjects Effects" table: each predictor's Sum Sq, Df, F value, and Pr(>F) are identical to what SPSS reports (try it if you have SPSS handy; the agreement is exact).
Remember both halves of the fix: call car::Anova(model, type = "III") and set options(contrasts = c("contr.sum", "contr.poly")) before fitting. This mismatch is the #1 source of "the numbers don't agree!" complaints from new R users coming from SPSS.
Try it: Run Tukey's HSD post-hoc on fit_one (the SPSS /POSTHOC = TUKEY equivalent) and store the result as ex_tukey.
Click to reveal solution
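One way to do it (fit_one is the one-way aov model from this section):

```r
ex_tukey <- TukeyHSD(fit_one)
ex_tukey
```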
Explanation: TukeyHSD() takes an aov object and returns all pairwise mean differences with family-wise 95% confidence intervals and adjusted p-values, the same "Multiple Comparisons" table SPSS prints under POSTHOC = TUKEY.
How do I fit linear regression like SPSS REGRESSION?
SPSS REGRESSION /DEPENDENT y /ENTER x1 x2. becomes lm(y ~ x1 + x2, data = df). The summary() output gives you everything from the SPSS "Model Summary" and "Coefficients" tables in one printout: coefficients, standard errors, t-values, p-values, R², adjusted R², residual standard error, and the F statistic. Use confint() for the 95% confidence intervals SPSS prints next to each coefficient.
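A minimal sketch on mtcars (the wt and hp predictors are illustrative):

```r
# REGRESSION /DEPENDENT mpg /METHOD = ENTER wt hp.
fit <- lm(mpg ~ wt + hp, data = mtcars)

summary(fit)   # Coefficients + Model Summary + ANOVA tables in one printout
confint(fit)   # 95% CIs, the "Confidence intervals" checkbox in SPSS
```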
Each row of the Coefficients: block maps to a row in the SPSS "Coefficients" table: Estimate is B, Std. Error is Std. Error, t value is t, and Pr(>|t|) is the two-tailed Sig. column. The Residual standard error, Multiple R-squared, Adjusted R-squared, and F-statistic lines reproduce the SPSS "Model Summary" and "ANOVA" tables. confint() adds the 95% lower and upper bounds, the same numbers SPSS prints when you tick the "Confidence intervals" checkbox.
The tilde in y ~ x1 + x2 reads "is modeled as." Once the tilde clicks, every model function in R uses the same grammar: lm, glm, aov, lme4::lmer, survival::coxph, nlme::lme. Learn one syntax, fit any model.
Try it: Fit a simple regression of mpg on wt only (one predictor), store as ex_simple, and print its summary().
Click to reveal solution
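Drop the second predictor from the formula:

```r
ex_simple <- lm(mpg ~ wt, data = mtcars)
summary(ex_simple)
```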
Explanation: Drop hp from the formula and lm() fits a one-predictor model. The intercept and wt slope shift slightly because wt now has to absorb variance previously explained by hp.
How do I run factor analysis like SPSS FACTOR?
SPSS FACTOR /VARIABLES = ... /EXTRACTION = ML /ROTATION = VARIMAX translates to psych::fa(). The psych package was built specifically to give SPSS-style psychometric output in R, so the rotated loading matrix you get back will feel familiar. The built-in bfi dataset (25 personality items from the International Personality Item Pool) is perfect for a realistic five-factor demonstration.
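A sketch of the five-factor solution (in recent psych versions the bfi data may ship via the companion psychTools package):

```r
library(psych)

# FACTOR /VARIABLES = A1 TO O5 /EXTRACTION = ML
#        /CRITERIA = FACTORS(5) /ROTATION = VARIMAX.
fit_fa <- fa(bfi[, 1:25], nfactors = 5, fm = "ml", rotate = "varimax")

# Sorted, thresholded loadings, like the SPSS "Rotated Factor Matrix"
print(fit_fa$loadings, cutoff = 0.3, sort = TRUE)
```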
The sorted loading matrix groups items by their dominant factor: N1–N5 cluster on ML2 (Neuroticism), E2–E4 on ML1 (Extraversion), and so on. The SS loadings, Proportion Var, and Cumulative Var rows are identical in form to the "Total Variance Explained" table SPSS prints. h2 is the communality (sum of squared loadings, what SPSS calls "Extraction Communality"), and u2 is uniqueness (1 − h2).
Try it: Re-run the factor analysis with oblimin rotation (SPSS's default oblique rotation) and store the result as ex_fa.
Click to reveal solution
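Same call, different rotation (the Phi element holds the factor correlations for oblique solutions):

```r
ex_fa <- fa(bfi[, 1:25], nfactors = 5, fm = "ml", rotate = "oblimin")
ex_fa$Phi    # factor correlation matrix, absent under varimax
```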
Explanation: Change rotate = "varimax" (orthogonal) to rotate = "oblimin" (oblique) and the factors are allowed to correlate. The output now includes a factor correlations block; these are the same correlations SPSS shows in its "Factor Correlation Matrix" when you choose Direct Oblimin.
How do I compute reliability (Cronbach's alpha) like SPSS RELIABILITY?
SPSS RELIABILITY /VARIABLES = ... /STATISTICS = ALPHA becomes psych::alpha(). The output gives you raw alpha, standardized alpha, average inter-item correlation, item-total statistics, and the "alpha if item dropped" column, the same five blocks SPSS users are used to seeing. Use check.keys = TRUE so psych auto-flips reverse-coded items (in SPSS you'd do this by hand with RECODE).
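The translation on the five Agreeableness items (columns 1–5 of bfi):

```r
library(psych)

# RELIABILITY /VARIABLES = A1 A2 A3 A4 A5 /STATISTICS = ALPHA.
alpha(bfi[, 1:5], check.keys = TRUE)
```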
The first block prints raw alpha (0.70), standardized alpha (0.71), and Guttman's lambda 6, exactly the columns SPSS prints in its "Reliability Statistics" output, plus a 95% CI. The "Reliability if an item is dropped" block reproduces the SPSS "Item-Total Statistics" → "Cronbach's Alpha if Item Deleted" column. The A1- suffix shows that psych reversed item A1 because it correlated negatively with the total, saving you a manual RECODE step.
When check.keys = TRUE reverses an item, psych flags it (a trailing - on the item name in the output), so you can audit the keying rather than discovering a hidden recode three months later.
Try it: Compute Cronbach's alpha on the Conscientiousness subscale (bfi[, 6:10]) and store the result as ex_alpha.
Click to reveal solution
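Same call, different column slice:

```r
ex_alpha <- alpha(bfi[, 6:10], check.keys = TRUE)
ex_alpha$total    # raw_alpha, std.alpha, G6 in one row
```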
Explanation: Same call, different column slice. Conscientiousness lands around raw_alpha = 0.73, slightly above Agreeableness, consistent with published bfi scale reliabilities.
Practice Exercises
These capstone exercises combine procedures from multiple sections. Use distinct variable names (my_*) so they don't overwrite tutorial state.
Exercise 1: Multi-predictor regression with confidence intervals
Fit a regression of mpg on wt, hp, and cyl using mtcars. Store the model as my_fit. Print the summary() and confint(). Identify which predictor has the largest absolute t-value.
Click to reveal solution
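A solution sketch:

```r
my_fit <- lm(mpg ~ wt + hp + cyl, data = mtcars)
summary(my_fit)   # scan the t value column for the largest absolute value
confint(my_fit)
```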
Explanation: wt has the largest absolute t-value (-4.28) and is the only predictor whose 95% confidence interval excludes zero. Adding cyl (collinear with hp) absorbs some of the variance previously attributed to hp, dropping hp's t-value below the conventional significance threshold.
Exercise 2: Reliability and one-factor model on the same items
For the Extraversion items (bfi[, 11:15]), compute Cronbach's alpha and a one-factor fa() solution. Save the alpha as my_alpha and the factor analysis as my_fa. Inspect my_fa$Vaccounted to see what proportion of the items' variance the single factor explains.
Click to reveal solution
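A solution sketch (columns 11–15 of bfi are the Extraversion items):

```r
library(psych)

my_alpha <- alpha(bfi[, 11:15], check.keys = TRUE)
my_fa    <- fa(bfi[, 11:15], nfactors = 1, fm = "ml")
my_fa$Vaccounted   # proportion of variance the single factor explains
```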
Explanation: Raw alpha is 0.76 and the single factor explains about 51% of the items' variance, a solid indicator that the five Extraversion items measure one underlying trait. Whenever alpha is high and a one-factor solution captures most of the variance, your scale is behaving as a unidimensional measure.
Complete Example
Here's an end-to-end mini-workflow on the bfi Agreeableness scale: descriptives → reliability → 1-factor solution → regression of total score on age. What would be five separate dialog boxes in SPSS is one short R script you can re-run on new data tomorrow.
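A sketch of the pipeline (the total score reverse-scores A1 by hand, 7 − A1 on the 1–6 response scale, which is an assumption about how the scale was keyed; lm() drops incomplete cases by default):

```r
library(psych)

agree <- bfi[, 1:5]                     # items A1-A5

# 1. Descriptives
describe(agree)

# 2. Reliability (check.keys reverses A1 automatically)
rel <- alpha(agree, check.keys = TRUE)
rel$total

# 3. One-factor solution
fa_one <- fa(agree, nfactors = 1, fm = "ml")
fa_one$Vaccounted

# 4. Regress the scale total on age
total   <- (7 - bfi$A1) + bfi$A2 + bfi$A3 + bfi$A4 + bfi$A5
fit_age <- lm(total ~ age, data = bfi)
summary(fit_age)
```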
The whole pipeline runs in one block. Alpha is 0.70 (acceptable), one factor explains about 38% of the variance (modest, a single-factor model fits but isn't perfect), and the regression shows agreeableness scores increase by 0.06 points per year of age, small in magnitude but highly significant given the sample size of 2,236. Re-running this on new data is a single command. In SPSS, you'd click through five dialogs and pray you remembered every option.
Summary
| SPSS Procedure | R Function | Package |
|---|---|---|
| GET FILE / SAVE OUTFILE | read_sav() / write_sav() | haven |
| DESCRIPTIVES | summary(), describe() | base, psych |
| FREQUENCIES | table(), prop.table() | base |
| CROSSTABS | table(), chisq.test() | base |
| T-TEST | t.test() | base |
| ONEWAY | aov(), TukeyHSD() | base |
| UNIANOVA (Type III) | aov() + car::Anova(type = "III") | base, car |
| REGRESSION | lm(), summary(), confint() | base |
| LOGISTIC REGRESSION | glm(family = binomial) | base |
| CORRELATIONS | cor(), cor.test() | base |
| FACTOR | psych::fa(), prcomp() | psych, base |
| RELIABILITY (Cronbach α) | psych::alpha() | psych |
| NPAR TESTS | wilcox.test(), kruskal.test() | base |
| MIXED | lme4::lmer() | lme4 |
Three habits will make the transition smooth: read .sav files with haven::read_sav(), default to psych::describe() for an SPSS-style descriptive table, and remember car::Anova(type = "III") whenever you fit an unbalanced ANOVA. Everything else is a one-line lookup against the table above.
References
- Wickham, H. & Miller, E., haven: Import and Export 'SPSS', 'Stata' and 'SAS' Files. Tidyverse documentation. Link
- Revelle, W., psych: Procedures for Psychological, Psychometric, and Personality Research. CRAN package. Link
- Fox, J. & Weisberg, S., An R Companion to Applied Regression, 3rd Edition. SAGE Publications (2019). Link
- R Core Team, An Introduction to R, base statistics functions. Link
- UCLA OARC Statistical Consulting, R Resources and Tutorials. Link
- Field, A., Miles, J. & Field, Z., Discovering Statistics Using R. SAGE Publications. Link
- Tidyverse blog, haven 2.5.0 release notes. Link
Continue Learning
- Is R Worth Learning in 2026?, the case for adding R to your toolkit
- R for SAS Users, sister migration guide written in the same format
- R for Excel Users, the Excel-to-R version of this article