R for SPSS Users: Convert Your Analyses to R — Complete Guide

If you've been running analyses in SPSS, you already understand statistics. This guide translates every common SPSS procedure into R code — same analysis, different syntax, better reproducibility.

SPSS is menu-driven and syntax-based. R is code-only. But the underlying statistics are identical: a t-test is a t-test whether you write T-TEST in SPSS or t.test() in R. This guide maps the SPSS procedures you already know to their R equivalents, so you can transition without re-learning statistics.

Why Switch from SPSS to R?

| Factor | SPSS | R |
| --- | --- | --- |
| Cost | $99/month (Standard) | Free and open source |
| Reproducibility | Menu clicks are not saved unless you paste them to syntax | Every step is code |
| Packages | Fixed set of procedures | 20,000+ packages on CRAN |
| Automation | Limited macro system | Full programming language |
| Visualizations | Built-in chart builder | ggplot2, plotly, and more |
| Community | Declining user base | Growing, active community |
| Job market | Shrinking demand | Growing demand |

You don't have to switch overnight. Many researchers use both during the transition — running familiar analyses in SPSS while learning the R equivalent.

Data Management: SPSS → R

| SPSS Syntax | R Equivalent | Description |
| --- | --- | --- |
| GET FILE='data.sav'. | haven::read_sav("data.sav") | Read SPSS data file |
| SAVE OUTFILE='data.sav'. | haven::write_sav(df, "data.sav") | Save as SPSS file |
| DESCRIPTIVES VARIABLES=x y. | summary(df[, c("x","y")]) | Descriptive statistics |
| FREQUENCIES VARIABLES=x. | table(df$x) | Frequency table |
| COMPUTE newvar = x + y. | df$newvar <- df$x + df$y | Create new variable |
| RECODE x (1=1)(2=2)(ELSE=3). | df$x <- ifelse(df$x %in% 1:2, df$x, 3) | Recode values |
| SELECT IF (age >= 18). | df <- subset(df, age >= 18) | Filter cases |
| SORT CASES BY x (A). | df <- df[order(df$x), ] | Sort ascending |
| MISSING VALUES x (99). | df$x[df$x == 99] <- NA | Define missing values |
| SPLIT FILE BY group. | split(df, df$group) | Split by group |
| WEIGHT BY freq. | Use the weights argument in functions that accept one | Weight cases |
    # Reading SPSS files and basic data management
    # In real code: df <- haven::read_sav("mydata.sav")
    # Demo with built-in data
    df <- mtcars
    df$car <- rownames(df)

    # DESCRIPTIVES equivalent
    cat("=== DESCRIPTIVES ===\n")
    summary(df[, c("mpg", "hp", "wt")])
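The remaining rows of the table can be sketched on the same built-in data. This is a minimal illustration, not a full translation of each command: the missing-value code 999, the `ratio` and `cyl_grp` variables, and the small `counts` data frame are all made up for the example.

```r
# Demo data: mtcars with a fake missing-value code (hypothetical, for illustration)
df <- mtcars
df$hp[c(1, 5)] <- 999                        # pretend 999 means "missing"

# MISSING VALUES hp (999).
df$hp[which(df$hp == 999)] <- NA

# COMPUTE ratio = hp / wt.
df$ratio <- df$hp / df$wt

# RECODE cyl (4=1)(6=2)(ELSE=3) INTO cyl_grp.
df$cyl_grp <- ifelse(df$cyl == 4, 1, ifelse(df$cyl == 6, 2, 3))

# SELECT IF (mpg >= 20).
efficient <- subset(df, mpg >= 20)

# SORT CASES BY mpg (D).
efficient <- efficient[order(-efficient$mpg), ]

# SPLIT FILE BY am, then a mean of mpg: one value per group
group_means <- sapply(split(df$mpg, df$am), mean)

# WEIGHT BY freq, then FREQUENCIES: xtabs() sums a count column per category
counts <- data.frame(answer = c("no", "yes"), freq = c(12, 30))
weighted_freq <- xtabs(freq ~ answer, data = counts)

print(head(efficient[, c("mpg", "hp", "ratio", "cyl_grp")]))
print(round(group_means, 2))
print(weighted_freq)
```

Unlike SPLIT FILE, `split()` does not change how later commands behave; it returns a list, and you apply a function to each piece yourself.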


  

Descriptive Statistics

SPSS FREQUENCIES → R

    # SPSS: FREQUENCIES VARIABLES=cyl.
    # R equivalent:
    cat("=== FREQUENCIES: cyl ===\n")
    freq_table <- table(mtcars$cyl)
    print(freq_table)

    cat("\nPercentages:\n")
    print(round(prop.table(freq_table) * 100, 1))

    # SPSS: DESCRIPTIVES VARIABLES=mpg /STATISTICS=MEAN STDDEV MIN MAX.
    cat("\n=== Descriptive Statistics ===\n")
    cat("Mean MPG:", round(mean(mtcars$mpg), 2), "\n")
    cat("SD MPG:", round(sd(mtcars$mpg), 2), "\n")
    cat("Min MPG:", min(mtcars$mpg), "\n")
    cat("Max MPG:", max(mtcars$mpg), "\n")
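SPSS prints counts, percentages, and cumulative percentages in a single table. One way to approximate that layout in base R is to assemble a data frame by hand; the column names below are my own choice, not anything R or SPSS prescribes.

```r
# Build an SPSS-style frequency table for one variable
counts <- table(mtcars$cyl, useNA = "ifany")   # include NA as a row if present
pct <- as.vector(prop.table(counts)) * 100

freq_table <- data.frame(
  value = names(counts),
  frequency = as.vector(counts),
  percent = round(pct, 1),
  cumulative_percent = round(cumsum(pct), 1)
)

print(freq_table)
```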


  

SPSS CROSSTABS → R

    # SPSS: CROSSTABS /TABLES=cyl BY am /STATISTICS=CHISQ.
    # R equivalent:
    cat("=== Cross-tabulation ===\n")
    ct <- table(mtcars$cyl, mtcars$am)
    colnames(ct) <- c("Automatic", "Manual")   # am: 0 = automatic, 1 = manual
    print(ct)

    # Some expected counts are below 5 here, so R warns that the
    # chi-squared approximation may be inaccurate
    cat("\n=== Chi-Square Test ===\n")
    chi_result <- chisq.test(ct)
    cat("Chi-squared:", round(chi_result$statistic, 3), "\n")
    cat("p-value:", round(chi_result$p.value, 4), "\n")
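CROSSTABS can also print expected counts and row percentages through its /CELLS subcommand. A rough base R counterpart is sketched below; Fisher's exact test is added here as a common fallback for small expected counts and is not something the SPSS command above requests.

```r
# Cross-tab with named dimensions
ct <- table(cyl = mtcars$cyl, am = mtcars$am)

# Expected counts under independence (warning suppressed: small cells)
chi <- suppressWarnings(chisq.test(ct))
print(round(chi$expected, 2))

# Row percentages: each cylinder group sums to 100
row_pct <- prop.table(ct, margin = 1) * 100
print(round(row_pct, 1))

# Exact test, often preferred when expected counts fall below 5
fisher <- fisher.test(ct)
cat("Fisher exact p-value:", round(fisher$p.value, 4), "\n")
```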


  

T-Tests

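| SPSS Syntax | R Equivalent |
| --- | --- |
| T-TEST /TESTVAL=0 /VARIABLES=x. | t.test(df$x, mu = 0) |
| T-TEST GROUPS=group(1,2) /VARIABLES=x. | t.test(x ~ group, data = df) |
| T-TEST PAIRS=before WITH after. | t.test(df$before, df$after, paired = TRUE) |

Note that t.test() defaults to Welch's unequal-variance test. SPSS prints both the "equal variances assumed" and "equal variances not assumed" rows; add var.equal = TRUE to get the first one in R.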
    # SPSS: T-TEST GROUPS=am(0,1) /VARIABLES=mpg.
    # R equivalent: Independent samples t-test
    # (Welch's test by default; add var.equal = TRUE for the pooled-variance version)
    cat("=== Independent Samples T-Test ===\n")
    cat("(MPG by Transmission Type)\n\n")
    result <- t.test(mpg ~ am, data = mtcars)
    print(result)
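SPSS reports the independent-samples test twice, once assuming equal variances and once not. The sketch below runs both versions in R and stacks them into a small table shaped like that output; the `comparison` data frame and its column names are illustrative only.

```r
# Welch (R default) and pooled-variance (Student) versions of the same test
welch  <- t.test(mpg ~ am, data = mtcars)
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

comparison <- data.frame(
  row = c("Equal variances assumed", "Equal variances not assumed"),
  t   = unname(c(pooled$statistic, welch$statistic)),
  df  = unname(c(pooled$parameter, welch$parameter)),
  p   = c(pooled$p.value, welch$p.value)
)

print(comparison)
```

SPSS also prints Levene's test alongside these rows; in R that would typically come from car::leveneTest(), which is listed in the reference table further down.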


  
    # SPSS: T-TEST PAIRS=before WITH after (PAIRED).
    # R equivalent: Paired samples t-test
    before <- c(120, 135, 128, 140, 132, 125, 138, 130, 145, 127)
    after <- c(115, 128, 122, 130, 125, 120, 132, 126, 138, 120)

    cat("=== Paired Samples T-Test ===\n")
    result <- t.test(before, after, paired = TRUE)
    print(result)
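A paired t-test is a one-sample t-test on the within-pair differences, which is a handy way to check that you have the pairing right. The sketch below shows the equivalence on the same numbers and adds a simple standardized effect size (mean difference divided by the SD of the differences), which is one of several common definitions.

```r
before <- c(120, 135, 128, 140, 132, 125, 138, 130, 145, 127)
after  <- c(115, 128, 122, 130, 125, 120, 132, 126, 138, 120)

diffs <- before - after

# These two calls test the same hypothesis and give the same t, df, and p
paired     <- t.test(before, after, paired = TRUE)
one_sample <- t.test(diffs, mu = 0)

# Standardized mean difference for paired data (one common definition)
effect_size <- mean(diffs) / sd(diffs)

cat("Mean difference:", mean(diffs), "\n")
cat("t (paired):", round(paired$statistic, 3), "\n")
cat("t (one-sample on differences):", round(one_sample$statistic, 3), "\n")
cat("Effect size:", round(effect_size, 2), "\n")
```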


  

ANOVA

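| SPSS Syntax | R Equivalent |
| --- | --- |
| ONEWAY x BY group /POSTHOC=TUKEY. | aov() + TukeyHSD() |
| GLM x BY a b /DESIGN=a b a*b. | aov(x ~ a * b, data = df) |
| GLM x BY group WITH covariate. | aov(x ~ covariate + group, data = df) |

Note that aov() reports sequential (Type I) sums of squares, while SPSS GLM defaults to Type III. With unequal cell sizes the main-effect tests can differ; car::Anova(model, type = 3) with sum-to-zero contrasts is the usual way to match SPSS.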
    # SPSS: ONEWAY mpg BY cyl /POSTHOC=TUKEY.
    # R equivalent: One-way ANOVA with Tukey post-hoc
    cat("=== One-Way ANOVA ===\n")
    mtcars$cyl_f <- factor(mtcars$cyl)
    anova_result <- aov(mpg ~ cyl_f, data = mtcars)
    print(summary(anova_result))

    cat("\n=== Tukey Post-Hoc ===\n")
    print(TukeyHSD(anova_result))
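Two things SPSS users often look for next are an effect size and a test that does not assume equal variances. The sketch below computes eta squared from the ANOVA table and runs Welch's one-way test with oneway.test(); the `cars` copy of the data is only there to avoid modifying mtcars.

```r
# Work on a copy so mtcars itself stays untouched
cars <- transform(mtcars, cyl_f = factor(cyl))

fit <- aov(mpg ~ cyl_f, data = cars)
anova_table <- summary(fit)[[1]]          # data frame with Df, Sum Sq, ...

# Eta squared: between-groups SS divided by total SS
ss <- anova_table[["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)

# Welch's ANOVA (does not assume equal variances across groups)
welch <- oneway.test(mpg ~ cyl_f, data = cars)

cat("Eta squared:", round(eta_sq, 3), "\n")
cat("Welch F:", round(welch$statistic, 2),
    " p-value:", signif(welch$p.value, 3), "\n")
```

For a one-way design, eta squared equals the R-squared of the corresponding linear model, which gives a quick cross-check.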


  
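    # SPSS: GLM mpg BY cyl am /DESIGN=cyl am cyl*am.
    # R equivalent: Two-way ANOVA with interaction
    # Note: aov() uses sequential (Type I) sums of squares; SPSS GLM defaults
    # to Type III, so main-effect rows can differ when cell sizes are unequal
    cat("=== Two-Way ANOVA ===\n")
    model <- aov(mpg ~ factor(cyl) * factor(am), data = mtcars)
    print(summary(model))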
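Because the cyl-by-am cells in mtcars are unbalanced, the sequential tests above will not line up with SPSS's default Type III table. The following is a base R sketch of one way to get Type III style tests, using sum-to-zero contrasts and drop1(); car::Anova(fit, type = 3) is the more common route if the car package is available.

```r
cars <- transform(mtcars, cyl_f = factor(cyl), am_f = factor(am))

# Sum-to-zero contrasts are needed for Type III tests to be meaningful
fit <- lm(mpg ~ cyl_f * am_f, data = cars,
          contrasts = list(cyl_f = "contr.sum", am_f = "contr.sum"))

# Type I: each term adjusted only for the terms listed before it
type1 <- anova(fit)

# Type III style: each term tested after all others, including the interaction
type3 <- drop1(fit, scope = . ~ ., test = "F")

cat("=== Sequential (Type I) ===\n")
print(type1)
cat("\n=== Each term dropped in turn (Type III) ===\n")
print(type3)
```

The interaction row is the same in both tables because it is the last term either way; only the main-effect rows change.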


  

Regression

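| SPSS Syntax | R Equivalent |
| --- | --- |
| REGRESSION /DEPENDENT y /ENTER x1 x2. | lm(y ~ x1 + x2, data = df) |
| REGRESSION /DEPENDENT y /STEPWISE x1 x2 x3. | step(lm(y ~ x1 + x2 + x3, data = df)) |
| LOGISTIC REGRESSION y WITH x1 x2. | glm(y ~ x1 + x2, data = df, family = binomial) |

Note that step() selects variables by AIC, whereas SPSS stepwise regression uses p-value entry and removal thresholds, so the two can settle on different models.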
    # SPSS: REGRESSION /DEPENDENT mpg /ENTER wt hp.
    # R equivalent: Multiple linear regression
    cat("=== Linear Regression ===\n")
    model <- lm(mpg ~ wt + hp, data = mtcars)
    print(summary(model))
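SPSS regression output includes a standardized "Beta" column and, on request, confidence intervals, neither of which summary() prints. A minimal sketch of how to recover both in base R follows; the rescaling formula is the standard one (slope times SD of the predictor divided by SD of the outcome).

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

# Unstandardized slopes (drop the intercept)
b <- coef(model)[-1]

# Standardized coefficients: b * sd(x) / sd(y)
beta <- b * sapply(mtcars[, names(b)], sd) / sd(mtcars$mpg)

# 95% confidence intervals for the unstandardized coefficients
ci <- confint(model, level = 0.95)

print(round(cbind(B = b, Beta = beta), 3))
print(round(ci, 3))
```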


  
    # SPSS: LOGISTIC REGRESSION am WITH mpg wt.
    # R equivalent: Logistic regression
    cat("=== Logistic Regression ===\n")
    logit_model <- glm(am ~ mpg + wt, data = mtcars, family = binomial)
    print(summary(logit_model))
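SPSS reports Exp(B), the odds ratio, next to each coefficient, while R's summary() shows only the log-odds. The sketch below exponentiates the coefficients and their Wald intervals; I am assuming Wald intervals are the closest match to SPSS's "CI for Exp(B)", which is why it uses confint.default() rather than profile-likelihood intervals.

```r
logit_model <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

# Odds ratios: exponentiated log-odds coefficients
odds_ratios <- exp(coef(logit_model))

# Wald 95% intervals on the odds-ratio scale
or_ci <- exp(confint.default(logit_model))

or_table <- cbind(OR = odds_ratios, or_ci)
print(signif(or_table, 3))
```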


  

Correlation

    # SPSS: CORRELATIONS /VARIABLES=mpg hp wt qsec.
    # R equivalent:
    cat("=== Correlation Matrix ===\n")
    vars <- mtcars[, c("mpg", "hp", "wt", "qsec")]
    cor_matrix <- round(cor(vars), 3)
    print(cor_matrix)

    # With p-values (like SPSS shows)
    cat("\n=== Correlation Test (mpg vs wt) ===\n")
    print(cor.test(mtcars$mpg, mtcars$wt))
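SPSS prints a significance value under every correlation, whereas cor.test() handles one pair at a time. A small loop over all pairs gives a matching p-value matrix in base R; packages such as psych or Hmisc offer ready-made versions, but this sketch avoids extra dependencies.

```r
vars <- mtcars[, c("mpg", "hp", "wt", "qsec")]
k <- ncol(vars)

# Empty matrix with variable names on both axes; the diagonal stays NA
p_matrix <- matrix(NA_real_, nrow = k, ncol = k,
                   dimnames = list(names(vars), names(vars)))

# Fill both triangles with the two-sided p-value for each pair
for (i in seq_len(k - 1)) {
  for (j in (i + 1):k) {
    p <- cor.test(vars[[i]], vars[[j]])$p.value
    p_matrix[i, j] <- p
    p_matrix[j, i] <- p
  }
}

print(round(cor(vars), 3))
print(signif(p_matrix, 3))
```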


  

Factor Analysis

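    # SPSS: FACTOR /VARIABLES=v1 v2 v3 v4 v5 /EXTRACTION=PC /ROTATION=VARIMAX.
    # R equivalent of the extraction step: principal components via prcomp().
    # prcomp() does not rotate, so this is the unrotated solution.
    cat("=== Principal Components ===\n")
    pca_data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
    pca_result <- prcomp(pca_data, scale. = TRUE)

    cat("Standard deviations:\n")
    print(round(pca_result$sdev, 3))

    cat("\nProportion of variance:\n")
    var_prop <- pca_result$sdev^2 / sum(pca_result$sdev^2)
    print(round(var_prop, 3))

    # $rotation holds unit-length eigenvectors; SPSS component loadings are
    # these columns multiplied by the matching standard deviation
    cat("\nEigenvectors (first 2 PCs):\n")
    print(round(pca_result$rotation[, 1:2], 3))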
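To get closer to what /ROTATION=VARIMAX produces, the eigenvectors can be rescaled into loadings and passed to varimax() from the stats package. This is a sketch under two assumptions: two components are retained (matching the example above rather than any eigenvalue rule), and the "RC" column labels are just my naming.

```r
pca_data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
pca <- prcomp(pca_data, scale. = TRUE)

k <- 2  # number of components to keep (assumed, see lead-in)

# Component loadings: eigenvector columns scaled by their standard deviations
loadings <- pca$rotation[, 1:k] %*% diag(pca$sdev[1:k])

# Varimax rotation with Kaiser normalization (normalize = TRUE is the default)
rotated <- varimax(loadings)
rotated_loadings <- unclass(rotated$loadings)
colnames(rotated_loadings) <- paste0("RC", 1:k)

# Communalities are unchanged by an orthogonal rotation
communality <- rowSums(rotated_loadings^2)

print(round(rotated_loadings, 3))
print(round(communality, 3))
```

Signs of whole columns may be flipped relative to SPSS, which is harmless: the direction of a component is arbitrary.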


  

Complete SPSS → R Reference Table

| SPSS Procedure | R Function | Package |
| --- | --- | --- |
| DESCRIPTIVES | summary(), psych::describe() | base, psych |
| FREQUENCIES | table(), prop.table() | base |
| CROSSTABS | table(), chisq.test() | base |
| T-TEST | t.test() | base |
| ONEWAY | aov(), TukeyHSD() | base |
| GLM (ANOVA) | aov(), car::Anova() | base, car |
| REGRESSION | lm(), summary() | base |
| LOGISTIC | glm(family=binomial) | base |
| CORRELATIONS | cor(), cor.test() | base |
| RELIABILITY | psych::alpha() | psych |
| FACTOR | factanal(), prcomp() | base |
| NONPAR CORR | cor(method="spearman") | base |
| NPAR TESTS | wilcox.test(), kruskal.test() | base |
| EXAMINE | shapiro.test(), car::leveneTest() | base, car |
| MIXED | lme4::lmer() | lme4 |
| SURVIVAL | survival::survfit() | survival |
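The nonparametric rows in the table have no worked example elsewhere in this guide, so here is a brief sketch using only base R. The SPSS lines in the comments are approximate reminders of the corresponding commands, and `exact = FALSE` is set because mtcars contains tied values.

```r
# SPSS (approx.): NPAR TESTS /M-W= mpg BY am(0 1).
mann_whitney <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

# SPSS (approx.): NPAR TESTS /K-W= mpg BY cyl(4 8).
kruskal <- kruskal.test(mpg ~ factor(cyl), data = mtcars)

# SPSS (approx.): NONPAR CORR /VARIABLES=mpg wt /PRINT=SPEARMAN.
spearman <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)

cat("Mann-Whitney p-value:", signif(mann_whitney$p.value, 3), "\n")
cat("Kruskal-Wallis chi-squared:", round(kruskal$statistic, 2),
    " df:", kruskal$parameter, "\n")
cat("Spearman rho:", round(spearman$estimate, 3), "\n")
```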

FAQ

Can R read my existing SPSS .sav files? Yes. The haven package reads .sav files: haven::read_sav("mydata.sav"). It preserves variable labels and value labels. User-defined missing values are converted to NA by default; pass user_na = TRUE to keep the original codes. Install with install.packages("haven").
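A small round trip shows what labels look like once they arrive in R. This sketch only runs if haven is installed; the `demo` data frame, its labels, and the temporary file are all invented for illustration.

```r
roundtrip_ok <- NA  # stays NA when haven is not installed

if (requireNamespace("haven", quietly = TRUE)) {
  # Hypothetical demo data with a labelled variable
  demo <- data.frame(mpg = c(21.4, 22.8, 30.4, 18.7))
  demo$am <- haven::labelled(c(0, 1, 1, 0),
                             labels = c(Automatic = 0, Manual = 1),
                             label = "Transmission type")

  # Write to a temporary .sav file and read it back
  path <- tempfile(fileext = ".sav")
  haven::write_sav(demo, path)
  df <- haven::read_sav(path)   # add user_na = TRUE to keep user-missing codes

  # Variable label is stored as an attribute on the column
  print(attr(df$am, "label"))

  # Value labels become factor levels with as_factor()
  print(table(haven::as_factor(df$am)))

  roundtrip_ok <- isTRUE(all.equal(as.numeric(df$mpg), demo$mpg))
}
```

Most modelling functions expect plain factors or numbers, so converting labelled columns with haven::as_factor() or haven::zap_labels() soon after import tends to save trouble later.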

Where does SPSS output go in R? SPSS has a separate Output Viewer window. In R, output prints to the console. For formatted reports, use R Markdown or Quarto to create Word/PDF/HTML documents that combine code, output, and narrative — much better than SPSS output.
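Short of a full R Markdown or Quarto report, results can also be saved straight to files. The sketch below writes a coefficient table to CSV and the printed summary to a text file; the file names and the use of a temporary directory are placeholders.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table as a data frame, with term names as a regular column
coef_table <- as.data.frame(coef(summary(model)))
coef_table$term <- rownames(coef_table)

# Save the table as CSV (opens in Excel or any spreadsheet tool)
csv_file <- file.path(tempdir(), "regression_coefficients.csv")
write.csv(coef_table, csv_file, row.names = FALSE)

# Save the console output exactly as printed
txt_file <- file.path(tempdir(), "regression_summary.txt")
writeLines(capture.output(summary(model)), txt_file)

cat("Wrote:", csv_file, "\n")
cat("Wrote:", txt_file, "\n")
```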

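Is R's statistical output different from SPSS? For the same model and the same settings the numbers match, but the formatting differs and so do a few defaults. R's t.test() runs Welch's test unless you set var.equal = TRUE, aov() uses sequential (Type I) sums of squares where SPSS GLM defaults to Type III, and step() selects by AIC rather than p-value thresholds. R's summary(lm(...)) shows coefficients, standard errors, t-values, and p-values: the same information as SPSS regression output, in a different layout.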

What's Next