EDA for Time Series in R: Seasonality, Trend & Autocorrelation Plots
Time series EDA reveals the hidden structure in temporal data — trend direction, seasonal cycles, and autocorrelation patterns — before you fit any model. This guide walks you through every essential diagnostic plot in R, from raw time plots to STL decomposition and ACF/PACF correlograms, using built-in datasets you can run right now.
What does a raw time series plot reveal?
Every time series tells a story through its shape. Before you forecast or model, you need to read that story. Let's load R's built-in AirPassengers dataset and plot it to see what patterns jump out immediately.
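A minimal version of that first plot (the title and axis labels are my own choices):

```r
# AirPassengers ships with base R: monthly international airline
# passenger totals (in thousands) from 1949 to 1960
plot(AirPassengers,
     main = "Monthly International Airline Passengers, 1949-1960",
     xlab = "Year", ylab = "Passengers (thousands)")
```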
Even from this single plot, three things are immediately visible: a rising trend (passengers increase year over year), seasonal peaks every summer (July-August), and the peaks are growing larger over time. That growing amplitude is a clue — it suggests multiplicative rather than additive seasonality, something we'll confirm later with decomposition.
The AirPassengers object is a ts (time series) object — R's built-in class for regularly spaced temporal data. You can create one from any numeric vector by specifying the start date and frequency.
The frequency argument tells R how many observations make up one cycle: 12 for monthly, 4 for quarterly, 52 for weekly, 365 for daily. This metadata powers every seasonal function we'll use from here on.
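As a sketch, with made-up monthly values (the sales figures are purely illustrative):

```r
# Hypothetical monthly sales figures, used only to demonstrate ts()
sales <- c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118)

# start = c(2024, 1) means the series begins in January 2024;
# frequency = 12 declares monthly data (12 observations per cycle)
sales_ts <- ts(sales, start = c(2024, 1), frequency = 12)
print(sales_ts)
```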
Try it: Create a ts object called ex_quarterly from the vector c(50, 80, 120, 90, 55, 85, 130, 95) representing 2 years of quarterly data starting in Q1 2024, then print it.
Click to reveal solution
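One way to write the solution:

```r
# Two years of quarterly data starting Q1 2024: frequency = 4,
# start = c(2024, 1) means year 2024, first quarter
ex_quarterly <- ts(c(50, 80, 120, 90, 55, 85, 130, 95),
                   start = c(2024, 1), frequency = 4)
print(ex_quarterly)
```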
Explanation: Setting frequency = 4 tells R these are quarterly observations, so print() labels columns as Qtr1-Qtr4 automatically.
How do you detect and visualize trend?
Trend is the long-term direction of a series — is it going up, down, or staying flat? Raw data is often too noisy to see the trend clearly, so we smooth it with a moving average that averages out the seasonal bumps.
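A sketch of that smoothing step using stats::filter() (the color and labels are my choices):

```r
ap <- AirPassengers

# Centered 12-month moving average: equal weights of 1/12;
# sides = 2 averages both backward and forward in time
trend_ma <- stats::filter(ap, rep(1/12, 12), sides = 2)

plot(ap, ylab = "Passengers (thousands)",
     main = "AirPassengers with 12-month moving average")
lines(trend_ma, col = "red", lwd = 2)
```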
The red line cuts through the seasonal noise and reveals a clear, accelerating upward trend. Notice the moving average window is 12 — matching the seasonal period (12 months). This is deliberate: averaging over exactly one full cycle cancels out the seasonal component, leaving only the trend.
R's decompose() function also extracts the trend component automatically. Let's compare.
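A sketch of the comparison:

```r
ap <- AirPassengers

# decompose() extracts trend, seasonal, and random components;
# type defaults to "additive"
dec <- decompose(ap)

# Four panels: observed, trend, seasonal, random
plot(dec)

# The extracted trend is a centered 12-month moving average,
# so it is NA for the first and last 6 months
head(dec$trend, 8)
```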
The trend from decompose() looks nearly identical to our manual moving average — because that's exactly what it computes internally. The NA values at the start and end are an unavoidable edge effect of centered moving averages: you can't compute a 12-month average for the first and last 6 months.
Try it: Load the built-in UKgas dataset (quarterly UK gas consumption). Apply a 4-quarter moving average using stats::filter() and plot it overlaid on the raw data.
Click to reveal solution
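One possible solution (plot styling is my choice):

```r
# UKgas: quarterly UK gas consumption, built into base R
ukg <- UKgas

# 4-quarter centered moving average: weights rep(1/4, 4), sides = 2
ma4 <- stats::filter(ukg, rep(1/4, 4), sides = 2)

plot(ukg, ylab = "Gas consumption",
     main = "UK gas consumption with 4-quarter moving average")
lines(ma4, col = "red", lwd = 2)
```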
Explanation: rep(1/4, 4) creates equal weights for a 4-quarter average, and sides = 2 makes it centered (looking both forward and backward).
How do you identify seasonal patterns?
Seasonality means repeating cycles of fixed length — summer peaks in airline travel, winter spikes in heating bills, Monday dips in retail sales. Three R functions make seasonal patterns impossible to miss: monthplot(), boxplot() by cycle, and grouped subseries analysis.
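Starting with monthplot(), a minimal sketch:

```r
# One vertical strip per month, each showing that month's values
# across all years, with a horizontal line at the month's mean
monthplot(AirPassengers,
          main = "AirPassengers: each month across all years",
          xlab = "Month", ylab = "Passengers (thousands)")
```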
The monthplot() function is one of R's most underused EDA tools. Each vertical strip shows how one month behaves across all years, with a horizontal line at the mean. You can instantly see that July and August have the highest averages, while November and February are the lowest. The upward slope within each strip confirms the trend we found earlier.
Boxplots by month give a complementary view — they show the distribution, not just the trajectory.
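A sketch of the grouped boxplot:

```r
ap <- AirPassengers

# cycle(ap) returns the month index (1-12) of each observation,
# so the formula groups values by calendar month
boxplot(ap ~ cycle(ap), names = month.abb,
        xlab = "Month", ylab = "Passengers (thousands)",
        main = "AirPassengers distribution by month")
```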
The boxplots confirm the seasonal pattern and add a new insight: summer months (June-August) have not only higher medians but also wider interquartile ranges. This growing spread is another sign of multiplicative seasonality — the seasonal effect scales with the level of the series.
Let's quantify the monthly pattern by computing average passengers per month across all years.
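A sketch of that calculation:

```r
ap <- AirPassengers

# tapply() groups the series by month index and averages each group
monthly_means <- tapply(ap, cycle(ap), mean)
names(monthly_means) <- month.abb
round(monthly_means)
```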
July averages roughly 351 thousand passengers versus November's 233 thousand, a seasonal swing of about 1.5x. For airline planning, this means peak-summer capacity needs to be roughly 50% above the November baseline.
Try it: Load the built-in nottem dataset (average air temperatures in Nottingham, 1920-1939). Use monthplot() and identify which month is warmest and which is coldest.
Click to reveal solution
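One possible solution:

```r
# nottem: monthly average air temperatures (deg F) at Nottingham, 1920-1939
monthplot(nottem, main = "Nottingham temperatures by month",
          xlab = "Month", ylab = "Temperature (F)")

# Group by month to find the warmest and coldest on average
temp_means <- tapply(nottem, cycle(nottem), mean)
names(temp_means) <- month.abb
temp_means[which.max(temp_means)]  # warmest month
temp_means[which.min(temp_means)]  # coldest month
```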
Explanation: monthplot() reveals a textbook sinusoidal seasonal pattern for temperature. tapply() with cycle() groups by month to find the extremes.
How does time series decomposition separate trend, seasonality, and noise?
Decomposition is the power tool of time series EDA. It splits your series into three components: Trend (the long-term direction), Seasonal (the repeating cycle), and Remainder (everything else — noise, anomalies, unexplained variation). The question is whether these combine by addition or multiplication.
- Additive: Observed = Trend + Seasonal + Remainder (use when seasonal swings stay constant)
- Multiplicative: Observed = Trend x Seasonal x Remainder (use when seasonal swings grow with the level)
Let's see both side by side on AirPassengers.
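A sketch of the side-by-side comparison:

```r
ap <- AirPassengers

dec_add  <- decompose(ap, type = "additive")
dec_mult <- decompose(ap, type = "multiplicative")

# Each plot() call draws the four panels for one decomposition
plot(dec_add)
plot(dec_mult)
```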
Compare the "Random" (remainder) panels. In the additive decomposition, the remainder shows a fan-shaped pattern — the residuals grow larger over time. In the multiplicative decomposition, the remainder is more uniform. That fan shape in the additive version means the model isn't capturing the growing seasonal amplitude properly. The multiplicative version handles it correctly.
stl() (Seasonal and Trend decomposition using Loess) is a more sophisticated alternative that adapts the seasonal component over time and handles outliers gracefully.
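A minimal sketch:

```r
# s.window = "periodic" forces an identical seasonal pattern every year;
# a numeric s.window would let the seasonal component evolve over time
fit <- stl(AirPassengers, s.window = "periodic")
plot(fit)
```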
The grey bars on STL plots are a clever feature: they show the relative scale of each component. The trend bar is small (large variation relative to the data), while the remainder bar is large (small variation) — meaning trend dominates the series and noise is relatively minor. Exactly what you'd expect for a dataset with a strong upward trajectory.
Try it: Decompose the co2 dataset (Mauna Loa CO2 concentrations) using stl() with s.window = "periodic". Look at the seasonal component — is the amplitude constant or changing?
Click to reveal solution
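One possible solution:

```r
# co2: monthly Mauna Loa CO2 concentrations (ppm), built into base R
fit_co2 <- stl(co2, s.window = "periodic")
plot(fit_co2)

# Peak-to-trough size of the seasonal swing
seasonal_co2 <- fit_co2$time.series[, "seasonal"]
diff(range(seasonal_co2))
```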
Explanation: The CO2 seasonal component has nearly constant amplitude (~6 ppm swing) regardless of the rising level, confirming additive seasonality. This contrasts with AirPassengers where the amplitude grows.
What do ACF and PACF plots tell you about your data?
The autocorrelation function (ACF) measures how strongly a series correlates with lagged copies of itself. If today's value is similar to yesterday's, lag-1 autocorrelation is high. The partial autocorrelation function (PACF) measures the direct correlation at each lag after removing the effects of all shorter lags. Together, they're the fingerprint of a time series.
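A sketch of both correlograms (lag.max is my choice, to show four seasonal cycles):

```r
ap <- AirPassengers

# For a monthly ts the lag axis is labeled in years (1.0 = lag 12)
acf(ap, lag.max = 48, main = "ACF of AirPassengers")
pacf(ap, lag.max = 48, main = "PACF of AirPassengers")
```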
The ACF plot shows two patterns at once. First, the slow decay of correlation from lag 1 outward — that's the trend. Values close in time are similar because the series drifts upward slowly. Second, the scalloped shape with peaks at lags 12, 24, 36 — that's the seasonality. January correlates strongly with next January, moderately with the January after that, and so on.
The blue dashed lines are 95% confidence bands. Any spike outside these bands is statistically significant — it's unlikely to appear in purely random data. Nearly every lag in AirPassengers exceeds the bands, confirming strong temporal structure.
The PACF tells a cleaner story. After removing indirect correlations, only a few lags show direct effects: lag 1 (each month directly depends on the previous month), lag 12 (each month directly depends on the same month last year), and lag 13 (a combination effect). This PACF pattern is the signature of a seasonal autoregressive process — exactly what ARIMA modelers look for.
What happens if we remove the trend by differencing? The ACF should lose its slow decay and reveal pure seasonality.
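A sketch of the differenced correlogram:

```r
ap <- AirPassengers

# diff() at lag 1 removes the trend; the seasonal spikes remain
acf(diff(ap), lag.max = 48,
    main = "ACF of first-differenced AirPassengers")
```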
Differencing removed the slow decay. What remains are sharp spikes at lags 12, 24, and 36 — pure seasonal autocorrelation. This confirms that after removing the trend, the dominant remaining structure is the 12-month seasonal cycle. If you differenced again at lag 12 (seasonal differencing), even these spikes would shrink, leaving near-white-noise residuals.
Try it: Compute and plot the ACF of diff(ap, lag = 12) (seasonally differenced AirPassengers). What happens to the seasonal spikes at lags 12, 24, 36?
Click to reveal solution
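One possible solution:

```r
ap <- AirPassengers

# Seasonal differencing: subtract the value from 12 months earlier
acf(diff(ap, lag = 12), lag.max = 48,
    main = "ACF of seasonally differenced AirPassengers")
```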
Explanation: Seasonal differencing (subtracting the value from 12 months ago) removes the seasonal pattern, just as first differencing removes the trend. The resulting ACF shows much less structure — closer to white noise.
How do you test for stationarity formally?
Visual inspection of ACF plots gives strong hints about stationarity (a slowly decaying ACF means non-stationary), but formal statistical tests give you a definitive answer. The two standard tests are the Augmented Dickey-Fuller (ADF) test and the KPSS test. They test opposite null hypotheses, so using both gives a complete picture.
- ADF test: Null = series has a unit root (non-stationary). Low p-value → reject null → series is stationary.
- KPSS test: Null = series is stationary. Low p-value → reject null → series is non-stationary.
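A sketch of both tests, assuming the tseries package is installed:

```r
library(tseries)  # install.packages("tseries") if needed

ap <- AirPassengers

adf.test(ap)             # can reject even for a trending series
kpss.test(ap)            # null = stationary; expect a low p-value here
adf.test(diff(log(ap)))  # log-differenced: clearly stationary
```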
Both tests live in the tseries package, installed with install.packages("tseries"); note that the package may not be available in all browser-based R environments. The raw AirPassengers series returns a p-value of 0.01 from the ADF test, which seems to suggest stationarity. But this can be misleading: the ADF test often rejects the null for trending series with strong deterministic patterns. The log-differenced series gives a more convincingly low p-value with a larger (more negative) test statistic.
In practice, always combine formal tests with visual evidence. If the ACF decays slowly and the time plot shows a clear trend, the series is non-stationary regardless of what one test says. The diff(log()) transformation is a common preprocessing pipeline: log() stabilizes the variance (addresses multiplicative seasonality), and diff() removes the trend.
Try it: Run adf.test() on the Nile dataset (annual Nile river flow, 1871-1970). Is the Nile flow stationary? What does the ACF suggest?
Click to reveal solution
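One possible solution, assuming tseries is installed:

```r
library(tseries)

# Nile: annual flow of the river Nile at Aswan, 1871-1970
adf.test(Nile)
acf(Nile, main = "ACF of annual Nile flow")
```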
Explanation: The ADF p-value of ~0.07 is borderline — not clearly stationary. The ACF shows moderate short-lag autocorrelation that decays within about 5 lags, with no seasonal pattern (annual data has no sub-annual seasonality). The Nile series has a structural break around 1898 (a known dam construction), which complicates stationarity assessment.
Practice Exercises
Exercise 1: Full EDA pipeline on CO2 data
Run a complete time series EDA on the co2 dataset (monthly atmospheric CO2 at Mauna Loa, 1959-1997):
- Plot the raw series
- Decompose with stl() and examine all three components
- Plot the ACF of the remainder component
- Determine whether the residuals resemble white noise
Click to reveal solution
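One possible solution:

```r
# 1. Raw series
plot(co2, main = "Mauna Loa atmospheric CO2", ylab = "CO2 (ppm)")

# 2. STL decomposition
fit <- stl(co2, s.window = "periodic")
plot(fit)

# 3. ACF of the remainder
remainder <- fit$time.series[, "remainder"]
acf(remainder, main = "ACF of STL remainder")

# 4. White-noise check: how large is the leftover variation?
sd(remainder)
```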
Explanation: The remainder has small but significant autocorrelation at short lags, suggesting the additive model doesn't capture all structure. However, the remainder's standard deviation (~0.32 ppm) is tiny compared to the 30+ ppm trend range, so the decomposition captures the dominant patterns well.
Exercise 2: Additive vs multiplicative decomposition comparison
Compare additive and multiplicative decomposition on AirPassengers:
- Run decompose() with both type = "additive" and type = "multiplicative"
- Plot the ACF of the remainder from each decomposition
- Which decomposition produces more random (white-noise-like) residuals?
Click to reveal solution
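One possible solution:

```r
ap <- AirPassengers

dec_add  <- decompose(ap, type = "additive")
dec_mult <- decompose(ap, type = "multiplicative")

# The random component has NAs at both ends (an edge effect of the
# moving average), so drop them before computing the ACF
acf(na.omit(dec_add$random),  main = "ACF: additive remainder")
acf(na.omit(dec_mult$random), main = "ACF: multiplicative remainder")
```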
Explanation: The multiplicative remainder ACF is closer to white noise because it correctly accounts for the seasonal amplitude growing proportionally with the trend. The additive model leaves seasonal structure in the residuals.
Exercise 3: Sunspot cycle analysis
Perform a complete EDA on sunspot.year (yearly sunspot numbers, 1700-1988):
- Plot the raw series and overlay an 11-year moving average (the solar cycle)
- Plot the ACF and PACF
- Run adf.test() from the tseries package
- Summarize: is there trend? Cyclicality? Is the series stationary?
Click to reveal solution
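One possible solution, assuming tseries is installed:

```r
library(tseries)

sun <- sunspot.year

# Raw series with an 11-year moving average (one solar cycle)
plot(sun, main = "Yearly sunspot numbers, 1700-1988")
lines(stats::filter(sun, rep(1/11, 11), sides = 2), col = "red", lwd = 2)

acf(sun, lag.max = 50, main = "ACF of yearly sunspots")
pacf(sun, lag.max = 50, main = "PACF of yearly sunspots")

adf.test(sun)
```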
Explanation: Sunspot data shows no trend but strong cyclicality (~11 years). The ACF oscillates rather than decaying, which is the signature of a cyclical (not seasonal in the strict sense) process. Unlike AirPassengers, the series is stationary — the cycles don't grow or shift over time.
Putting It All Together
Let's run a complete time series EDA on the ldeaths dataset — monthly deaths from bronchitis, emphysema, and asthma in the UK (1974-1979). This brings together every technique from this tutorial.
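A sketch of the full pipeline (plot titles are my choices):

```r
ld <- ldeaths

# 1. Raw time plot
plot(ld, main = "Monthly UK deaths from lung diseases, 1974-1979",
     ylab = "Deaths")

# 2. Seasonal pattern
monthplot(ld, main = "ldeaths by month", ylab = "Deaths")

# 3. STL decomposition
fit <- stl(ld, s.window = "periodic")
plot(fit)

# 4. Check the remainder for leftover structure
remainder <- fit$time.series[, "remainder"]
acf(remainder, main = "ACF of STL remainder")
pacf(remainder, main = "PACF of STL remainder")
```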
This complete example shows the EDA workflow in action: start with the raw plot, identify seasonality with monthplot(), decompose to separate components, then check the remainder with ACF/PACF. The lung disease dataset has classic additive seasonality — winter respiratory illness peaks that stay roughly the same size each year — with a mild downward trend over the 6-year period. After STL decomposition, the remainder is approximately white noise, meaning the decomposition captured all the systematic structure.
Summary
| Technique | R Function | What It Reveals | When to Use |
|---|---|---|---|
| Time plot | ts.plot(), plot() | Overall shape: trend, seasonality, anomalies | Always; your first step |
| Moving average | stats::filter() | Underlying trend (strips seasonal noise) | When trend is obscured by seasonality |
| Month/seasonal plot | monthplot() | Per-season behavior across years | Monthly or quarterly data |
| Boxplot by cycle | boxplot(x ~ cycle(x)) | Distribution differences across seasons | Comparing seasonal spread |
| Additive decomposition | decompose(x, "additive") | Trend + Seasonal + Remainder | Constant seasonal amplitude |
| Multiplicative decomposition | decompose(x, "multiplicative") | Trend x Seasonal x Remainder | Growing seasonal amplitude |
| STL decomposition | stl() | Robust trend/seasonal/remainder split | Production use (handles outliers) |
| ACF | acf() | Autocorrelation at all lags | Detecting trend (slow decay) and seasonality (periodic spikes) |
| PACF | pacf() | Direct correlation at each lag | Identifying AR order for modeling |
| Differencing | diff() | Removes trend (lag 1) or seasonality (lag = period) | Making a series stationary |
| ADF test | tseries::adf.test() | Formal stationarity test (null = non-stationary) | Confirming visual stationarity assessment |
References
- Hyndman, R.J. & Athanasopoulos, G. — Forecasting: Principles and Practice, 3rd Edition. Chapter 2: Time Series Graphics.
- R Documentation — ts(): Time-Series Objects.
- R Documentation — decompose(): Classical Seasonal Decomposition by Moving Averages.
- R Documentation — stl(): Seasonal Decomposition of Time Series by Loess.
- R Documentation — acf(): Auto- and Cross-Covariance and -Correlation Function Estimation.
- Cleveland, R.B., Cleveland, W.S., McRae, J.E. & Terpenning, I. (1990) — STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, 6(1), 3-73.
- Said, S.E. & Dickey, D.A. (1984) — Testing for Unit Roots in Autoregressive-Moving Average Models of Unknown Order. Biometrika, 71(3), 599-607.
Continue Learning
- Exploratory Data Analysis in R — The parent 7-step EDA framework this post extends for temporal data
- Correlation Analysis in R — Pearson, Spearman, and Kendall correlations for measuring variable relationships
- Correlation Matrix Plot in R — Visualize pairwise correlations with corrplot and ggcorrplot