r-statistics.co by Selva Prabhakaran


ggplot2 Quickref

Basic tasks

Legend

Plot text and annotation

Multiple plots

Geom layers

Basic tasks

Basic plot setup

gg <- ggplot(df, aes(x=xcol, y=ycol)) 

df must be a data frame that contains all the columns needed to make the plot. No data is drawn until a geom layer is added.
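As a minimal sketch of this two-step pattern, the example below uses a small made-up data frame (`toy_df` and the object names are illustrative, not from the original) to show that the base object carries no layers until a geom is added.

```r
library(ggplot2)

# Hypothetical toy data; any data frame with the mapped columns would do.
toy_df <- data.frame(xcol = 1:5, ycol = c(2, 4, 3, 6, 5))

base <- ggplot(toy_df, aes(x = xcol, y = ycol))  # data + aesthetic mapping only
length(base$layers)                              # no layers yet, so no data is drawn

with_points <- base + geom_point()               # adding a geom creates the first layer
length(with_points$layers)
print(with_points)                               # render the plot explicitly
```

Inside scripts and functions the explicit print() call is what actually draws the plot; at the console, typing the object name is enough.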

Scatterplot

library(ggplot2)
gg <- ggplot(diamonds, aes(x=carat, y=price)) 
gg + geom_point()

Static - point size, shape, color and boundary thickness

gg + geom_point(size=1, shape=1, color="steelblue", stroke=2)  # 'stroke' controls the thickness of point boundary

Dynamic - point size, shape, color and boundary thickness

Make the aesthetics vary based on a variable in df.

gg + geom_point(aes(size=carat, shape=cut, color=color, stroke=carat))  # carat, cut and color are variables in `diamonds`       

Add Title, X and Y axis labels

gg1 <- gg + geom_point(aes(color=color))
gg2 <- gg1 + labs(title="Diamonds", x="Carat", y="Price")  # ggtitle("title") also changes the title.
print(gg2)

Change color of all text

gg2 + theme(text=element_text(color="blue"))  # all text turns blue.

Change title, X and Y axis label and text size

  • plot.title - Controls the plot title
  • axis.title.x - Controls the X axis title
  • axis.title.y - Controls the Y axis title
  • axis.text.x - Controls the X axis text
  • axis.text.y - Controls the Y axis text

gg3 <- gg2 + theme(plot.title=element_text(size=25), axis.title.x=element_text(size=20), axis.title.y=element_text(size=20), axis.text.x=element_text(size=15), axis.text.y=element_text(size=15))
print(gg3)

Change title face, color, line height

gg3 + labs(title="Plot Title\nSecond Line of Plot Title") + theme(plot.title=element_text(face="bold", color="steelblue", lineheight=1.2))

Change point color

gg3 + scale_colour_manual(name='Legend', values=c('D'='grey', 'E'='red', 'F'='blue', 'G'='yellow', 'H'='black', 'I'='green', 'J'='firebrick'))

Adjust X and Y axis limits

Method 1: Zoom in

gg3 + coord_cartesian(xlim=c(0,3), ylim=c(0, 5000)) + geom_smooth()  # zoom in

Method 2: Deletes the points outside limits

gg3 + xlim(c(0,3)) + ylim(c(0, 5000)) + geom_smooth()  # deletes the points 
#> Warning messages:
#> 1: Removed 14714 rows containing non-finite values (stat_smooth). 
#> 2: Removed 14714 rows containing missing values (geom_point). 

Method 3: Deletes the points outside limits

gg3 + scale_x_continuous(limits=c(0,3)) + scale_y_continuous(limits=c(0, 5000)) + geom_smooth()  # deletes the points outside limits
#> Warning message:
#> Removed 14714 rows containing missing values (geom_point). 

Notice the change in smoothing line because of deleted points. This could sometimes be misleading in your analysis.
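The effect can be checked numerically. The sketch below is an illustration under one stated assumption: a straight-line fit (`lm`) stands in for the default smoother so the change is easy to read from the coefficients. The object names are made up for the example.

```r
library(ggplot2)

# Rows that fall inside the limits used above (carat <= 3, price <= 5000).
inside <- subset(diamonds, carat <= 3 & price <= 5000)
n_dropped <- nrow(diamonds) - nrow(inside)   # rows that xlim()/ylim() would discard
n_dropped

# Assumption: a linear fit is used here instead of the default smoother.
fit_all    <- lm(price ~ carat, data = diamonds)
fit_inside <- lm(price ~ carat, data = inside)
rbind(all_rows = coef(fit_all), inside_only = coef(fit_inside))

# Same comparison as plots: zooming keeps every row in the fit, xlim/ylim does not.
base   <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point(alpha = 0.1)
p_zoom <- base + geom_smooth(method = "lm") +
  coord_cartesian(xlim = c(0, 3), ylim = c(0, 5000))
p_drop <- base + geom_smooth(method = "lm") + xlim(0, 3) + ylim(0, 5000)
```

If the two coefficient rows differ, the fitted line in p_drop is describing a different data set than the one in p_zoom, even though both plots show the same window.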

Change X and Y axis labels

gg3 + scale_x_continuous(breaks=0:5, labels=c("zero", "one", "two", "three", "four", "five")) + scale_y_continuous(breaks=seq(0, 20000, 4000))  # X and Y are both continuous here; 'breaks' and 'labels' must have the same length

Use scale_x_discrete instead, if X variable is a factor.
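A short sketch of the discrete case, using `cut` from `diamonds` on the X axis. The abbreviated labels and the name `short_labels` are made up for illustration.

```r
library(ggplot2)

# `cut` is a factor in `diamonds`, so the X scale is discrete.
# Named vector: factor level = label to display (labels are illustrative).
short_labels <- c("Fair" = "F", "Good" = "G", "Very Good" = "VG",
                  "Premium" = "P", "Ideal" = "I")

p <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  scale_x_discrete(labels = short_labels)
print(p)
```

Naming the vector by level keeps each label attached to the right category even if the level order changes later.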

Rotate axis text

gg3 + theme(axis.text.x=element_text(angle=45), axis.text.y=element_text(angle=45))

Flip X and Y Axis

gg3 + coord_flip()  # flips X and Y axis.

Grid lines and panel background

gg3 + theme(panel.background = element_rect(fill = 'springgreen'),
  panel.grid.major = element_line(colour = "firebrick", size=3),
  panel.grid.minor = element_line(colour = "blue", size=1))

Plot margin and background

gg3 + theme(plot.background=element_rect(fill="yellowgreen"), plot.margin = unit(c(2, 4, 1, 3), "cm")) # top, right, bottom, left

Colors

The full list of color names is printed at your R console by the colors() function. Here are a few of my suggestions for nice-looking colors and backgrounds:

  • steelblue (points, lines)
  • firebrick (points, lines)
  • springgreen (fills)
  • violetred (fills)
  • tomato (fills)
  • skyblue (bg)
  • sienna (points, lines)
  • slateblue (fills)
  • seagreen (points, lines, fills)
  • sandybrown (fills)
  • salmon (fills)
  • saddlebrown (lines)
  • royalblue (fills)
  • orangered (points, lines, fills)
  • olivedrab (points, lines, fills)
  • midnightblue (lines)
  • mediumvioletred (points, lines, fills)
  • maroon (points, lines, fills)
  • limegreen (fills)
  • lawngreen (fills)
  • forestgreen (lines, fills)
  • dodgerblue (fills, bg)
  • dimgray (grids, secondary bg)
  • deeppink (fills)
  • darkred (lines, points)

If you are looking for consistent colors, the RColorBrewer package has predefined color palettes.
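As a small sketch, color names can be checked against colors() before use, and a fixed set of colors can be pulled from an RColorBrewer palette with brewer.pal(). The palette "Set1" and the variable name `pal` are arbitrary choices for the example.

```r
library(RColorBrewer)

"steelblue" %in% colors()              # TRUE if the name is a built-in R color
grep("green", colors(), value = TRUE)  # every built-in name containing "green"

pal <- brewer.pal(n = 5, name = "Set1")  # five hex color codes from the "Set1" palette
pal
```

The resulting vector of hex codes can be passed anywhere a color is accepted, for example as the values of scale_colour_manual().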

Legend

Hide legend

gg3 + theme(legend.position="none")  # hides the legend

Change legend title

gg3 + scale_color_discrete(name="")  # Remove legend title (method 1)
p1 <- gg3 + theme(legend.title=element_blank())  # Remove legend title (method 2)
p2 <- gg3 + scale_color_discrete(name="Diamonds")  # Change legend title
library(gridExtra)
grid.arrange(p1, p2, ncol=2)  # arrange

Change legend and point color

gg3 + scale_colour_manual(name='Legend', values=c('D'='grey', 'E'='red', 'F'='blue', 'G'='yellow', 'H'='black', 'I'='green', 'J'='firebrick'))

Change legend position

Outside plot

p1 <- gg3 + theme(legend.position="top")  # top / bottom / left / right

Inside plot

p2 <- gg3 + theme(legend.justification=c(1,0), legend.position=c(1,0))  # legend justification is the anchor point on the legend, considering the bottom left of legend as (0,0)
gridExtra::grid.arrange(p1, p2, ncol=2)

Change order of legend items

df$newLegendColumn <- factor(df$legendcolumn, levels=c(new_order_of_legend_items), ordered = TRUE) 

Create a new factor variable used in the legend, ordered as you need. Then use this variable instead in the plot.
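A concrete sketch of the same idea with `diamonds`: the legend for `color` is reversed so that J comes first. `plot_df` and `color_rev` are names made up for this example.

```r
library(ggplot2)

plot_df <- as.data.frame(diamonds)  # work on a copy, leaving `diamonds` untouched

# New factor with the levels in the order the legend should show them (J first, D last).
plot_df$color_rev <- factor(plot_df$color, levels = rev(levels(plot_df$color)))

p <- ggplot(plot_df, aes(x = carat, y = price)) + geom_point(aes(color = color_rev))
print(p)
```

Only the level order changes; the value stored in each row is the same as in the original column.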

Legend title, text, box, symbol

  • legend.title - Change legend title
  • legend.text - Change legend text
  • legend.key - Change legend box
  • guides - Change legend symbols

gg3 + theme(legend.title = element_text(size=20, color = "firebrick"), legend.text = element_text(size=15), legend.key=element_rect(fill='steelblue')) + guides(colour = guide_legend(override.aes = list(size=2, shape=4, stroke=2)))  # legend title color and size, text size, box fill, symbol size, shape and stroke.

Plot text and annotation

Add text in chart

# Not run: gg + geom_text(aes(x=xcol, y=ycol, label=round(labelCol)), size=3)  # general format; a fixed 'size' goes outside aes()
gg + geom_text(aes(label=color, color=color), size=4) 

Annotation

# Not run: gg3 + annotate("text", x=xpos, y=ypos, label="My text")  # general format; the first argument is the geom name
library(grid)
my_grob = grobTree(textGrob("My Custom Text", x=0.8, y=0.2, gp=gpar(col="firebrick", fontsize=25, fontface="bold")))
gg3 + annotation_custom(my_grob)
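For comparison, here is a runnable sketch of the annotate() form. The position (x=4, y=2500) and styling are arbitrary values picked for the `diamonds` scatterplot, not taken from the original.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point(aes(color = color))

# x and y are given in data units (carat and price), unlike the
# 0-1 relative coordinates passed to textGrob() above.
p_annot <- p + annotate("text", x = 4, y = 2500, label = "My text",
                        color = "firebrick", size = 8, fontface = "bold")
print(p_annot)
```

annotate() suits labels tied to data positions, while annotation_custom() with a grob keeps the text at a fixed place in the panel regardless of the axis limits.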

Multiple plots

Multiple chart panels

p1 <- gg1 + facet_grid(color ~ cut)  # arrange in a grid. More space for plots.

Free X and Y axis scales

Setting scales='free' frees the scales of both the X and Y axes. Use scales='free_x' to free only the X axis and scales='free_y' to free only the Y axis.

p2 <- gg1 + facet_wrap(color ~ cut, scales="free")  # free the x and y axis scales.
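The single-axis variants look like this. It is a sketch that facets by `cut` only, to keep the number of panels small; the object names are illustrative.

```r
library(ggplot2)

gg1 <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point(aes(color = color))

p_free_y <- gg1 + facet_wrap(~ cut, scales = "free_y")  # shared X axis, independent Y axes
p_free_x <- gg1 + facet_wrap(~ cut, scales = "free_x")  # shared Y axis, independent X axes
print(p_free_y)
```

Freeing only one axis keeps the panels comparable along the other, which is often easier to read than freeing both.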

Arrange multiple plots

library(gridExtra)
grid.arrange(p1, p2, ncol=2)

Geom layers

Add smoothing line

gg3 + geom_smooth(aes(color=color))  # method could be - 'lm', 'loess', 'gam'
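A brief sketch of choosing the method explicitly: method="lm" fits one straight line per color group, and se=FALSE hides the confidence band. The plot is rebuilt here so the snippet runs on its own.

```r
library(ggplot2)

gg <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point(aes(color = color))

# One linear fit per level of `color`; se = FALSE drops the shaded confidence band.
p_lm <- gg + geom_smooth(aes(color = color), method = "lm", se = FALSE)
print(p_lm)
```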

Add horizontal / vertical line

p1 <- gg3 + geom_hline(yintercept=5000, size=2, linetype="dotted", color="blue") # linetypes: solid, dashed, dotted, dotdash, longdash and twodash
p2 <- gg3 + geom_vline(xintercept=4, size=2, color="firebrick")
p3 <- gg3 + geom_segment(aes(x=4, y=5000, xend=4, yend=10000), size=2, lineend="round")  # fixed 'size' and 'lineend' go outside aes()
p4 <- gg3 + geom_segment(aes(x=carat, y=price, xend=carat, yend=price-500, color=color), size=2) + coord_cartesian(xlim=c(3, 5))  # x, y: start points. xend, yend: end points
gridExtra::grid.arrange(p1,p2,p3,p4, ncol=2)

Add bar chart

# Frequency bar chart: Specify only X axis.
gg <- ggplot(mtcars, aes(x=cyl))
gg + geom_bar()  # frequency table

gg <- ggplot(mtcars, aes(x=cyl))
p1 <- gg + geom_bar(position="dodge", aes(fill=factor(vs)))  # side-by-side
p2 <- gg + geom_bar(aes(fill=factor(vs)))  # stacked
gridExtra::grid.arrange(p1, p2, ncol=2)

# Absolute bar chart: Specify both X and Y axis. Set stat="identity"
df <- aggregate(mtcars$mpg, by=list(mtcars$cyl), FUN=mean)  # mean of mpg for every 'cyl'
names(df) <- c("cyl", "mpg")
head(df)
#>   cyl    mpg
#> 1   4  26.66
#> 2   6  19.74
#> 3   8  15.10

gg_bar <- ggplot(df, aes(x=cyl, y=mpg)) + geom_bar(stat = "identity")  # Y axis is explicit. 'stat=identity'
print(gg_bar)

Distinct color for bars

gg_bar <- ggplot(df, aes(x=cyl, y=mpg)) + geom_bar(stat = "identity", aes(fill=factor(cyl)))  # factor() gives each bar a distinct color instead of a continuous gradient
print(gg_bar)

Change color and width of bars

df$cyl <- as.factor(df$cyl)
gg_bar <- ggplot(df, aes(x=cyl, y=mpg)) + geom_bar(stat = "identity", aes(fill=cyl), width = 0.25)
gg_bar + scale_fill_manual(values=c("4"="steelblue", "6"="firebrick", "8"="darkgreen"))

Change color palette

library(RColorBrewer)
display.brewer.all(n=20, exact.n=FALSE)  # display available color palettes
ggplot(mtcars, aes(x=cyl, y=carb, fill=factor(cyl))) + geom_bar(stat="identity") + scale_fill_brewer(palette="Reds")  # "Reds" is palette name

Line chart

# Method 1:
gg <- ggplot(economics, aes(x=date))  # setup
gg + geom_line(aes(y=psavert), size=2, color="firebrick") + geom_line(aes(y=uempmed), size=1, color="steelblue", linetype="twodash")  # No legend
# available linetypes: solid, dashed, dotted, dotdash, longdash and twodash

# Method 2:
library(reshape2)
df_melt <- melt(economics[, c("date", "psavert", "uempmed")], id="date")  # melt by date. 
gg <- ggplot(df_melt, aes(x=date))  # setup
gg + geom_line(aes(y=value, color=variable), size=1) + scale_color_discrete(name="Legend")  # gets legend.

Line chart from timeseries

# One step method.
library(ggfortify)
autoplot(AirPassengers, size=2) + labs(title="AirPassengers")

Ribbons

Filled time series can be plotted using geom_ribbon(). It requires two aesthetics, ymin and ymax.

# Prepare the dataframe
st_year <- start(AirPassengers)[1]
st_month <- "01"
st_date <- as.Date(paste(st_year, st_month, "01", sep="-"))
dates <- seq.Date(st_date, length=length(AirPassengers), by="month")
df <- data.frame(dates, AirPassengers, AirPassengers/2)
head(df)
#>        dates AirPassengers AirPassengers.2
#> 1 1949-01-01           112            56.0
#> 2 1949-02-01           118            59.0
#> 3 1949-03-01           132            66.0
#> 4 1949-04-01           129            64.5
#> 5 1949-05-01           121            60.5
#> 6 1949-06-01           135            67.5
# Plot ribbon with ymin=0
gg <- ggplot(df, aes(x=dates)) + labs(title="AirPassengers") + theme(plot.title=element_text(size=30), axis.title.x=element_text(size=20), axis.text.x=element_text(size=15))
gg + geom_ribbon(aes(ymin=0, ymax=AirPassengers)) + geom_ribbon(aes(ymin=0, ymax=AirPassengers.2), fill="green")

gg + geom_ribbon(aes(ymin=AirPassengers-20, ymax=AirPassengers+20)) + geom_ribbon(aes(ymin=AirPassengers.2-20, ymax=AirPassengers.2+20), fill="green")

Area

geom_area is similar to geom_ribbon, except that ymin is fixed at 0. If you want to make an overlapping area plot, set alpha to make the top layer translucent.
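A minimal sketch of that relationship, drawing `psavert` from `economics` both ways. The fill color and object names are arbitrary.

```r
library(ggplot2)

base <- ggplot(economics, aes(x = date))

# Area from 0 up to psavert ...
p_area   <- base + geom_area(aes(y = psavert), fill = "steelblue")
# ... and the same region described as a ribbon with an explicit lower bound of 0.
p_ribbon <- base + geom_ribbon(aes(ymin = 0, ymax = psavert), fill = "steelblue")

print(p_area)
print(p_ribbon)
```

One difference to keep in mind: with several groups, geom_area stacks them by default, whereas geom_ribbon draws each band exactly where its ymin and ymax say.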

# Method1: Non-Overlapping Area
df <- reshape2::melt(economics[, c("date", "psavert", "uempmed")], id="date")
head(df, 3)
#>         date variable value
#> 1 1967-07-01  psavert  12.5
#> 2 1967-08-01  psavert  12.5
#> 3 1967-09-01  psavert  11.7
p1 <- ggplot(df, aes(x=date)) + geom_area(aes(y=value, fill=variable)) + labs(title="Non-Overlapping - psavert and uempmed")

# Method2: Overlapping Area
p2 <- ggplot(economics, aes(x=date)) + geom_area(aes(y=psavert), fill="yellowgreen", color="yellowgreen") + geom_area(aes(y=uempmed), fill="dodgerblue", alpha=0.7, linetype="dotted") + labs(title="Overlapping - psavert and uempmed")
gridExtra::grid.arrange(p1, p2, ncol=2)

Boxplot and Violin

The outlier points are controlled by the following arguments of geom_boxplot():

  • outlier.shape
  • outlier.stroke
  • outlier.size
  • outlier.colour

If the notch is turned on (by setting notch=TRUE), a notched boxplot is produced. Otherwise, you get the standard rectangular boxplot.

p1 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot(aes(fill = factor(cyl)), width=0.5, outlier.colour = "dodgerblue", outlier.size = 4, outlier.shape = 16, outlier.stroke = 2, notch=TRUE) + labs(title="Box plot")  # boxplot
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_violin(aes(fill = factor(cyl)), width=0.5, trim=FALSE) + labs(title="Violin plot (untrimmed)")  # violin plot
gridExtra::grid.arrange(p1, p2, ncol=2)

Density

ggplot(mtcars, aes(mpg)) + geom_density(aes(fill = factor(cyl)), size=2) + labs(title="Density plot")  # Density plot

Tiles

corr <- round(cor(mtcars), 2)
df <- reshape2::melt(corr)
gg <- ggplot(df, aes(x=Var1, y=Var2, fill=value, label=value)) + geom_tile() + theme_bw() + geom_text(aes(label=value, size=value), color="white") + labs(title="mtcars - Correlation plot") + theme(text=element_text(size=20), legend.position="none")

library(RColorBrewer)
p2 <- gg + scale_fill_distiller(palette="Reds")
p3 <- gg + scale_fill_gradient2()
gridExtra::grid.arrange(gg, p2, p3, ncol=3)