R for Genomics Exercises: 15 Practice Problems

Fifteen practice problems for genomics in R with Bioconductor, covering sequences, genomic ranges, RNA-seq counts, and differential expression. Each problem comes with a hidden solution.

library(rtracklayer)
# library(Biostrings); library(GenomicRanges); library(DESeq2); library(edgeR)

Exercise 1: DNA string

Difficulty: Beginner.

Show solution
# Biostrings::DNAString("ACGTACGT")

Exercise 2: Reverse complement

Difficulty: Beginner.

Show solution
# Biostrings::reverseComplement(Biostrings::DNAString("ACGT"))
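
To see what reverseComplement() computes, the same operation can be sketched in base R. The helper name `rev_comp` is hypothetical, and the sketch assumes upper-case A, C, G, T only; it does not cover the IUPAC ambiguity codes that Biostrings handles.

```r
# Hypothetical helper: reverse complement of a plain character string.
# Assumes upper-case A/C/G/T only (no IUPAC ambiguity codes).
rev_comp <- function(seq) {
  comp <- chartr("ACGT", "TGCA", seq)              # complement each base
  bases <- strsplit(comp, "", fixed = TRUE)[[1]]   # split into single letters
  paste(rev(bases), collapse = "")                 # reverse and rejoin
}

rev_comp("AACG")  # "CGTT"
rev_comp("ACGT")  # "ACGT": this sequence is its own reverse complement
```

Note that the exercise's input, "ACGT", is palindromic in the reverse-complement sense, so a second test sequence such as "AACG" is a more convincing check.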

Exercise 3: GC content

Difficulty: Intermediate.

Show solution
# s <- Biostrings::DNAString("ACGTACGTGG")
# sum(Biostrings::letterFrequency(s, c("G", "C"))) / length(s)
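
As a cross-check on the Biostrings answer, GC content can be computed on a plain string in base R. This is a minimal sketch; `gc_fraction` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: GC fraction of a plain character string.
gc_fraction <- function(seq) {
  bases <- strsplit(toupper(seq), "", fixed = TRUE)[[1]]
  mean(bases %in% c("G", "C"))   # share of positions that are G or C
}

gc_fraction("ACGTACGTGG")  # 0.6 (4 G + 2 C out of 10 bases)
```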

Exercise 4: Build GRanges

Difficulty: Intermediate.

Show solution
# gr <- GenomicRanges::GRanges(seqnames = "chr1",
#                             ranges = IRanges::IRanges(start = c(100, 200), end = c(150, 250)))
# gr

Exercise 5: Find overlapping ranges

Difficulty: Advanced.

Show solution
# gr1 <- GRanges("chr1", IRanges(100, 200))
# gr2 <- GRanges("chr1", IRanges(150, 250))
# findOverlaps(gr1, gr2)
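
The overlap rule behind this result can be sketched in base R: two closed intervals on the same chromosome overlap when each one starts no later than the other ends. The helper `ranges_overlap` is hypothetical and ignores strand and chromosome, which findOverlaps() does take into account.

```r
# Hypothetical helper: do two closed, 1-based intervals on the same
# chromosome share at least one position?
ranges_overlap <- function(start1, end1, start2, end2) {
  start1 <= end2 & start2 <= end1
}

ranges_overlap(100, 200, 150, 250)  # TRUE: positions 150-200 are shared
ranges_overlap(100, 200, 201, 250)  # FALSE: adjacent, no shared position
```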

Exercise 6: Subset to a chromosome

Difficulty: Intermediate.

Show solution
# gr[seqnames(gr) == "chr1"]

Exercise 7: Read a FASTA

Difficulty: Intermediate.

Show solution
# Biostrings::readDNAStringSet("seqs.fasta")

Exercise 8: Read RNA-seq counts

Difficulty: Intermediate.

Show solution
# counts <- read.delim("counts.tsv", row.names = 1)
# dim(counts)
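
If no counts.tsv is at hand, the same call can be tried on an inline table. The gene and sample names below are made up for illustration; only read.delim() and its `text` and `row.names` arguments come from base R.

```r
# Hypothetical two-gene, four-sample count table; all names are made up.
tsv <- "gene\tctrl_1\tctrl_2\ttreat_1\ttreat_2
geneA\t10\t12\t30\t28
geneB\t0\t1\t0\t2"

# row.names = 1 moves the first column (gene IDs) into the row names,
# leaving a purely numeric table of counts.
counts_demo <- read.delim(text = tsv, row.names = 1)
dim(counts_demo)        # 2 genes (rows) by 4 samples (columns)
rownames(counts_demo)   # "geneA" "geneB"
```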

Exercise 9: DESeq2 design

Difficulty: Advanced.

Show solution
# coldata <- data.frame(condition = factor(c("ctrl", "ctrl", "treat", "treat")),
#                       row.names = colnames(counts))  # assumes 4 sample columns in this order
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = coldata,
#                                       design = ~ condition)

Exercise 10: Run DESeq2

Difficulty: Advanced.

Show solution
# dds <- DESeq2::DESeq(dds)
# res <- DESeq2::results(dds)
# head(res[order(res$padj), ])
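
The padj column holds p-values adjusted for multiple testing; results() uses the Benjamini-Hochberg method by default. A small base R sketch on hypothetical p-values shows what that adjustment does, and why missing values need attention when sorting or thresholding.

```r
# Hypothetical raw p-values for four genes.
p <- c(0.001, 0.01, 0.03, 0.5)

# Benjamini-Hochberg adjustment, as implemented in base R.
padj <- p.adjust(p, method = "BH")
padj  # 0.004 0.020 0.040 0.500

# A missing p-value stays missing after adjustment.
padj_na <- p.adjust(c(0.01, NA, 0.04), method = "BH")
is.na(padj_na)  # FALSE TRUE FALSE
```

In the solution above, order() places NA values last, so genes without an adjusted p-value sink to the bottom of the table.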

Exercise 11: Volcano plot

Difficulty: Advanced.

Show solution
# library(ggplot2)
# res_df <- as.data.frame(res)
# res_df <- res_df[!is.na(res_df$padj), ]   # drop genes with no adjusted p-value
# res_df$sig <- res_df$padj < 0.05
# ggplot(res_df, aes(log2FoldChange, -log10(padj), color = sig)) + geom_point()

Exercise 12: edgeR alternative

Difficulty: Advanced.

Show solution
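# group <- factor(c("ctrl", "ctrl", "treat", "treat"))
# design <- model.matrix(~ group)
# y <- edgeR::DGEList(counts = counts, group = group)
# y <- edgeR::calcNormFactors(y)
# y <- edgeR::estimateDisp(y, design)
# fit <- edgeR::glmQLFit(y, design)
# qlf <- edgeR::glmQLFTest(fit, coef = 2)   # test treat vs ctrl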
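
glmQLFit() expects a design matrix as its second argument. A minimal base R sketch of what such a matrix looks like for this two-group layout, assuming "ctrl" is the reference level (the alphabetical default):

```r
# Two-group layout matching the exercise: two controls, two treated.
group <- factor(c("ctrl", "ctrl", "treat", "treat"))

# Column 1 is the intercept (ctrl baseline);
# column 2, "grouptreat", encodes the treat vs ctrl effect.
design <- model.matrix(~ group)
design
colnames(design)  # "(Intercept)" "grouptreat"
```

With this parameterization, the second coefficient is the one to test for a treatment effect.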

Exercise 13: GO enrichment (concept)

Difficulty: Advanced.

Show solution
# up_genes: character vector of Entrez gene IDs (the default keyType)
# clusterProfiler::enrichGO(gene = up_genes, OrgDb = org.Hs.eg.db::org.Hs.eg.db,
#                          ont = "BP", pAdjustMethod = "BH")
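
Since this exercise is about the concept, it may help to see the test that over-representation analysis rests on: a one-sided hypergeometric test per GO term. The sketch below uses base R only, and every number in it is hypothetical.

```r
# Hypothetical numbers, for illustration only.
N <- 20000  # genes in the background (universe)
M <- 200    # background genes annotated to one GO term
n <- 500    # genes in the up-regulated list
k <- 15     # up-regulated genes annotated to the term

# P(X >= k) when drawing n genes without replacement from the background.
p_enrich <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
p_enrich

# The same one-sided test expressed as a 2x2 Fisher exact test.
tab <- matrix(c(k, M - k, n - k, N - M - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
p_fisher
```

By chance about 5 of the 500 listed genes would carry the term (500 * 200 / 20000), so observing 15 gives a small p-value. Repeating this for every term and adjusting the p-values is what `pAdjustMethod = "BH"` refers to.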

Exercise 14: Save GRanges to BED

Difficulty: Intermediate.

Show solution
# rtracklayer::export.bed(gr, "out.bed")
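
One detail worth checking in out.bed: GRanges coordinates are 1-based and closed, while BED is 0-based and half-open, so the written start is one less than the GRanges start. A base R sketch of that conversion, using hypothetical ranges in a plain data frame:

```r
# Hypothetical ranges in 1-based, closed coordinates (as in GRanges).
ranges_1based <- data.frame(chrom = "chr1",
                            start = c(100, 200),
                            end   = c(150, 250))

# BED is 0-based and half-open: shift the start down by one, keep the end.
bed <- data.frame(chrom      = ranges_1based$chrom,
                  chromStart = ranges_1based$start - 1,
                  chromEnd   = ranges_1based$end)
bed

# Interval width is unchanged by the conversion.
all(bed$chromEnd - bed$chromStart ==
      ranges_1based$end - ranges_1based$start + 1)  # TRUE
```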

Exercise 15: Annotate genes near peaks

Difficulty: Advanced.

Show solution
# peaks: a GRanges of peak regions; txdb: a TxDb object for the same genome build
# ChIPseeker::annotatePeak(peaks, tssRegion = c(-2000, 2000), TxDb = txdb)

What to do next

  • R-for-Biostatistics-Exercises: clinical statistics.
  • Linear-Regression-Exercises: modeling expression against phenotype.