R for Genomics Exercises: 15 Practice Problems
Fifteen practice problems for genomics in R using Bioconductor, covering ranges, sequences, RNA-seq counts, and differential expression. Solutions are hidden.
library(rtracklayer)
# library(Biostrings); library(GenomicRanges); library(DESeq2); library(edgeR)
Exercise 1: DNA string
Difficulty: Beginner.
Show solution
# Biostrings::DNAString("ACGTACGT")
Exercise 2: Reverse complement
Difficulty: Beginner.
Show solution
# Biostrings::reverseComplement(Biostrings::DNAString("ACGT"))
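To check an answer by hand, the same operation can be sketched in base R without Bioconductor. The helper name `rev_comp` is hypothetical (it is not part of Biostrings), and the sketch assumes an uppercase sequence containing only A, C, G, and T.

```r
# Base-R sketch of a reverse complement (hypothetical helper, not a Biostrings function).
# Assumes uppercase input containing only A, C, G, T.
rev_comp <- function(seq) {
  complemented <- chartr("ACGT", "TGCA", seq)   # swap each base for its complement
  bases <- strsplit(complemented, "")[[1]]      # split into single characters
  paste(rev(bases), collapse = "")              # reverse the order and rejoin
}

rev_comp("ACGT")   # "ACGT": this sequence is its own reverse complement
rev_comp("AACG")   # "CGTT"
```

A palindromic input such as "ACGT" is a weak test on its own, so it helps to try a second sequence whose reverse complement differs from the input.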
Exercise 3: GC content
Difficulty: Intermediate.
Show solution
# s <- Biostrings::DNAString("ACGTACGTGG")
# sum(Biostrings::letterFrequency(s, c("G","C"))) / length(s)
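As a cross-check on the Biostrings result, GC content can also be counted directly in base R. The helper `gc_fraction` below is a hypothetical name used only for illustration.

```r
# Base-R sketch: fraction of G and C bases in a sequence (hypothetical helper).
gc_fraction <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]      # one element per base
  sum(bases %in% c("G", "C")) / length(bases)   # G + C count over total length
}

gc_fraction("ACGTACGTGG")   # 0.6 (4 G + 2 C out of 10 bases)
```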
Exercise 4: Build GRanges
Difficulty: Intermediate.
Show solution
# GenomicRanges::GRanges(seqnames = "chr1",
# ranges = IRanges::IRanges(start = c(100, 200), end = c(150, 250)))
Exercise 5: Find overlapping ranges
Difficulty: Advanced.
Show solution
# gr1 <- GRanges("chr1", IRanges(100, 200))
# gr2 <- GRanges("chr1", IRanges(150, 250))
# findOverlaps(gr1, gr2)
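The overlap test behind this exercise can be sketched in base R. GRanges coordinates are 1-based and closed, so two ranges on the same chromosome share at least one position when each starts no later than the other ends. The helper `ranges_overlap` is hypothetical and ignores chromosome and strand.

```r
# Base-R sketch of the overlap rule for closed intervals (hypothetical helper).
# Ignores seqnames and strand; both ranges are assumed to be on the same chromosome.
ranges_overlap <- function(start1, end1, start2, end2) {
  start1 <= end2 & start2 <= end1
}

ranges_overlap(100, 200, 150, 250)   # TRUE: positions 150..200 are shared
ranges_overlap(100, 200, 201, 250)   # FALSE: adjacent ranges share no position

# Number of shared positions for the overlapping pair
length(max(100, 150):min(200, 250))  # 51
```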
Exercise 6: Subset to a chromosome
Difficulty: Intermediate.
Show solution
# gr[seqnames(gr) == "chr1"]
Exercise 7: Read a FASTA
Difficulty: Intermediate.
Show solution
# Biostrings::readDNAStringSet("seqs.fasta")
Exercise 8: Read RNA-seq counts
Difficulty: Intermediate.
Show solution
# counts <- read.delim("counts.tsv", row.names = 1)
# dim(counts)
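If no `counts.tsv` is at hand, a toy file can stand in for it. The sketch below writes a small made-up table to a temporary file and reads it back the same way; the gene and sample names are invented for illustration.

```r
# Toy counts table (invented values) written to a temporary TSV, then read back.
toy <- data.frame(
  gene   = c("geneA", "geneB", "geneC"),
  ctrl1  = c(10, 0, 250),
  ctrl2  = c(12, 1, 240),
  treat1 = c(30, 0, 260),
  treat2 = c(28, 2, 255)
)
path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

counts <- read.delim(path, row.names = 1)   # first column becomes the row names
dim(counts)                                 # 3 genes by 4 samples
all(sapply(counts, is.numeric))             # every sample column should be numeric
```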
Exercise 9: DESeq2 design
Difficulty: Advanced.
Show solution
# coldata <- data.frame(condition = factor(c("ctrl","ctrl","treat","treat")),
# row.names = colnames(counts))
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = coldata,
# design = ~ condition)
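Before building the dataset it is worth confirming that the sample table lines up with the count columns. The base-R sketch below uses an invented count matrix; the sample names and values are assumptions for illustration.

```r
# Invented count matrix: 3 genes by 4 samples.
counts <- matrix(
  c(10, 0, 250, 12, 1, 240, 30, 0, 260, 28, 2, 255),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("ctrl1", "ctrl2", "treat1", "treat2"))
)

# Sample table: one row per column of counts, in the same order.
coldata <- data.frame(
  condition = factor(c("ctrl", "ctrl", "treat", "treat"),
                     levels = c("ctrl", "treat")),
  row.names = colnames(counts)
)

identical(rownames(coldata), colnames(counts))   # TRUE when samples are aligned
levels(coldata$condition)[1]                     # "ctrl": first level is the reference
```

Setting the factor levels explicitly keeps the control group as the reference, so fold changes read as treatment relative to control.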
Exercise 10: Run DESeq2
Difficulty: Advanced.
Show solution
# dds <- DESeq2::DESeq(dds)
# res <- DESeq2::results(dds)
# head(res[order(res$padj), ])
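The `padj` column is a Benjamini-Hochberg adjusted p-value by default, and some genes can have `padj` set to `NA`. A small base-R sketch with made-up p-values shows the adjustment and why sorting by `padj` puts those rows last.

```r
# Made-up p-values; the NA stands in for a gene without an adjusted p-value.
pvals <- c(0.001, 0.04, 0.03, 0.5, NA)

padj <- p.adjust(pvals, method = "BH")   # NA values are left out of the adjustment
padj

order(padj)                              # order() places NA entries at the end
```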
Exercise 11: Volcano plot
Difficulty: Advanced.
Show solution
# library(ggplot2)
# res_df <- as.data.frame(res); res_df$sig <- res_df$padj < 0.05
# ggplot(res_df, aes(log2FoldChange, -log10(padj), color = sig)) + geom_point()
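Rows with `padj` equal to `NA` make the significance flag `NA` as well, and ggplot2 drops such points with a warning. One way to handle this, sketched in base R on invented results, is to treat missing `padj` as not significant and label the direction of change.

```r
# Invented results table with one missing adjusted p-value.
res_df <- data.frame(
  log2FoldChange = c(2.5, -1.8, 0.2, 3.0),
  padj           = c(0.001, 0.02, 0.8, NA)
)

# Missing padj counts as not significant rather than NA.
res_df$sig <- !is.na(res_df$padj) & res_df$padj < 0.05

# Label each gene as up, down, or not significant ("ns").
res_df$direction <- ifelse(!res_df$sig, "ns",
                           ifelse(res_df$log2FoldChange > 0, "up", "down"))
table(res_df$direction)
```

The `direction` column can then be mapped to `color` in place of `sig` if three colors are wanted.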
Exercise 12: edgeR alternative
Difficulty: Advanced.
Show solution
# group <- factor(c("ctrl","ctrl","treat","treat"))
# y <- edgeR::DGEList(counts = counts, group = group)
# y <- edgeR::calcNormFactors(y)
# design <- model.matrix(~ group)
# y <- edgeR::estimateDisp(y, design)
# fit <- edgeR::glmQLFit(y, design)
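The quasi-likelihood fit takes a design matrix describing the samples. A minimal base-R sketch of building one from the group labels, assuming a two-group comparison with `ctrl` as the reference:

```r
# Two-group design with "ctrl" as the reference level.
group <- factor(c("ctrl", "ctrl", "treat", "treat"), levels = c("ctrl", "treat"))
design <- model.matrix(~ group)

colnames(design)          # "(Intercept)" "grouptreat"
design[, "grouptreat"]    # 0 for ctrl samples, 1 for treat samples
```

With this coding, the `grouptreat` coefficient is the treat-versus-ctrl effect, which is the coefficient to test after fitting.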
Exercise 13: GO enrichment (concept)
Difficulty: Advanced.
Show solution
# clusterProfiler::enrichGO(gene = up_genes, OrgDb = org.Hs.eg.db,
# ont = "BP", pAdjustMethod = "BH")
Exercise 14: Save GRanges to BED
Difficulty: Intermediate.
Show solution
# rtracklayer::export.bed(gr, "out.bed")
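When inspecting the exported file, note that BED uses 0-based, half-open coordinates while GRanges uses 1-based, closed coordinates, so start positions in the file are one less than in the object. The base-R sketch below does the same conversion by hand on an invented table; the column names follow the BED convention.

```r
# Invented ranges in 1-based, closed coordinates (as in a GRanges object).
gr_like <- data.frame(seqnames = "chr1", start = c(100, 200), end = c(150, 250))

# Same ranges in BED coordinates: start shifts down by one, end is unchanged.
bed <- data.frame(
  chrom      = gr_like$seqnames,
  chromStart = gr_like$start - 1,
  chromEnd   = gr_like$end
)
bed

bed$chromEnd - bed$chromStart   # widths are preserved: 51 51
```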
Exercise 15: Annotate genes near peaks
Difficulty: Advanced.
Show solution
# ChIPseeker::annotatePeak(peaks, tssRegion = c(-2000, 2000), TxDb = txdb)
What to do next
- R-for-Biostatistics-Exercises: clinical statistics.
- Linear-Regression-Exercises: model expression against phenotype.