Microbiome analysis in R - Part I
For a free consultation, please visit our consultation scheduler!
Workshop and Data Overview
The first installment of the microbiome analysis is to obtain Amplicon Sequence Variants (ASVs), their abundance in each sample, given raw 16S fastq data.
The data we will be using for this workshop is originated from Salomon, JD et al., 2023. “Many animal models have been useful in evaluating cardiopulmonary bypass (CPB) and a variety of cardiovascular, kidney, respiratory and neurologic outcomes. No animal models of CPB, however, have been utilized to evaluate changes to the microbiome … In this article, we used a model of piglet CPB/DHCA to evaluate the microbiome … A total of 12 piglets were used in this study, five in the CPB/DHCA group and seven in the control group receiving mechanical ventilation only. Sequences of 16S rRNA amplicon libraries generated using fecal microbial DNA resulted in a total of 12.6 million reads.”
For the sake of this workshop, we picked a few samples and subsetted the raw fastq files, so the analysis will fit in the time period of this workshop. To read more about this study, please refer to the publication.
In this workshop, we will be following the Bioconductor workflow for microbiome data analysis (Callahan et al., 2016b) using R software.
Getting Ready
Load R packages we will be using for today’s workshop.
suppressMessages({
library("tidyverse")
library("dada2")
library("gridExtra")
})
no_of_cores = 1If you run this code in your own PC, no_of_core is 1 due to forking limitations in Windows. When you run this code on the HCC cluster (or Mac), no_of_core can be adjusted to what you have requested for the job. Additional note: not all Mac configurations support forking within RStudio, especially if you run RStudio within one of the virtualization technologies (e.g. Docker, VirtualBox).
Set your working directory to where your data (fastq files) are located, using combinations of commands listed below.
- Check current working directory
getwd()- Create a directory
dir.create()- Set working directory.
setwd()Load metadata file.
metadata = "sample_meta.csv"
metadata_df = read.csv(metadata, row.names=1)metadata_dfSetting up a metadata file before running the analysis saves time and minimizes the probability of making errors further down the line.
Read forward and reverse fastq filenames from metadata data frame.
fnFs <- metadata_df$fq1
fnRs <- metadata_df$fq2
# Extract sample names from metadata data frame
sample.names <- metadata_df$sampleInspect files and sample names.
fnFs
length(fnFs)
fnRs
length(fnRs)
sample.names