Microbiome analysis in R - Differential Analysis

Author

Max Qiu, Chia Sin Liew, Jean-Jack Riethoven,
Binformatics Core Research Facility (BCRF)
Nebraska Center for Biotechnology,
University of Nebraska-Lincoln

Published

October 16, 2023

Note

For a free consultation, please visit our consultation scheduler!

Workshop Overview

This installment of the microbiome analysis is differential (abundance) analysis. For this session, we will be using the full phyloseq object from Salomon, JD et al., 2023 so we have more variable to work with. The data has two independent variables, group (CNT and CPB representing control samples and cardiopulmonary bypass samples) and time (pre and post representing pre-operation and post-operation time points).

Getting Ready

Load R packages we will be using for today’s workshop.

suppressMessages({
        library("tidyverse")
        library("janitor")
        library("kableExtra")
        library("phyloseq")
        library("vegan")
        library("microbiome")
        library("ALDEx2")
        library("DESeq2")
        #library("ANCOMBC")
        library("corncob")
        library('VennDiagram')
})

Copy all needed files to $WORK, create the 16S_Workshop directory if it does not exist, and set it as working directory using commands below.

system("cp -Ru /work/riethoven/jeanjack/16S_Workshop $WORK/", intern=TRUE)
setwd(file.path(Sys.getenv("WORK"), "16S_Workshop"))

Load the phyloseq object constructed from whole dataset of Salomon, JD et al., 2023.

load("./intermediate/SalomonJD_et_al_covariates.rda")

Inspect objects loaded into your environment.

ls()

[1] "create_dt" "ps"

ps

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 4578 taxa and 24 samples ]
sample_data() Sample Data:       [ 24 samples by 17 sample variables ]
tax_table()   Taxonomy Table:    [ 4578 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 4578 tips and 4576 internal nodes ]
refseq()      DNAStringSet:      [ 4578 reference sequences ]

To rearrange the levels of factor or to identify which level of a factor is reference

Note

Levels of a factor are ordered alphabetically by default, which means for group factor with two levels CNT and CPB, CNT is default reference level; for time factor with two levels pre and post, post is default reference level.

sample_meta = sample_data(ps) %>%
        dplyr::as_tibble() %>%
        dplyr::mutate(time = factor(time)) %>% # this step make `time` a factor so we can rearrange its levels
        within(., time <- relevel(time, ref = "pre")) %>% # this step set `pre` as ref level
        as.data.frame()

rownames(sample_meta) = sample_meta$sample 
sample_data(ps) = data.frame(sample_meta)

Inspect and Filter `phyloseq` object

A few screening and filtering steps before final phyloseq object is ready for abundance visualization and diversity analysis.

Step 1, remove Archaea, unknowns, chloroplasts;

ps.clean <- subset_taxa(ps, Kingdom == "Bacteria") %>%
        subset_taxa(!is.na(Phylum)) %>%
        subset_taxa(!Class %in% c("Chloroplast")) %>%
        subset_taxa(!Family %in% c("mitochondria"))
ps.clean

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 4541 taxa and 24 samples ]
sample_data() Sample Data:       [ 24 samples by 17 sample variables ]
tax_table()   Taxonomy Table:    [ 4541 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 4541 tips and 4539 internal nodes ]
refseq()      DNAStringSet:      [ 4541 reference sequences ]

Note

We are only interested in bacteria. Removed 37 taxa.

Step 2, filter taxa (ASV) with low abundance (< 2);

ps.clean.p0 <- filter_taxa(ps.clean, function (x) {sum(x > 0) >= 2}, prune=TRUE)
ps.clean.p0

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 3564 taxa and 24 samples ]
sample_data() Sample Data:       [ 24 samples by 17 sample variables ]
tax_table()   Taxonomy Table:    [ 3564 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 3564 tips and 3562 internal nodes ]
refseq()      DNAStringSet:      [ 3564 reference sequences ]

Note

Abundance is the sum of the number of times an ASV is observed across all samples. Removed 977 taxa.

Complete metadata are shown below:

sample_data(ps.clean.p0)

Differential (Abundance) Analysis

Differential analysis is a widely used approach to identify biomarkers centers at statistics. Differential abundance analysis aims to find the differences in the abundance of each taxa between two classes of subjects or samples, assigning a significance value to each comparison. Before we dive into different type of tools with an application in microbiome DA, we want to dig into the statistical characteristics of microbiome data, what sets this type of data apart from other data.

High-dimensional
Sparse (zero-inflated)
Over-dispersion
Compositional

To date, a number of methods/tools have been developed for microbiome marker discovery, e.g. simple statistical analysis methods stats (base R) and STAMP (Parks et al. 2014), RNA-seq based methods such as edgeR (Robinson, McCarthy, and Smyth 2010), limma-Voom (Law et al. 2014) and DESeq2 (Love, Huber, and Anders 2014), metagenomic based methods such as LEfSe (Segata et al. 2011), metagenomeSeq (Paulson et al. 2013), ALDEx2 (Gloor et al. 2013), ANCOMBC (Lin and Peddada 2020), and corncob (Martin, Witten, and Willis 2020). However, all of these tools have its own advantages and disadvantages, and none of them is considered standard or universal, and none of them provides solution for all above challenges. Moreover, the programs/softwares for different DA methods may be developed using different programming languages, even in different operating systems.

A summary of available methods for microbiome differential abundance analysis is below

DA methods with metagenomics application
Method (package)	Short description	Test	Normalization / Transformation	Suggested input	Output	Application
basic (stats)	Simple Student's t and Wilcox tests are used to assess differences between groups	test = 't', 'wilcox'	No normalization	all types	p-values and statistics	General
edgeR (edgeR)	A Negative Binomial (NB) generalized linear model is used to describe the counts	Empirical Bayes + moderated t, robust estimation of priors (if robust = TRUE), with or without weights (generated by weights_ZINB)	norm = 'TMM', 'TMMwsp', 'RLE', 'upperquartile', 'posupperquartile', 'none' (produced by norm_edgeR)	raw counts with edgeR normalization factors	p-values and statistics	RNA-Seq
limma (limma)	A linear model of the log2-transformed CPMs with weights is used to describe differences between groups	Empirical Bayes + moderated t, with or without weights (generated by weights_ZINB)	norm = 'TMM', 'TMMwsp', 'RLE', 'upperquartile', 'posupperquartile', 'none' (produced by norm_edgeR)	raw counts with edgeR normalization factors	p-values and statistics	RNA-Seq and Microarray
DESeq2 (DESeq2)	A Negative Binomial (NB) generalized linear model is used to describe the counts	Empirical Bayes + LRT, with or without weights (generated by weights_ZINB)	norm = 'ratio', 'poscounts', 'iterate' (produced by norm_DESeq2)	raw counts with DESeq2 size factors	p-values and statistics	RNA-Seq
metagenomeSeq (metagenomeSeq)	Zero-Inflated Gaussian (ZIG) mixture model or Zero-Inflated Log-Gaussian (ZILG) mixture model are used to describe the counts	model = 'fitZig', 'fitFeatureModel'	norm = 'CSS' (produced by norm_CSS)	raw counts with CSS normalization factors	p-values and statistics	Microbiome
corncob (corncob)	A Beta-Binomial regression model is used to descrive the relative counts	test = 'LRT', 'Wald' with (if boot = TRUE) or without bootstrap	Automatically transforms raw counts into relative abundances (like using TSS)	raw counts	p-values and statistics	Microbiome
ALDEx2 (ALDEx2)	Compositional approach - Monte-Carlo sampling from a Dirichlet distribution to estimate the real relative abundances and CLR-like transformation to assess differences between groups	test = 't', 'wilcox', 'kw', 'ANOVA', 'glm'	denom = 'all', 'iqlr', 'zero', 'lvha', 'median', or decided by the user. With test = 'glm', denom = 'all'	raw counts	p-values and statistics	Microbiome
ANCOM (ANCOMBC)	Compositional approach - Analysis of microbiome compositions with or without "sampling fraction" bias correction	ANCOM-II ANOVA models for log-ratios (if BC = FALSE) or linear model with offsets for log-counts (if BC = TRUE)	Automatic in-method transformation and zero types imputation	raw counts	p-values and statistics if BC = TRUE, only statistics otherwise	Microbiome
mixMC (mixOmics)	Compositional and multivariate approach - Uses a sparse Partial Least Squares Discriminant Analysis (sPLS-DA) to identify biomarkers.	sPLS-DA tuned with leave-one-out cross validation	CLR	raw counts	Stability of the features and their relevance	Omics-data

DA with ANCOMBC, corncob and DESeq2

Here we demonstrate 2 metagenomic based DA methods, ANCOMBC and corncob, which were developed specifically for microbiome data (contain many more zeros than RNA-seq data), and 1 RNA-Seq based DA method, DESeq2. Features were considered differentially abundant if their adjusted P values were lower than 0.05.

For the sake of time, we will agglomerate ps.clean.p0 to genus level, ps.clean.genus.

# Agglomerate to genus-level and rename
ps.clean.genus = phyloseq::tax_glom(ps.clean.p0, taxrank = rank_names(ps.clean.p0)[6])
phyloseq::taxa_names(ps.clean.genus) = phyloseq::tax_table(ps.clean.genus)[,"Genus"]
otu_table(ps.clean.genus)[1:5,1:5]

OTU Table:          [5 taxa and 5 samples]
                     taxa are columns
       Lachnospiraceae UCG-004 28-4 Lachnospiraceae AC2044 group
12post                       9    0                          120
12pre                       34    0                          152
14post                      22    0                          650
14pre                       12    0                          118
20post                      33    0                            1
       Lachnospiraceae NK4A136 group Lachnoclostridium
12post                           261                40
12pre                           3962               169
14post                          2109               228
14pre                            569                91
20post                            54               180

Note

Formula/Model: how the microbial abundances for each taxon depend on the variables in metadata.

model = "group + time"
covariate = str_split(model, pattern = "\\+") %>% unlist() %>% str_trim()
var1 = covariate[1]
var2 = covariate[2]
formula = as.formula(paste("", paste(covariate, collapse = " + "), sep = " ~ " ))
ps.da = ps.clean.genus

Some models developed specifically for RNA-Seq data have been proposed for metagenomic differential analysis. Three popular methods, including DESeq2, edgeR, and limma, have been implemented to apply to metagenomic DA. Here we demonstrate applying DESeq2 method as an example.

The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models. DESeq2 is normally applied to RNA-Seq transcriptomics data to identify significantly differentiated genes.

DESeq2 has an official extension within the phyloseq package and an accompanying vignette, which you can find here and here.

Import data with phyloseq, convert to DESeq2

The following two lines actually do all the complicated DESeq2 work. The function phyloseq_to_deseq2 converts your phyloseq-format microbiome data into a DESeqDataSet with dispersions estimated, using the experimental design formula, also shown (the formula). The DESeq function does the rest of the testing, in this case with default testing framework, but you can actually use alternatives.

dds_init <- phyloseq_to_deseq2(ps.da, formula )
dds = DESeq(dds_init)

Note

The default multiple-inference correction is Benjamini-Hochberg, and occurs within the DESeq function.

All Taxa

The following results function call creates a table of the results of the tests. Very fast. The hard work was already stored with the rest of the DESeq2-related data in our dds object (see above).

res_var1 = DESeq2::results(dds, name = resultsNames(dds)[2] )

df_var1 <- as.data.frame(res_var1) %>%
  rownames_to_column(var = "taxa")

Significant Taxa

Filter by the adjusted p-value padj, add a new column to record upregulation or downregulation based on log2FoldChange.

df_var1_filt <- df_var1 %>%
        dplyr::filter(padj < 0.05 ) %>%
        mutate(status = case_when(log2FoldChange < - 0 ~ "Down", 
                            log2FoldChange > 0 ~ "Up") ) %>%
        column_to_rownames(., var = "taxa")

Extend the significant taxa table with taxa annotation.

# attach taxa info at the end
sigtab_var1 <- cbind(df_var1_filt, 
                     tax_table(ps.da)[rownames(df_var1_filt), ] )

Visualizations

ggplot(sigtab_var1, aes(y = Genus, x = log2FoldChange, color = Phylum)) +
    geom_vline(xintercept = 0.0, color = "gray", size = 0.5) +
    geom_point(size=2)

Exercise: get significantly changed taxa between time from dds

All Taxa

res_var2 = DESeq2::results(dds, name = resultsNames(dds)[3] )
df_var2 <- as.data.frame(res_var2) %>%
  rownames_to_column(var = "taxa")

Significant Taxa

df_var2_filt <- df_var2 %>%
        dplyr::filter(padj < 0.05 ) %>%
        mutate(status = case_when(log2FoldChange < - 0 ~"Down", 
                            log2FoldChange > 0 ~ "Up") ) %>%
        column_to_rownames(., var = "taxa")

# not run
# attach taxa info at the end
sigtab_var2 <- cbind(df_var2_filt, 
                     tax_table(ps.da)[rownames(df_var2_filt), ] )
        
sigtab_var2

Visualizations

# not run
ggplot(sigtab_var2, aes(y = Order, x = log2FoldChange, color = Phylum)) +
        geom_vline(xintercept = 0.0, color = "gray", size = 0.5) +
        geom_point(size=2)

corncob uses abundance tables and sample data to build individual taxon model for differential abundance and differential variability. This tool uses a beta-binomial based regression model which allows for the relative abundance of a taxon to be associated with covariates of interest. Unlike existing models, corncob allows for the over-dispersion in the taxon’s counts to be associated with covariates of interest as well. The package provides tests not only for differential relative abundance, but also for differential variability. In this analysis, we use corncob to control over-dispersion of each taxon while testing differential relative abundance associated with covariates of interest, in other words, allows us to control for the effect of the covariates on the variance of the relative abundances.

One downside with using corncob is that its significant_taxa output is for the whole covariate formula; to identify differentiated taxa for each covariate of a combination of covariate formula, one has to identify it from the plot.

Warning

Slow step. Alternatively,

load("./intermediate/corncob_genus.rda")

corncob allows us to test all the taxa in our data in one go by using differentialTest function. It will control the false discrovery rate to account for multiple comparisons. We specify the covariates of our model using formula and phi.fomula. The difference between the formulas and the null version of the formulas are the variables that we test.

set.seed(42)
da_analysis <- differentialTest(formula = formula, # differential abundance
                                phi.formula = formula, # differential variability
                                formula_null = ~ 1,
                                phi.formula_null = formula,
                                test = "Wald", boot = FALSE,
                                data = ps.da,
                                fdr_cutoff = 0.05)

Significant Taxa

We can check the list of differentially-abundant taxa using:

sig_list = da_analysis$significant_taxa
sig_list

 [1] "Eisenbergiella"                    "Lachnospiraceae UCG-003"          
 [3] "Lachnospiraceae NK3A20 group"      "Coprococcus"                      
 [5] "GCA-900066575"                     "[Eubacterium] ventriosum group"   
 [7] "Lachnospiraceae ND3007 group"      "[Bacteroides] pectinophilus group"
 [9] "Sarcina"                           "Pygmaiobacter"                    
[11] "UCG-003"                           "UCG-008"                          
[13] "Selenomonas"                       "Dialister"                        
[15] "Veillonella"                       "CAG-873"                          
[17] "Treponema"                         "Candidatus Saccharimonas"         
[19] "Campylobacter"                     "[Anaerorhabdus] furcosa group"    
[21] "Phascolarctobacterium"

Visualization

We can plot the model coefficients:

plot(da_analysis) + theme(legend.position="none")

set.seed(42)
var2_da_analysis <- differentialTest(formula = ~ time,
                                phi.formula = ~ time,
                                formula_null = ~ 1,
                                phi.formula_null = ~ time,
                                test = "Wald", boot = FALSE,
                                data = ps.da,
                                fdr_cutoff = 0.05)

Significant Taxa

sig_list_var2 = var2_da_analysis$significant_taxa
sig_list_var2

character(0)

Visualization

# not run
plot(var2_da_analysis) + theme(legend.position="none")

Exercise: do DA using corncob with group and get significantly changed taxa

set.seed(42)
var1_da_analysis <- differentialTest(formula = ~ group,
                                phi.formula = ~ group,
                                formula_null = ~ 1,
                                phi.formula_null = ~ group,
                                test = "Wald", boot = FALSE,
                                data = ps.da,
                                fdr_cutoff = 0.05)

Significant Taxa

sig_list_var1 = var1_da_analysis$significant_taxa
sig_list_var1

 [1] "Eisenbergiella"                    "Lachnospiraceae UCG-003"          
 [3] "Lachnospiraceae NK3A20 group"      "GCA-900066575"                    
 [5] "[Eubacterium] ventriosum group"    "Lachnospiraceae ND3007 group"     
 [7] "[Bacteroides] pectinophilus group" "Pygmaiobacter"                    
 [9] "UCG-003"                           "UCG-005"                          
[11] "UCG-008"                           "Monoglobus"                       
[13] "Anaerovibrio"                      "Megamonas"                        
[15] "Selenomonas"                       "Dialister"                        
[17] "Veillonella"                       "CAG-873"                          
[19] "Pyramidobacter"                    "Treponema"                        
[21] "Candidatus Saccharimonas"          "Campylobacter"                    
[23] "[Anaerorhabdus] furcosa group"     "Holdemania"

Visualization

plot(var1_da_analysis) + theme(legend.position="none")

Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC) (Lin and Peddada 2020) is a methodology of differential abundance (DA) analysis for microbial absolute abundance data. ANCOM-BC estimates the unknown sampling fractions, corrects the bias induced by their differences among samples, and identifies taxa that are differentially abundant with changes in the covariate of interest (e.g. the group effect). Result from the ANCOM-BC log-linear model to determine taxa that are differentially abundant according to the covariate of interest.

library(ANCOMBC)

out = ancombc(phyloseq = ps.da, formula = model, p_adj_method = "fdr", group = "group", global = TRUE )

res_ancombc = out$res

Primary results contains: 1) log fold changes; 2) standard errors; 3) test statistics; 4) p-values; 5) adjusted p-values; 6) indicators whether the taxon is differentially abundant (TRUE) or not (FALSE).

Result table

tab_diff = res_ancombc$diff_abn

Estimated LFC and SE

df_lfc = data.frame(res_ancombc$lfc[, -1] * res_ancombc$diff_abn[, -1], check.names = FALSE) %>% 
        mutate(taxon_id = res_ancombc$diff_abn$taxon) %>%
        dplyr::select(taxon_id, everything())

df_se = data.frame(res_ancombc$se[, -1] * res_ancombc$diff_abn[, -1], check.names = FALSE) %>% 
        mutate(taxon_id = res_ancombc$diff_abn$taxon) %>%
        dplyr::select(taxon_id, everything())

colnames(df_se)[-1] = paste0(colnames(df_se)[-1], "SE")

Significant Taxa

var2_diff = tab_diff %>% 
  column_to_rownames(., var = "taxon") %>%
  dplyr::filter(., timepost == TRUE )
rownames(var2_diff)

character(0)

Visualizations

# not run
df_fig_var2 = df_lfc %>% 
                left_join(df_se, by = "taxon_id") %>%
                column_to_rownames(., var = "taxon_id") %>%
                dplyr::select(., starts_with(var2) ) %>%
                dplyr::filter( timepost != 0 ) %>% 
                arrange(desc( timepost )) %>%
                rownames_to_column(., var = "taxon_id") %>%
                mutate(direct = ifelse( timepost > 0, "Positive LFC", "Negative LFC"))
        
df_fig_var2$taxon_id = factor(df_fig_var2$taxon_id, levels = df_fig_var2$taxon_id)

df_fig_var2

# not run
ggplot(data = df_fig_var2, aes(x = taxon_id, y = timepost, fill = direct, color = direct)) +
    geom_bar(stat = "identity", width = 0.7, position = position_dodge(width = 0.4)) + 
    geom_errorbar(aes(ymin = timepost - timepostSE , ymax = timepost + timepostSE ), width = 0.2, position = position_dodge(0.05), color = "black") + 
    labs(x = NULL, y = "Log fold change", title = paste0("Waterfall Plot for ", var2, " Effect") ) + 
    theme_bw() + 
    theme(legend.position = "none", plot.title = element_text(hjust = 0.5), panel.grid.minor.y = element_blank(),axis.text.x = element_text(angle = 60, hjust = 1))

Exercise: get significantly changede taxa between group from tab_diff

Significant Taxa

var1_diff = tab_diff %>% 
        column_to_rownames(., var = "taxon") %>%
        dplyr::filter(., groupCPB == TRUE )
rownames(var1_diff)

 [1] "Lachnospiraceae AC2044 group"    "Eisenbergiella"                 
 [3] "Lachnospiraceae NK3A20 group"    "GCA-900066575"                  
 [5] "[Eubacterium] ventriosum group"  "[Eubacterium] ruminantium group"
 [7] "Pygmaiobacter"                   "V9D2013 group"                  
 [9] "Schwartzia"                      "Dialister"                      
[11] "Pyramidobacter"                  "Treponema"                      
[13] "Campylobacter"                   "Fibrobacter"                    
[15] "Terrisporobacter"                "[Anaerorhabdus] furcosa group"  
[17] "Holdemanella"

Visualizations

df_fig_var1 = df_lfc %>% 
                left_join(df_se, by = "taxon_id") %>%
                column_to_rownames(., var = "taxon_id") %>%
                dplyr::select(., starts_with(var1) ) %>%
                dplyr::filter( groupCPB != 0 ) %>% 
                arrange(desc( groupCPB )) %>%
                rownames_to_column(., var = "taxon_id") %>%
                mutate(direct = ifelse( groupCPB > 0, "Positive LFC", "Negative LFC"))

df_fig_var1$taxon_id = factor(df_fig_var1$taxon_id, levels = df_fig_var1$taxon_id)
        
df_fig_var1

ggplot(data = df_fig_var1, aes(x = taxon_id, y = groupCPB, fill = direct, color = direct)) +
    geom_bar(stat = "identity", width = 0.7, position = position_dodge(width = 0.4)) + 
    geom_errorbar(aes(ymin = groupCPB - groupCPBSE , ymax = groupCPB + groupCPBSE ), width = 0.2, position = position_dodge(0.05), color = "black") + 
    labs(x = NULL, y = "Log fold change", title = paste0("Waterfall Plot for ", var1, " Effect") ) + 
    theme_bw() + 
    theme(legend.position = "none", plot.title = element_text(hjust = 0.5), panel.grid.minor.y = element_blank(),axis.text.x = element_text(angle = 60, hjust = 1))

Compare and Summarize

# special steps for corncob; get estimates from each model
x = var1_da_analysis
all_taxa = phyloseq::taxa_names(ps.da) # genus names
total_var_count <- length(all_taxa) # number of genus

# create new empty data.frame with 4 cols
df <- as.data.frame(matrix(NA, nrow = total_var_count, ncol = 4))
# set names of 4 cols
colnames(df) <- c("x", "xmin", "xmax", "taxa")
qval <- stats::qnorm(.975)
restricts_mu <- attr(x$restrictions_DA, "index")

count <- 1
for (i in 1:length(x$all_models)) {
        
        # Below from print_summary_bbdml, just to get coefficient names
        tmp <- x$all_models[[i]]
        coefs.mu <- tmp$coefficients[1:tmp$np.mu,, drop = FALSE]
        rownames(coefs.mu) <- paste0(substring(rownames(coefs.mu), 4), "\nDifferential Abundance")
        coefs.mu <- coefs.mu[restricts_mu,, drop = FALSE]
        
        df[count, 1:3] <- c(coefs.mu[1, 1], coefs.mu[1, 1] - qval * coefs.mu[1, 2],
                            coefs.mu[1, 1] + qval * coefs.mu[1, 2])
        df[count, 4] <- all_taxa[i]
        count <- count + 1
}

allres <- bind_rows(
        data.frame(estimate = res_var1$log2FoldChange, 
                   p.value = res_var1$padj, 
                   otu = rownames(res_var1),
                   test = 'DESeq2'),
        data.frame(estimate = df_var1$log2FoldChange,
                   p.value = df_var1$padj,
                   otu = df_var1$taxa,
                   test = "ANCOMBC"),
        data.frame(estimate = df$x,
                   p.value = var1_da_analysis$p_fdr,
                   otu = df$taxa,
                   test = "corncob")
)

ggplot(data = allres, aes(estimate,-log10(p.value), label = otu)) + 
        geom_point() + 
        geom_text(data = allres[allres$p.value< 10e-4 & !is.na(allres$p.value),], 
                  position = position_jitter()) + 
        xlim(-10, 10) +
        facet_wrap(~test,nrow =  1, scales = 'free_x')

Reduce(intersect, list( rownames(var1_diff), sig_list_var1, rownames(sigtab_var1) ))

 [1] "Eisenbergiella"                 "Lachnospiraceae NK3A20 group"  
 [3] "GCA-900066575"                  "[Eubacterium] ventriosum group"
 [5] "Pygmaiobacter"                  "Dialister"                     
 [7] "Pyramidobacter"                 "Treponema"                     
 [9] "Campylobacter"                  "[Anaerorhabdus] furcosa group"

da_venn <- venn.diagram(
  x = list( rownames(var1_diff), sig_list_var1, rownames(sigtab_var1) ),
  category.names = c("ANCOM-BC", "corncob", "DESeq2"),
  filename = NULL,
  fill = c( "#BEBADA", "#FB8072", "#FFFFB3"))

grid::grid.newpage()
grid::grid.draw(da_venn)

References and Further Readings

Xia, Yinglin, Jun Sun, and Ding-Geng Chen. Statistical analysis of microbiome data with R. Springer, 2018.
Microbiome Data Analysis
16S rDNA amplicon sequencing analysis using R
ANCOMBC vignette 1
ANCOMBC vignette 2
corncob vignette
DESeq2 vignette

Reproducibility

R session information:

sessionInfo()

R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] doRNG_1.8.6                 rngtools_1.5.2             
 [3] foreach_1.5.2               ANCOMBC_2.0.3              
 [5] VennDiagram_1.7.3           futile.logger_1.4.3        
 [7] corncob_0.3.1               DESeq2_1.38.3              
 [9] SummarizedExperiment_1.28.0 Biobase_2.58.0             
[11] MatrixGenerics_1.10.0       matrixStats_0.63.0         
[13] GenomicRanges_1.50.2        GenomeInfoDb_1.34.9        
[15] IRanges_2.32.0              S4Vectors_0.36.2           
[17] BiocGenerics_0.44.0         ALDEx2_1.30.0              
[19] zCompositions_1.4.0-1       truncnorm_1.0-9            
[21] NADA_1.6-1.1                survival_3.5-3             
[23] MASS_7.3-58.2               microbiome_1.20.0          
[25] vegan_2.6-4                 lattice_0.20-45            
[27] permute_0.9-7               phyloseq_1.42.0            
[29] kableExtra_1.3.4            janitor_2.2.0              
[31] lubridate_1.9.2             forcats_1.0.0              
[33] stringr_1.5.0               dplyr_1.1.1                
[35] purrr_1.0.1                 readr_2.1.4                
[37] tidyr_1.3.0                 tibble_3.2.1               
[39] ggplot2_3.4.2               tidyverse_2.0.0            

loaded via a namespace (and not attached):
  [1] estimability_1.4.1             bit64_4.0.5                   
  [3] knitr_1.42                     irlba_2.3.5.1                 
  [5] multcomp_1.4-25                DelayedArray_0.23.2           
  [7] data.table_1.14.8              rpart_4.1.19                  
  [9] KEGGREST_1.38.0                RCurl_1.98-1.12               
 [11] doParallel_1.0.17              generics_0.1.3                
 [13] ScaledMatrix_1.6.0             lambda.r_1.2.4                
 [15] TH.data_1.1-2                  RSQLite_2.3.1                 
 [17] proxy_0.4-27                   bit_4.0.5                     
 [19] tzdb_0.4.0                     webshot_0.5.4                 
 [21] xml2_1.3.4                     DirichletMultinomial_1.40.0   
 [23] viridis_0.6.3                  xfun_0.38                     
 [25] hms_1.1.3                      jquerylib_0.1.4               
 [27] evaluate_0.21                  fansi_1.0.4                   
 [29] readxl_1.4.2                   mia_1.6.0                     
 [31] igraph_1.4.2                   DBI_1.1.3                     
 [33] geneplotter_1.76.0             htmlwidgets_1.6.2             
 [35] Rmpfr_0.9-2                    CVXR_1.0-11                   
 [37] ellipsis_0.3.2                 crosstalk_1.2.0               
 [39] backports_1.4.1                energy_1.7-11                 
 [41] annotate_1.76.0                sparseMatrixStats_1.10.0      
 [43] vctrs_0.6.1                    SingleCellExperiment_1.20.1   
 [45] cachem_1.0.8                   withr_2.5.0                   
 [47] checkmate_2.2.0                emmeans_1.8.5                 
 [49] treeio_1.22.0                  MultiAssayExperiment_1.24.0   
 [51] svglite_2.1.1                  cluster_2.1.4                 
 [53] gsl_2.1-8                      ape_5.7-1                     
 [55] lazyeval_0.2.2                 crayon_1.5.2                  
 [57] TreeSummarizedExperiment_2.6.0 pkgconfig_2.0.3               
 [59] labeling_0.4.2                 nlme_3.1-162                  
 [61] vipor_0.4.5                    nnet_7.3-18                   
 [63] rlang_1.1.0                    lifecycle_1.0.3               
 [65] sandwich_3.0-2                 rsvd_1.0.5                    
 [67] cellranger_1.1.0               Matrix_1.5-3                  
 [69] Rhdf5lib_1.20.0                boot_1.3-28.1                 
 [71] zoo_1.8-12                     base64enc_0.1-3               
 [73] beeswarm_0.4.0                 png_0.1-8                     
 [75] viridisLite_0.4.2              rootSolve_1.8.2.3             
 [77] bitops_1.0-7                   rhdf5filters_1.10.1           
 [79] Biostrings_2.66.0              blob_1.2.4                    
 [81] DelayedMatrixStats_1.20.0      decontam_1.18.0               
 [83] DECIPHER_2.26.0                beachmat_2.14.2               
 [85] scales_1.2.1                   memoise_2.0.1                 
 [87] magrittr_2.0.3                 plyr_1.8.8                    
 [89] zlibbioc_1.44.0                compiler_4.2.3                
 [91] RColorBrewer_1.1-3             lme4_1.1-33                   
 [93] snakecase_0.11.0               cli_3.6.1                     
 [95] ade4_1.7-22                    XVector_0.38.0                
 [97] lmerTest_3.1-3                 htmlTable_2.4.1               
 [99] formatR_1.14                   Formula_1.2-5                 
[101] mgcv_1.8-42                    tidyselect_1.2.0              
[103] stringi_1.7.12                 highr_0.10                    
[105] yaml_2.3.7                     BiocSingular_1.14.0           
[107] locfit_1.5-9.8                 ggrepel_0.9.3                 
[109] sass_0.4.6                     tools_4.2.3                   
[111] lmom_2.9                       timechange_0.2.0              
[113] parallel_4.2.3                 rstudioapi_0.14               
[115] foreign_0.8-84                 gridExtra_2.3                 
[117] gld_2.6.6                      farver_2.1.1                  
[119] Rtsne_0.16                     RcppZiggurat_0.1.6            
[121] digest_0.6.31                  Rcpp_1.0.10                   
[123] scuttle_1.8.4                  httr_1.4.6                    
[125] AnnotationDbi_1.60.2           Rdpack_2.4                    
[127] colorspace_2.1-0               rvest_1.0.3                   
[129] XML_3.99-0.14                  splines_4.2.3                 
[131] yulab.utils_0.0.6              tidytree_0.4.2                
[133] expm_0.999-7                   scater_1.26.1                 
[135] multtest_2.54.0                Exact_3.2                     
[137] systemfonts_1.0.4              xtable_1.8-4                  
[139] gmp_0.7-1                      jsonlite_1.8.4                
[141] nloptr_2.0.3                   futile.options_1.0.1          
[143] Rfast_2.0.8                    R6_2.5.1                      
[145] Hmisc_5.1-0                    pillar_1.9.0                  
[147] htmltools_0.5.5                glue_1.6.2                    
[149] fastmap_1.1.1                  minqa_1.2.5                   
[151] DT_0.27                        BiocParallel_1.32.6           
[153] BiocNeighbors_1.16.0           class_7.3-21                  
[155] codetools_0.2-19               mvtnorm_1.1-3                 
[157] utf8_1.2.3                     bslib_0.4.2                   
[159] numDeriv_2016.8-1.1            ggbeeswarm_0.7.2              
[161] DescTools_0.99.48              rmarkdown_2.21                
[163] biomformat_1.26.0              munsell_0.5.0                 
[165] e1071_1.7-13                   rhdf5_2.42.1                  
[167] GenomeInfoDbData_1.2.9         iterators_1.0.14              
[169] reshape2_1.4.4                 gtable_0.3.3                  
[171] rbibutils_2.2.13

Workshop Overview

Getting Ready

Inspect and Filter phyloseq object

Differential (Abundance) Analysis

DA with ANCOMBC, corncob and DESeq2

Import data with phyloseq, convert to DESeq2

Compare and Summarize

References and Further Readings

Reproducibility

Inspect and Filter `phyloseq` object