class: center, middle, inverse, title-slide .title[ # What Are Microbiome Data ] .author[ ###
Max Qiu, PhD
Bioinformatician/Computational Biologist
maxqiu@unl.edu
] .institute[ ###
Bioinformatics Core Research Facility, Center for Biotechnology
]
.date[
### 04-19-2023
]

---
class: inverse

# Outline

* ### Microbiome Data Structure

* ### What Microbiome Bioinformatics Analysis .highlight[Can] and .highlight[Cannot] Answer

* ### Characteristics of Microbiome Data

* ### Challenges of Modeling Microbiome Data

---
class: inverse, left, middle

# Microbiome data structure

- Data matrix
- Sample metadata
- Feature metadata
- Structured as a phylogenetic tree

---

## (Sample-by-feature) Data matrix

```r
otu_table(ps)[1:10, 1:10]
```

```
## OTU Table:          [10 taxa and 10 samples]
##                      taxa are columns
##        ASV_3837 ASV_1432 ASV_3347 ASV_3098 ASV_546 ASV_1097 ASV_2257 ASV_3871
## 12post        0        0        0        0      13        0        0        0
## 12pre         0        0        0        0      39        0        0        0
## 14post        1        1        0        2      13        1        0        0
## 14pre         0        0        0        0       3        0        0        1
## 20post        0        2        0        0      64        1        0        0
## 20pre         0        5        0        0      48        0        0        0
## 21post        0       43        0        0       2        1        0        0
## 21pre         0       22        1        0       2        0        0        0
## 23post        0        0        1        0      31       18        0        0
## 23pre         0        0        0        0      11       10        1        0
##        ASV_2892 ASV_2761
## 12post        0        0
## 12pre         1        0
## 14post        2        0
## 14pre         5        0
## 20post        0        0
## 20pre         0        0
## 21post        0        0
## 21pre         0        0
## 23post        0        0
## 23pre         0        0
```

---

## Sample metadata

```r
sample_data(ps)[1:10,]
```

```
##        sample                             fq1                             fq2
## 12post 12post 12post_S19_L001_R1_001.fastq.gz 12post_S19_L001_R2_001.fastq.gz
## 12pre   12pre  12pre_S17_L001_R1_001.fastq.gz  12pre_S17_L001_R2_001.fastq.gz
## 14post 14post  14post_S3_L001_R1_001.fastq.gz  14post_S3_L001_R2_001.fastq.gz
## 14pre   14pre  14pre_S18_L001_R1_001.fastq.gz  14pre_S18_L001_R2_001.fastq.gz
## 20post 20post 20post_S23_L001_R1_001.fastq.gz 20post_S23_L001_R2_001.fastq.gz
## 20pre   20pre  20pre_S14_L001_R1_001.fastq.gz  20pre_S14_L001_R2_001.fastq.gz
## 21post 21post 21post_S13_L001_R1_001.fastq.gz 21post_S13_L001_R2_001.fastq.gz
## 21pre   21pre   21pre_S5_L001_R1_001.fastq.gz   21pre_S5_L001_R2_001.fastq.gz
## 23post 23post  23post_S7_L001_R1_001.fastq.gz  23post_S7_L001_R2_001.fastq.gz
## 23pre   23pre  23pre_S24_L001_R1_001.fastq.gz  23pre_S24_L001_R2_001.fastq.gz
##        ID time group
## 12post 12 post   CPB
## 12pre  12  pre   CPB
## 14post 14 post   CPB
## 14pre  14  pre   CPB
## 20post 20 post   CNT
## 20pre  20  pre   CNT
## 21post 21 post   CNT
## 21pre  21  pre   CNT
## 23post 23 post   CPB
## 23pre  23  pre   CPB
```

---

## Feature metadata

```r
tax_table(ps)[1:10,]
```

```
## Taxonomy Table:     [10 taxa by 7 taxonomic ranks]:
##          Kingdom    Phylum       Class        Order
## ASV_3837 "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
## ASV_1432 "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
## ASV_3347 "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
## ASV_3098 "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
## ASV_546  "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
## ASV_1097 "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
## ASV_2257 "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
## ASV_3871 "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
## ASV_2892 "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
## ASV_2761 "Bacteria" "Firmicutes" "Clostridia" "Lachnospirales"
##          Family            Genus                           Species
## ASV_3837 "Lachnospiraceae" NA                              NA
## ASV_1432 "Lachnospiraceae" "Lachnoclostridium"             NA
## ASV_3347 "Lachnospiraceae" "Lachnoclostridium"             NA
## ASV_3098 "Lachnospiraceae" "Lachnoclostridium"             NA
## ASV_546  "Lachnospiraceae" "Lachnoclostridium"             NA
## ASV_1097 "Lachnospiraceae" "Lachnoclostridium"             NA
## ASV_2257 "Lachnospiraceae" NA                              NA
## ASV_3871 "Lachnospiraceae" NA                              NA
## ASV_2892 "Lachnospiraceae" "Lachnoclostridium"             NA
## ASV_2761 "Lachnospiraceae" "Lachnospiraceae NK4A136 group" "bacterium"
```

---

## Structured as a phylogenetic tree

<img src="data:image/png;base64,#./img/tree.png" width="90%" style="display: block; margin: auto;" />

.pull-right[
.footnote[
[Salomon JD, Q H, et al. Dis Model Mech (2023) 16 (5): dmm049742.](https://doi.org/10.1242/dmm.049742)

Figure generated using [FigTree](http://tree.bio.ed.ac.uk/software/figtree/).
]
]

---

# What Microbiome Bioinformatics Analysis .highlight[Can] Answer

### Who is there

- .highlight[Taxonomic composition] (classification and abundance)
    + Alpha diversity
    + Beta diversity

### What are they doing

- .highlight[Functional composition]

???

We have seen this slide before. This is what bioinformatics gets you at the end of its pipeline: taxa identification and estimates of their abundance. From here, we can do diversity analysis. Specifically, we can calculate alpha diversity for each sample using different metrics, and we can visualize beta diversity with a 2-D plot.

---

## Alpha Diversity

.pull-left[

* Diversity at one spot or community
    + .highlight[Local] diversity
    + Acts like a .highlight[summary statistic] of a single community

* Fundamental questions:
    + How many species are present? (richness)
    + How many species are truly there? (diversity)
    + How evenly are species distributed relative to each other? (evenness)
]

.pull-right[

Observed OTU, Chao1, Shannon, Simpson, InvSimpson, Fisher, Phylogenetic diversity, ...

+ .emphasize[Observed vs Estimated]
+ .emphasize[Non-phylogenetic vs Phylogenetic]
+ .emphasize[Unweighted vs Weighted] by abundance
]

???

Alpha diversity, one of the basic diversity indices, is defined as .highlight[diversity in one spot or sample]. It acts like a .highlight[summary statistic] of a single population; it is .highlight[local]. Alpha diversity is one of the essential concepts in ecology. The fundamental questions encountered by researchers are: ...

.highlight[Communities that are numerically dominated by one or a few species exhibit low evenness] while .highlight[communities where abundance is distributed equally among species exhibit high evenness].

There are many alpha diversity metrics, and they represent different concepts.
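
To make the richness, diversity and evenness questions concrete, here is a minimal base-R sketch on a made-up two-sample count matrix. The numbers, the helper name `alpha_diversity` and the added Pielou evenness column are illustrative assumptions, not part of the study; the Simpson index is assumed to be the Gini-Simpson form, 1 - sum(p^2).

```r
# Toy sample-by-taxon count matrix (made-up numbers, not the study data).
counts <- rbind(
  even     = c(25, 25, 25, 25, 0),
  dominant = c(97, 1, 1, 1, 0)
)
colnames(counts) <- paste0("taxon_", 1:5)

# Hypothetical helper: a few non-phylogenetic alpha diversity metrics.
alpha_diversity <- function(x) {
  x <- x[x > 0]                 # absent taxa do not contribute
  p <- x / sum(x)               # relative abundances within the sample
  shannon <- -sum(p * log(p))   # Shannon index (natural log)
  c(Observed   = length(x),                  # richness
    Shannon    = shannon,
    Simpson    = 1 - sum(p^2),               # Gini-Simpson form (assumed)
    InvSimpson = 1 / sum(p^2),
    Pielou     = shannon / log(length(x)))   # evenness (illustrative extra)
}

# One row per sample, one column per metric.
adiv_toy <- t(apply(counts, 1, alpha_diversity))
round(adiv_toy, 3)
```

Both toy samples have the same observed richness, but the sample dominated by one taxon scores much lower on the abundance-weighted metrics, which is the "unweighted vs weighted" distinction on the slide.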
---

## Alpha Diversity

```r
adiv[1:20, 1:7]
```

```
##         Observed    Chao1  Shannon   Simpson InvSimpson   Fisher        PD
## X12post      772 1069.439 5.153406 0.9789896   47.59538 203.6945  57.36124
## X12pre      1414 1733.444 5.545307 0.9902083  102.12695 244.6882 105.36945
## X14post     1221 1432.757 5.307597 0.9878167   82.07974 202.8327  95.81532
## X14pre       933 1178.000 5.208907 0.9831334   59.28875 193.2584  71.00369
## X20post     1329 1741.248 4.979233 0.9792213   48.12623 230.1734 103.31272
## X20pre      1215 1508.007 5.127905 0.9848628   66.06240 213.5342  98.36759
## X21post     1408 1837.875 5.409402 0.9861229   72.06100 266.6152 110.18600
## X21pre      1532 1905.312 5.440965 0.9873182   78.85342 281.8831 118.72087
## X23post     1622 1903.114 5.145834 0.9713741   34.93343 266.0578  97.02621
## X23pre      1426 1753.434 4.985194 0.9658489   29.28160 252.2428 101.73798
## X44post     1637 1939.544 5.806304 0.9866761   75.05291 338.4221  99.57581
## X44pre      1357 1778.787 5.521896 0.9759116   41.51383 319.3114 115.27071
## X48post     1379 1708.024 5.580212 0.9911054  112.42722 241.2026  85.74055
## X48pre      1524 1780.321 5.520602 0.9891785   92.40903 258.6288 127.79839
## X50post     1567 1882.972 5.471001 0.9866266   74.77548 280.1078 118.23638
## X50pre      1388 1703.269 5.126916 0.9672926   30.57411 265.0344 113.85581
## X54post     1304 1608.603 5.290703 0.9889997   90.90663 217.0067 100.64586
## X54pre      1432 1717.187 5.437186 0.9870839   77.42249 249.1576 112.94817
## X59post     1726 1981.088 5.439747 0.9877155   81.40321 264.2693 171.56131
## X59pre      1339 1632.182 5.515967 0.9897586   97.64324 251.2341 104.31080
```

---

## Beta Diversity

* Community classification (i.e., to differentiate)
    + leads to .highlight[measuring the similarity] between two community samples

* "Species turnover"
    + .highlight[a measure of change] in diversity across environmental gradients
    + reflects .highlight[species replacement] as one moves across space or time

* Elucidate .highlight[how much diversity is unique] to a community, or describe .highlight[how many taxa are shared] between communities.

???

.highlight[One important purpose of a microbiome study is to determine whether microbiome communities can be classified together or need to be separated], i.e., to differentiate treatment from control, healthy from diseased, mutant from wild type, etc. The question of .highlight[community classification] leads to .highlight[measuring the similarity] between two community samples (beta diversity). The concept of “similarity”, or beta diversity, and its measures mainly come from ecology and other fields.

Beta diversity was originally defined as .highlight[a measure of change] in diversity across environmental gradients; in other words, it is .highlight[the rate of change in species composition] from one community to another along gradients (Whittaker 1960). Hence, it .highlight[reflects species replacement] as a community moves across space or time (Magurran 2004). Beta diversity is also known as ‘species turnover’.

In general, beta diversity evaluates differences between two or more communities (Koleff et al. 2003; Lozupone and Knight 2008), thus allowing us to elucidate how much diversity is .highlight[unique to one community], or describe how many taxa are .highlight[shared between communities].

---

## Beta Diversity

.pull-left[

* Beta diversity is calculated by using a .highlight[similarity or distance] measure to represent the relationships of samples
    + Jaccard similarity
    + Bray-Curtis dissimilarity
    + UniFrac (Unweighted vs Weighted by abundance)
]

.pull-right[
![](data:image/png;base64,#MQ2_DataStructure_files/figure-html/unnamed-chunk-9-1.png)<!-- -->
]

.footnote[
[Salomon JD, Q H, et al. Dis Model Mech (2023) 16 (5): dmm049742.](https://doi.org/10.1242/dmm.049742)
]

???

As beta diversity is a measure of similarity, it is calculated by using a similarity or distance measure. There are a few popular choices.
The key to .highlight[selecting] the proper measure of beta diversity is .highlight[microbiome hypothesis testing]: .highlight[the selection must be tailored to the hypothesis, rather than vice versa]. No single measure is best in all circumstances.

This graph shows the pairwise distances between samples, computed with the UniFrac distance.

---

## Beta Diversity

.pull-left[

* Beta diversity is calculated by using a .highlight[similarity or distance] measure to represent the relationships of samples
    + Jaccard similarity
    + Bray-Curtis dissimilarity
    + UniFrac (Unweighted vs Weighted by abundance)

* Ordination (visualization)
    + Goal: .highlight[Visualization of similarity among samples]
    + PCA, PCoA, NMDS ...
]

.pull-right[
<img src="data:image/png;base64,#MQ2_DataStructure_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto 0 auto auto;" />
]

.footnote[
[Salomon JD, Q H, et al. Dis Model Mech (2023) 16 (5): dmm049742.](https://doi.org/10.1242/dmm.049742)
]

???

After we have calculated a distance matrix, what do we do with it? We use it for ordination.

The primary aim of ordination is to .highlight[represent multiple samples in a reduced number of] orthogonal (i.e., independent) .highlight[axes]. The importance of the ordination axes decreases in order: the first axis of an ordination explains the most variation in the dataset, followed by the second axis, then the third, and so on.

Ordination plots are particularly useful for visualizing the similarity among samples (subjects). For example, .highlight[in the context of beta diversity, samples that are closer in ordination space have species assemblages that are more similar to one another than samples that are further apart in ordination space].

PCoA is a flexible ordination technique that allows the user to choose virtually any distance metric (e.g., Jaccard, Bray-Curtis, Euclidean, etc.), while PCA only uses Euclidean distances.
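
The distance-then-ordination workflow described above can be sketched in base R. This is a hand-rolled illustration on made-up counts: `bray_curtis`, `jaccard` and `pairwise_dist` are hypothetical helper names, and `cmdscale()` from the `stats` package stands in for PCoA (classical multidimensional scaling). UniFrac is omitted because it needs a phylogenetic tree.

```r
# Toy counts: 4 samples x 5 taxa (made-up numbers, not the study data).
counts <- rbind(
  s1 = c(10, 0, 5, 0, 1),
  s2 = c(12, 1, 4, 0, 0),
  s3 = c(0, 8, 0, 9, 2),
  s4 = c(1, 7, 0, 10, 3)
)

# Bray-Curtis dissimilarity: sum|x - y| / sum(x + y), weighted by abundance.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Jaccard distance on presence/absence: 1 - shared taxa / taxa in either sample.
jaccard <- function(x, y) {
  a <- x > 0
  b <- y > 0
  1 - sum(a & b) / sum(a | b)
}

# Apply a pairwise function to every pair of rows and return a "dist" object.
pairwise_dist <- function(m, f) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) d[i, j] <- f(m[i, ], m[j, ])
  as.dist(d)
}

bc <- pairwise_dist(counts, bray_curtis)
jc <- pairwise_dist(counts, jaccard)

# PCoA (classical MDS) on the Bray-Curtis distances, keeping two axes.
pcoa <- cmdscale(bc, k = 2, eig = TRUE)
pcoa$points   # sample coordinates on the first two ordination axes
```

In this toy example s1 and s2 share most taxa, so they should sit close together on the first axis and far from s3 and s4.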
---

# What Microbiome Bioinformatics Analysis .highlight[Cannot] Answer

.pull-left[

### Two main themes in the current microbiome studies

- .emphasize[To characterize the relationship between microbiome features and biological, genetic, clinical or experimental conditions]

- .emphasize[To identify potential biological and environmental factors that are associated with microbiome composition]

.highlight[Goal: to understand mechanisms of host genetic and environmental factors that shape microbiome.]
]

???

By now, we know what kind of answers bioinformatics can give you. We can get the taxonomic composition and estimate taxa abundance. From there, we can calculate alpha diversity using different metrics and we can visualize beta diversity.

There are mainly two themes in current microbiome studies. The goal of these studies is to ...

Insights gained from these studies potentially contribute to the development of therapeutic strategies for modulating the microbiome composition in human diseases.

None of these questions can be answered just by exploring the microbiome alone (bioinformatics alone).

|

--

.pull-right[
![Interactions among environment, microbiome and host](data:image/png;base64,#img/hypothesis_microbiome.jpg)
]

???

What can help you answer these questions are statistics and hypothesis testing. They help explore the interactions among environment, microbiome and host, which are dynamic and complicated.

---

## Hypotheses

.pull-left[
![Interactions among environment, microbiome and host](data:image/png;base64,#img/hypothesis_microbiome.jpg)

`\(H_{0}\)`: there is .highlight[no difference (change) of microbiome composition] in different experimental groups (or factors) or genetic conditions (e.g., health and disease) or with different interventions.
]

.pull-right[

* Hypothesis 1 is to test the .highlight[association between microbiome and host]: whether the composition of the microbiome or “dysbiotic” microbiome is linked to the health or disease of host.

* Hypothesis 2 is to test .highlight[whether microbiome is associated with environmental or biological covariates], whether environmental factors impact microbiome, or whether an intervention has an effect on a specific microbiome composition (diversity) in health and disease.

* Hypothesis 3 is to test the .highlight[association between environment and host].
]

???

Hypothesis 1: For example, in inflammatory bowel disease (IBD) research, we hypothesize that dysbiosis is associated with the progression of the disease.

Hypothesis 2: For example, we can test whether dietary interventions shape gut microbiota, or whether a probiotic intervention impacts the composition of the human microbiota.

Hypothesis 3: To test this hypothesis, we can use the standard statistical methods and models commonly used in other biomedical sciences.

For microbiome studies, the focus is on hypotheses 1 and 2. The core theme of these statistical hypotheses could be the same, i.e., to explore the .highlight[impacts of environmental or external factors (e.g., interventions) on microbiome composition and/or richness of microbiota]. But the hypotheses vary.

---

## What Can We Do With Alpha Diversity?

.pull-left[
<img src="data:image/png;base64,#MQ2_DataStructure_files/figure-html/unnamed-chunk-11-1.png" width="80%" />
]

.pull-right[
<img src="data:image/png;base64,#MQ2_DataStructure_files/figure-html/unnamed-chunk-12-1.png" width="80%" />
]

.footnote[
[Salomon JD, Q H, et al. Dis Model Mech (2023) 16 (5): dmm049742.](https://doi.org/10.1242/dmm.049742)
]

???

Hypotheses about microbial taxa can be tested by comparing alpha and beta diversity indices. Depending on whether the data are normally distributed, the number of experimental groups, and the experimental conditions, we can use a t-test, analysis of variance (ANOVA), or the corresponding non-parametric test.

.highlight[The statistical hypothesis could be about alpha diversity.]
For example, in antibiotic studies, we hypothesize that antibiotic treatment decreases microbial diversity. In the Salomon JD example, there was a significant decrease in phylogenetic diversity in the CPB post-operative samples compared to the control post-operative samples.

---

## What Can We Do With Beta Diversity?

.pull-left[

```
## UnifracPermutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## vegan::adonis2(formula = dist_list[[i]] ~ phyloseq::sample_data(ps.beta)$group)
##                                      Df SumOfSqs      R2      F Pr(>F)  
## phyloseq::sample_data(ps.beta)$group  1  0.25481 0.08254 1.9793  0.034 *
## Residual                             22  2.83214 0.91746                
## Total                                23  3.08695 1.00000                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]

.pull-right[
<br>
<br>
<br>
<br>
<img src="data:image/png;base64,#MQ2_DataStructure_files/figure-html/unnamed-chunk-14-1.png" width="75%" style="display: block; margin: auto 0 auto auto;" />
]

.footnote[
[Salomon JD, Q H, et al. Dis Model Mech (2023) 16 (5): dmm049742.](https://doi.org/10.1242/dmm.049742)
]

???

.highlight[The statistical hypothesis could also be about beta diversity.]

This is PERMANOVA, a multivariate analysis of variance based on .highlight[distance matrices] and .highlight[permutation]. In the Salomon JD example, we tested community similarities between CNT and CPB, i.e., comparing how dissimilar the communities are by group. There was a statistically significant difference in β-diversity between the CPB group and the control group.
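
To show what "based on distance matrices and permutation" means mechanically, here is a hand-rolled one-factor PERMANOVA sketch in base R. It is not the code behind the slide, which used `vegan::adonis2`. The simulated counts, the group effect, and the helper names `bray` and `pseudo_f` are assumptions for illustration only.

```r
set.seed(1)

# Simulated data (assumption): 2 groups x 6 samples, 20 taxa. In the second
# group the first five taxa are three times more abundant.
n_per_group <- 6
group <- factor(rep(c("CNT", "CPB"), each = n_per_group))
mu <- rep(20, 20)
counts <- rbind(
  matrix(rpois(n_per_group * 20, mu), nrow = n_per_group, byrow = TRUE),
  matrix(rpois(n_per_group * 20, mu * c(rep(3, 5), rep(1, 15))),
         nrow = n_per_group, byrow = TRUE)
)

# Bray-Curtis dissimilarity between all pairs of samples (rows).
bray <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n))
    d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
  d
}
d <- bray(counts)

# Pseudo-F for a single factor, computed from squared pairwise distances.
pseudo_f <- function(d, g) {
  n <- nrow(d)
  a <- nlevels(g)
  d2 <- d^2
  ss_total <- sum(d2[upper.tri(d2)]) / n
  ss_within <- sum(sapply(levels(g), function(l) {
    idx <- which(g == l)
    block <- d2[idx, idx]
    sum(block[upper.tri(block)]) / length(idx)
  }))
  ss_between <- ss_total - ss_within
  (ss_between / (a - 1)) / (ss_within / (n - a))
}

# Observed statistic versus its distribution under shuffled group labels.
f_obs <- pseudo_f(d, group)
f_perm <- replicate(999, pseudo_f(d, sample(group)))
p_value <- (sum(f_perm >= f_obs) + 1) / (999 + 1)
c(F = f_obs, p = p_value)
```

The p-value is simply the share of label shuffles whose pseudo-F is at least as large as the observed one, which is why the number of permutations bounds the smallest attainable p-value.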
---
class: inverse, left, middle

# Characteristics of Microbiome data

- High-dimensional (underdetermined)
- Sparse (zero-inflated)
- Over-dispersed (large within-group heterogeneities)
- Compositional (naturally constrained)

---

## High-dimensional (underdetermined)

.pull-left[

* High-dimensional data
    + p (number of features, i.e., taxa/ASV) >> n (number of samples)

    i.e., "wide" data table
]

.pull-right[

```r
dim(otu_table(ps))
```

```
## [1]   24 4578
```
]

![](data:image/png;base64,#./img/otu_table.png)

.footnote[
[Salomon JD, Q H, et al. Dis Model Mech (2023) 16 (5): dmm049742.](https://doi.org/10.1242/dmm.049742)
]

???

Microbiome sequence data sets are high-dimensional, with up to tens of thousands of different categories. They are underdetermined: the number of taxa or ASVs is much greater than the number of samples. Simply put, we do not have enough samples to estimate that many parameters reliably.

---

## Sparse (zero-inflated)

<img src="data:image/png;base64,#./img/otu_table_alter.png" width="85%" style="display: block; margin: auto;" />

???

.highlight[In microbiome data, sparsity is seen as the absence of many taxa across samples, and zeros are generated in most experiments.]

Microbiome taxa abundances, especially at lower taxonomic levels or as ASV counts, often contain many zeros and are heavily right-skewed.

---

## Sparse (zero-inflated)

.pull-left[

```r
ggplot_truehist(unlist(otu_table(ps)), "ASV table histogram")
```

![](data:image/png;base64,#MQ2_DataStructure_files/figure-html/unnamed-chunk-17-1.png)<!-- -->
]

.pull-right[

```r
sum(otu_table(ps)==0)/(dim(otu_table(ps))[1]*dim(otu_table(ps))[2])
```

```
## [1] 0.6905581
```

### Where do the zeros come from?

* Sampling zero (count zero)
    + A count is used to record the number of times an event occurs.
    + Count zero occurs due to .highlight[non-exhaustive sampling]

* Structure zero (essential zero, genuine zero, absolute zero ...)
    + True negatives
]

???

A count zero arises when the event did not occur in one situation but may occur in another. This type of zero is due to a sampling problem: components may be unobserved because of the limited size of the sample, or undetectable because of technical limits such as .highlight[sequencing depth or library size].

A structure zero means “a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part”. A structure zero represents the true absence of a taxon from a sample.

---

## Over-dispersed (large within-group heterogeneities)

.pull-left[
![](data:image/png;base64,#MQ2_DataStructure_files/figure-html/unnamed-chunk-19-1.png)<!-- -->
]

.pull-right[
![](data:image/png;base64,#MQ2_DataStructure_files/figure-html/unnamed-chunk-20-1.png)<!-- -->
]

???

Common-scale variance versus mean for the Salomon example. Each point in each panel represents a different ASV's mean/variance estimate for a biological group.

---

## Over-dispersed (large within-group heterogeneities)

.pull-left[
![](data:image/png;base64,#MQ2_DataStructure_files/figure-html/unnamed-chunk-21-1.png)<!-- -->
]

.pull-right[

### Over-dispersion in sequencing data

* Library sizes are widely different

* The variance of taxa counts (or proportions) is larger than what the assumed model would predict
    + e.g., a Poisson distribution
]

???

It is known that there is over-dispersion in sequencing data. This applies to all high-throughput sequencing data, not just amplicon sequencing but RNA-Seq data as well. This is due to two reasons:

* library sizes of DNA or RNA sequencing differ widely between samples
* read counts are more variable than expected under a Poisson distribution.
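
A small simulation can make "more variable than expected under a Poisson distribution" tangible. The sketch below is illustrative only: it uses the negative binomial as one example of an over-dispersed count distribution, and the mean of 10 and the `size` of 0.5 are arbitrary assumptions, not values estimated from the study.

```r
set.seed(42)
n  <- 5000   # number of simulated counts per distribution
mu <- 10     # common mean

# Poisson counts: the variance equals the mean by construction.
pois_counts <- rpois(n, lambda = mu)

# Negative binomial counts with the same mean; variance = mu + mu^2 / size.
nb_counts <- rnbinom(n, mu = mu, size = 0.5)

# Variance-to-mean ratio: about 1 for Poisson, well above 1 if over-dispersed.
dispersion <- function(x) var(x) / mean(x)

rbind(
  dispersion    = c(poisson = dispersion(pois_counts),
                    neg_binomial = dispersion(nb_counts)),
  zero_fraction = c(poisson = mean(pois_counts == 0),
                    neg_binomial = mean(nb_counts == 0))
)
```

With the same mean, the over-dispersed counts also contain far more zeros, which links this slide back to the sparsity discussed earlier.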
---

## Compositional (naturally constrained)

.pull-left[

* Compositional data
    + Parts of a whole that provide only relative information between their components
    + Convey exclusively .highlight[relative] information
    + The elements of the composition are .highlight[non-negative] and .highlight[sum to unity]

    i.e., within a sample, each relative abundance is a value between 0 and 1, and the values add up to 1
]

.pull-right[
![](data:image/png;base64,#./img/relative_abun.jpg)

.footnote[
[Salomon JD, Q H, et al. Dis Model Mech (2023) 16 (5): dmm049742.](https://doi.org/10.1242/dmm.049742)
]
]

???

Compositional data quantitatively describe the parts of a whole and provide only relative information between their components. Thus, compositional data exist as .highlight[the proportions or fractions of a whole, or portions of a total], conveying exclusively relative information, and have the following property: the elements of the composition are .highlight[non-negative and sum to unity].

From a practical point of view, if researchers are really only interested in .highlight[relative frequencies, not the absolute amount of data], then the data are compositional.

The total sum of all component values (sometimes called the library size) is an artifact of the sampling procedure. The library size can be affected by many factors, such as technical variability or differences in experiment-specific abundance.
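
As a minimal illustration of "sum to unity" and of library size being an artifact, the base-R sketch below converts made-up counts to relative abundances. The sample and taxon names and all numbers are hypothetical.

```r
# Two made-up samples with the same community but a 10x difference in depth.
counts <- rbind(
  sampleA = c(taxon1 = 50,  taxon2 = 30,  taxon3 = 20),
  sampleB = c(taxon1 = 500, taxon2 = 300, taxon3 = 200)
)

# Closure: divide each row by its library size (row total).
rel_abund <- counts / rowSums(counts)

rowSums(rel_abund)                                          # each sample sums to 1
all.equal(rel_abund["sampleA", ], rel_abund["sampleB", ])   # depth drops out

# Only taxon1 grows in absolute terms, yet the other proportions shrink.
bloom <- counts["sampleA", ]
bloom["taxon1"] <- 150
bloom_rel <- bloom / sum(bloom)
bloom_rel
```

The last step previews the constant-sum constraint: taxon2 and taxon3 did not change in absolute count, but their relative abundances fall because the total grew.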
---
class: inverse, left, middle

# Challenges of Modeling Microbiome Data

---

## Statistical Challenges of Modeling Microbiome Data

.pull-left[

* .section[How to incorporate the taxa/ASV phylogenetic tree information]

* .section[How to reduce dimensions and solve the large p, small n problem]

* .section[How to handle rare taxa]

* .section[How to model the microbiome data with over-dispersion and zero-inflation]
]

.pull-right[

* The compositional and high-dimensional nature of the data violates the assumptions of standard statistical tests
    + "Spurious correlations"
    + "Constant-sum problem"

* Sparsity and zero-inflation
    + Difficult to use parametric modeling
    + Difficult to use non-parametric modeling
]

???

Nonparametric methods are based on ranks or medians; thus, they are generally insensitive, or more “robust”, to outliers and avoid making variance estimates that can be skewed by sparse samples.

In situations with .highlight[many taxa having many zeros and few available samples], nonparametric methods will .highlight[lack power] for inference on the low-abundance taxa.

---

### Appendix

.pull-left[

* "Spurious correlations"
    + Even if independent variables `\(X\)`, `\(Y\)` and `\(Z\)` are uncorrelated, their ratios `\(X/Z\)` and `\(Y/Z\)` are correlated because of their common divisor
    + .emphasize[Statistically independent components appear correlated]
    + .emphasize[Uncorrelated proportions are not necessarily independent]
]

.pull-right[

* "Constant-sum problem"
    + .emphasize[If the amount of one kind of taxa in the ecosystem increases, the amounts of one or more other kinds of taxa must decrease]
    + However small a component/proportion is, it is .highlight[non-negative].
]

???

Correlation analysis relies on the assumption of .highlight[Euclidean geometry in real space]. Applying correlation analysis to compositional data may yield misleading results because .highlight[compositional data have the special properties of their sample space], the simplex.

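
The spurious-correlation effect on this slide can be checked with a quick base-R simulation. This is a sketch under assumed settings: three independent log-normal variables with an arbitrary spread (`sdlog = 0.3`), not data from any study.

```r
set.seed(7)
n <- 10000

# Three independent positive variables (assumed log-normal for illustration).
x <- rlnorm(n, sdlog = 0.3)
y <- rlnorm(n, sdlog = 0.3)
z <- rlnorm(n, sdlog = 0.3)

# X and Y are uncorrelated, but dividing both by the same Z induces correlation.
c(raw = cor(x, y), ratios = cor(x / z, y / z))
```

The raw correlation should be close to zero while the correlation of the ratios is clearly positive, even though nothing links `x` and `y` except the shared divisor.
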
"Spurious correlations", first observed by Pearson, when analyzing ratios of variables. With this logic, in microbiome study, .highlight[relative abundance data can make statistically independent components appear correlated, and uncorrelated proportions are not necessarily independent]. Thus, correlation of relative abundances is thought as just wrong and correlation analysis of relative abundances is considered to tell us absolutely nothing. "Constant-sum problem" has many names: negative bias difficulty, subcomposition difficulty, basis difficulty, and null correlation difficulty. For example, in each sample .highlight[if the amount of one kind of taxa in the ecosystem increases, the amounts of one or more other kinds of taxa must decrease]. And however small that component/proportion is, it is .highlight[non-negative]. Difficult to interpret the correlation and covariance between proportions in any meaningful way. --- background-image: url(data:image/png;base64,#./img/bcrf.png) background-size: 55% .footnote[ Sections of this presentation were sampled and modified from [Statistical Analysis of Microbiome Data with R](https://link.springer.com/book/10.1007/978-981-13-1534-3) by Yinglin Xia, Jun Sun and Ding-Geng Chen (2018). ]