class: center, middle, inverse, title-slide # Statistical analysis (for High-Dimensional data) ### Max Qiu, PhD Bioinformatician/Computational Biologist
maxqiu@unl.edu
###
Bioinformatics Research Core Facility, Center for Biotechnology
Data Life Science Core, NCIBC
### 02-09-2022 --- background-image: url(data:image/png;base64,#https://media.springernature.com/full/springer-static/image/art%3A10.1186%2Fs13024-018-0304-2/MediaObjects/13024_2018_304_Fig1_HTML.png?as=webp) background-size: 75% # We bring in errors every step of the way .footnote[ [Shao, Y., Le, W. Mol Neurodegeneration 14, 3 (2019)](https://doi.org/10.1186/s13024-018-0304-2)</br> Copyright © 2021 BioMed Central Ltd ] ??? We bring in errors and bias every step of the way. We will discuss what to look out for in each of these steps. Let's pause for a minute and think about what could possibly go wrong in each step, and what variances we could possibly bring to the data. --- # (How to reduce) Sources of variances .pull-left[ ### Study question - Formulate clear and valid `\(H_{0}\)` and `\(H_{a}\)` ### Experimental design - Confounding factors - Control, randomize, replicate, block - Sample size `\(n\)` and statistical power ### Sample Collection and Preparation - Sampling bias - Technical replicates vs biological replicates - Pseudo-replication - Robust SOP ] .pull-right[ ### Data Acquisition (instrumentation and data preprocessing) - Quality Assurance: calibration, maintenance... - Quality Control: pooled QC and injection order - Peak picking and alignment ### (Post-acquisition) Data Processing - Batch correction - Assess the quality of data: missing values - Assess feature presence/absence: missing value imputation - High-dimensional data processing + Log2 transformation, normalization, scaling + Goal: near Gaussian distribution ] ??? The study question needs to be clearly defined ahead of time, because it will influence the experimental design and everything that follows. In experimental design, we need to **control all sources of variation except the one** we are interested in studying. We can control and remove confounding factors by **setting up control groups, random sampling and assignment, use of biological and technical replicates, and blocking**. We also discussed statistical power and power analysis to decide the sample number per group. We covered sample collection and preparation, including **different types of samples and their impact on sample collection, storage and extraction**. We know **experimental procedures should be standardized and optimized** and that different groups should be treated in the same way. However, there are a few things to be cautious about here. In data acquisition, we talked about the instruments themselves. We also discussed quality assurance and control, the use of pooled QC, and injection order layout. In data processing, we discussed batch correction (correcting for signal drift), missing values, how to assess feature presence/absence, how to do imputation, and how to normalize the data. The experimental workflow is an upstream-to-downstream process. What comes first has more impact than what comes later; don't expect later steps to compensate for what was done wrong before. **By sample collection and preparation, 50% of the result is fixed. After data acquisition, 80% of the result is done. No amount of statistical analysis, model building, or machine learning can fix bad data.** --- # Why do we care anyway? ??? If I am the one performing the whole experiment, and I bring an equal amount of variance/bias to each group, then the groups are still comparable because equivalent variance/bias has been introduced to each group. (Wrong, of course.) -- .left-column[ ## Research reproducibility ] -- .right-column[ .pull-left[ ] .footnote[ [Steven N. 
Goodman et al., Sci Transl Med 2016;8:341ps12](https://stm.sciencemag.org/content/8/341/341ps12)</br> Copyright © 2016, American Association for the Advancement of Science ] .pull-right[ ### Rubric of reproducibility: design, reporting, analysis, interpretation - Method reproducibility - Results reproducibility - Robustness and generalizability - Inferential reproducibility ] ] ??? This graph is taken from a paper called "What does research reproducibility mean?" by Goodman et al. in Science Translational Medicine. It shows the number of publications recorded in Scopus that have, in the title or abstract, at least one of the following expressions: research reproducibility, reproducibility of research, reproducibility of results, results reproducibility, reproducibility of study, study reproducibility, reproducible research, reproducible finding, or reproducible result. **It shows that concern about the reproducibility of scientific research has been steadily rising.** --- # Statistical analysis .pull-left[ ### Univariate - feature selection * Comparisons with multiple testing correction * Ratios and volcano plot ### Multivariate - visualization and feature extraction * Unsupervised (X only) * Supervised (X and Y) ] ??? The next step in the metabolomics pipeline is to use statistical techniques to extract the relevant and meaningful information from the processed data. **Typically the goal is to identify the metabolites that are significantly changing between classes of biological samples**. Two types of statistical analysis: **univariate and multivariate**. Can we find just a few features in your high-dimensional metabolomics data that capture the essence of the dataset? -- .pull-right[ ### Understand the statistical nature of your data * Is your data normally distributed? * Does your data have good statistical power? * **High-dimensional** ### Consideration for high-dimensionality * Univariate analysis: **multiple testing** * Multivariate analysis: **dimension reduction** ] .footnote[ **Code demo and tutorial:** [Data processing and statistics demo](https://github.com/whats-in-the-box/tutorials_and_demos/blob/main/demo.ipynb) Copyright © University of Birmingham and Birmingham Metabolomics Training Center ] ??? At the same time, you want to keep in mind the statistical characteristics of your data as you proceed with statistical analysis. Last lecture we saw this slide, except our focus was on the **distribution of the data**. This time our focus will be on the **high-dimensional nature of the data** and its impact on statistics, specifically **multiple testing correction and dimension reduction**. We will walk through these as we talk about the statistical analysis. --- # Univariate analysis: Differential analysis .pull-left[ * **Comparisons** + **compare numeric features between groups** + **check distribution with histogram and qq plot** * Multiple testing correction + control false discovery rate (FDR) with BH correction * Ratios (Fold change) + degree of quantity change between two groups + Volcano plot: log2(FC) ~ -log10(p-value) ] ??? The goal of univariate analysis is to **differentiate**; the goal is to identify the metabolite features that are significantly changing between classes of biological samples. As we discussed last lecture, the type of univariate analysis, parametric or non-parametric, depends on the distribution of the data. Therefore, always check your distribution with a histogram and qq plot.
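As a minimal R sketch of that check, using simulated intensities for one hypothetical feature (object and group names are illustrative, not from the demo notebook):

```r
# Simulated log2 intensities for one hypothetical feature in two groups (illustrative only)
set.seed(1)
ctrl  <- rnorm(12, mean = 20, sd = 1.0)   # control group
treat <- rnorm(12, mean = 21, sd = 1.5)   # treated group

# Visual checks of the distribution
hist(ctrl, main = "Control group", xlab = "log2 intensity")
qqnorm(ctrl); qqline(ctrl)                # points close to the line suggest near-normality

# Formal normality test (with small n, interpret cautiously)
shapiro.test(ctrl)
```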
-- .pull-right[ ## Data distribution and normality * If the sample distribution is near normal, **parametric** methods can be applied + **T-test**: assumes normal distribution and equal variance; prefer **Welch's t-test** over Student's t-test + **ANOVA**: assumes normal distribution and equal variance * If the sample distribution is not normal and the sample size is too small for the central limit theorem (CLT) to apply, only **non-parametric** methods should be used for hypothesis testing + **Wilcoxon rank sum test** + **Kruskal-Wallis test** ] ??? We should use **Welch's t-test** by default, instead of Student's t-test, because Welch's t-test performs better than Student's t-test whenever **sample sizes and variances are unequal between groups**, and gives the same result when sample sizes and variances are equal. --- # Univariate analysis .pull-left[ * Comparisons + compare numeric features between groups + check distribution with histogram and qq plot * **Multiple testing correction** + **control false discovery rate (FDR) with BH correction** * Ratios (Fold change) + degree of quantity change between two groups + Volcano plot: log2(FC) ~ -log10(p-value) ] .pull-right[ ## Multiple testing correction * Why? **Multiple simultaneous statistical tests** increase the number of false positives in the results. * Familywise error rate (FWER) vs. False discovery rate (FDR) .footnote[ [FWER vs. FDR](https://egap.org/resource/10-things-to-know-about-multiple-comparisons/) ] ] ??? For example, if one test is performed at the 5% level and the corresponding null hypothesis is true, there is only a **5% chance of incorrectly rejecting the null hypothesis**. However, if 100 tests are each conducted at the 5% level and all corresponding null hypotheses are true, **the expected number of incorrect rejections** (also known as false positives or Type I errors) is 5. The truth is, **when you run multiple simultaneous statistical tests, a fraction will always be false positives.** But there are ways we can **decrease the number of false positives.** --- # Multiple testing correction (cont.) .pull-left[ * Controlling the familywise error rate (**FWER**): **Bonferroni correction** + If a significance threshold of `\(α\)` is used (**family-wise error rate**), but `\(n\)` separate tests are performed, then the Bonferroni adjustment deems a feature significant only if the corresponding P-value is `\(≤ α/n\)`. + **Too strict.** ] .pull-right[  ] * Controlling the false discovery rate (**FDR**): **Benjamini–Hochberg procedure** + First rank the p-values in ascending order; assign ranks to the p-values. + Set the significance threshold of `\(α\)` (FDR) you are willing to accept. + Calculate each individual p-value's Benjamini-Hochberg critical value using this formula `\((i/m)Q\)`; - i = the individual p-value's rank - m = total number of tests - Q = the false discovery rate ( `\(α\)`, chosen by you) + Compare each original p-value against its Benjamini-Hochberg critical value; find the largest p-value that is smaller than its BH critical value. That p-value and all p-values ranked below it are deemed significant. ??? FDR: the proportion of significant results that are actually false positives. In many cases, Bonferroni is too strict. Bonferroni "penalizes" all input p-values equally, whereas Benjamini-Hochberg (as a way to control the FDR) "punishes" p-values according to their ranking.
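To make this concrete, here is a small hedged R sketch (simulated data; object names are illustrative): run a Welch's t-test per feature, compare Bonferroni and BH adjustment with `p.adjust()`, and compute the log2 fold changes that feed a volcano plot.

```r
# Simulated data: 500 features x 20 samples, 10 per group (illustrative only)
set.seed(42)
X <- matrix(rnorm(500 * 20, mean = 20, sd = 1), nrow = 500)
X[1:50, 11:20] <- X[1:50, 11:20] + 1               # 50 truly changed features
group <- rep(c("ctrl", "treat"), each = 10)

# Welch's t-test (t.test default, var.equal = FALSE) for every feature
pvals <- apply(X, 1, function(x)
  t.test(x[group == "ctrl"], x[group == "treat"])$p.value)

# Multiple testing correction
p_bonf <- p.adjust(pvals, method = "bonferroni")   # controls FWER; often too strict
p_bh   <- p.adjust(pvals, method = "BH")           # controls FDR (Benjamini-Hochberg)
c(BH = sum(p_bh < 0.05), Bonferroni = sum(p_bonf < 0.05))

# Fold change: data are already on the log2 scale, so a difference of means is log2(FC)
log2fc <- rowMeans(X[, group == "treat"]) - rowMeans(X[, group == "ctrl"])
plot(log2fc, -log10(pvals), pch = 20,
     xlab = "log2 fold change", ylab = "-log10(p-value)")   # volcano plot
```

At the same threshold, BH typically retains more discoveries than Bonferroni, which is exactly the trade-off between controlling the FDR and the stricter FWER.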
--- # Univariate analysis .pull-left[ * Comparisons + compare numeric features between groups + check distribution with histogram and qq plot * Multiple testing correction + control false discovery rate (FDR) with BH correction * **Ratios (Fold change)** + **degree of quantity change between two groups** + **Volcano plot: log2(FC) ~ -log10(p-value)** ] .pull-right[ <img src="data:image/png;base64,#./img/volcano_plot.png" width="100%" style="display: block; margin: auto;" /> ] ??? A volcano plot is a type of scatterplot that shows **statistical significance (P value)** versus **magnitude of change (fold change)**. It enables quick visual identification of genes with large fold changes that are also statistically significant. In a volcano plot, the most upregulated genes are towards the right, the most downregulated genes are towards the left, and the most statistically significant genes are towards the top. --- # Multivariate analysis and dimension reduction * Purpose of multivariate analysis (through dimensionality reduction) + **Visualization** and **feature selection/extraction** * Motivating question + How do we visualize high-dimensional data? + Can we find a **small number of features** that accurately capture the **relevant properties** of the data? * Gist: Project the data from the original high-dimensional space into a "smaller" low-dimensional subspace + **Goal: to discover the dimensions that matter the most** * Two main methods for reducing dimensionality + Feature extraction: PCA (unsupervised method): finding a **new** set of `\(k\)` dimensions that are **combinations of the original** `\(d\)` dimensions + Feature selection: PLS-DA (supervised method): finding `\(k\)` of the `\(d\)` dimensions that give us the most information and discarding the other `\((d-k)\)` dimensions ??? There are many types of multivariate analysis; unless you are a statistician, you're unlikely to know or use all of them. I want to point out some of the analyses that we use frequently to analyze mass spec generated metabolomics data. **In multivariate analysis, we are looking at multiple variables simultaneously, assuming that information resides in the joint distribution.** The purposes of multivariate analysis are visualization and feature selection/extraction, achieved through dimension reduction, or subspace estimation (in machine learning terms). The motivating question we are asking here is: can we find a small number of features that accurately capture the relevant properties of the data? (For example, we can probably describe the trajectory of a tennis ball by its velocity, diameter, and mass.) Similarly, can we find just a few features in your high-dimensional metabolomics data that capture the essence of this dataset? How? (gist) The two main methods to use are PCA and PLS-DA, which can help us achieve our goal, which is **to discover the dimensions that matter the most, or to help us see the dominant trend in the data**. In feature extraction, we are interested in finding a **new** set of `\(k\)` dimensions that are **combinations of the original** `\(d\)` dimensions. In feature selection, we are interested in finding `\(k\)` of the `\(d\)` dimensions that give us the most information and we discard the other `\((d-k)\)` dimensions. --- # Multivariate analysis: supervised or unsupervised? * Unsupervised methods (X only, no Y) + **Exploration & visualization - trends, quality, outliers** + PCA (principal component analysis), clustering (K-means), etc. 
* Supervised methods (X and Y) + **Classification (qualitative) & Regression (quantitative) - prediction and inference** + PLS-DA (partial least squares - discriminant analysis), decision trees (random forest), Support Vector Machine, etc. + Overfitting ([bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)) ??? In this context, an "unsupervised" method is one that does not take the data points' labels (group, phenotype, treatment, etc.) into consideration; it classifies data based only on the feature space, the peak intensities. Examples are PCA and clustering techniques (K-means clustering). A "supervised" method does take the label information into consideration during the classification process, such as PLS-DA, decision trees (random forest) and support vector machines. Because they take labels into consideration, they run the risk of overfitting, which is when your model/method fits your training data (the data you use to train your model) perfectly, but generalizes (fits new, unseen data) badly. **Overfitting** occurs when a statistical model fits exactly against its training data. When this happens, the algorithm unfortunately cannot perform accurately against unseen data, defeating its purpose. **Generalization of a model to new data is ultimately what we want** in a statistical method or model to make predictions and classify data. --- # Principal component analysis (PCA) .pull-left[  ] .pull-right[ * Goal: We want to find a feature that can explain most of the variance of the data. * Problem: `\(x1\)` and `\(x2\)` are correlated, i.e., non-zero covariance. **Both features contribute to the variance of the data.** * This is **Feature extraction**. Instead of choosing between `\(x1\)` and `\(x2\)` (existing features), we create a **new** feature that can explain most of the variance. * Solution: De-correlation. ] ??? This graph explains the main idea of PCA. Let's consider a 2D data distribution being plotted here. We want to reduce the dimensionality of this dataset, i.e., we want to find a feature (1D) that can explain most of the variance of this data. The problem is we cannot simply decide which feature to keep and which feature to eliminate. Why? Because these two features are **correlated**. We cannot determine if one feature contributes more to the variance than the other one. This is a feature extraction problem, not feature selection. Instead of choosing between two existing features (we cannot, because they are correlated), we create a new feature that can explain most of the variance. How do we find this most discriminating new feature? By **de-correlating** the features such that their covariance vanishes. By rotating the axes, changing the basis. --- # Principal component analysis (PCA) .pull-left[ * De-correlation (How?) + Eigen-decomposition + Singular value decomposition * PCs are linear combinations of the original variables + Number of original features = Number of PCs * Sequentially explains the largest variation possible - scree plot * PCs are uncorrelated (orthogonal) ] .pull-right[ <img src="data:image/png;base64,#./img/pca_3.png" width="100%" style="display: block; margin: auto;" /> ] ??? PCA accomplishes de-correlation by **changing the basis or axes such that the new features are not correlated**. In the new basis, the horizontal axis accounts for most of the variance of the data. Thus, we have created (extracted) a new feature (along the horizontal axis) that contributes most to the variance. How does PCA find the de-correlated features `\(z1\)` and `\(z2\)`? 
That is the matrix algebra behind PCA, which we will not get into. There are generally two ways, **eigen-decomposition and singular value decomposition**. A few things about PCs: ... --- # Principal component analysis (PCA) .pull-left[ * **Score plot** + **Projected observation distribution on the new plane/basis** + Similar observations accumulate within the same relative space (dispersion = dissimilar) * Loading plot (biplot) + Explains how original variables are linearly combined to form new PCs + Variables with largest absolute loadings have greatest importance + Direction in score plots corresponds to direction in loading plot - biplot * Scree plot + How many PCs should we keep? + Plot the variance explained as a function of number of PCs kept + At the "elbow", adding a new PC does not significantly increase the variance explained by PCA ] .pull-right[ <!-- --> .footnote[ [Nguyen LH, Holmes S. PLoS Comput. Biol. 15(6): e1006907 (2019)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006907) ] ] ??? Diagnostic plots of PCA (output). Example dataset: wine data. The variables include the chemical properties and composition of the wines. Class labels for grape varieties (59 Barolo, 71 Grignolino, 48 Barbera). --- # Principal component analysis (PCA) .pull-left[ * Score plot + Projected observation distribution on the new plane/basis + Similar observations accumulate within the same relative space (dispersion = dissimilar) * **Loading plot (biplot)** + **Explains how original variables are linearly combined to form new PCs** + **Variables with largest absolute loadings have greatest importance** + Direction in score plots corresponds to direction in loading plot - biplot * Scree plot + How many PCs should we keep? + Plot the variance explained as a function of number of PCs kept + At the "elbow", adding a new PC does not significantly increase the variance explained by PCA ] .pull-right[ <!-- --> .footnote[ [Nguyen LH, Holmes S. PLoS Comput. Biol. 15(6): e1006907 (2019)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006907) ] ] ??? The loading plot explains exactly how the old variables (dimensions) contribute to the new variables (PCs). Some original variables contribute a lot to a PC, while others contribute less. --- # Principal component analysis (PCA) .pull-left[ * Score plot + Projected observation distribution on the new plane/basis + Similar observations accumulate within the same relative space (dispersion = dissimilar) * **Loading plot (biplot)** + **Explains how original variables are linearly combined to form new PCs** + **Variables with largest absolute loadings have greatest importance** + Direction in score plots corresponds to direction in loading plot - biplot * Scree plot + How many PCs should we keep? + Plot the variance explained as a function of number of PCs kept + At the "elbow", adding a new PC does not significantly increase the variance explained by PCA ] .pull-right[ <!-- --> .footnote[ [Nguyen LH, Holmes S. PLoS Comput. Biol. 15(6): e1006907 (2019)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006907) ] ] ??? This is another version of the loading plot, not a bar graph, but it still shows how important each original variable is to the new PC. Variables with the largest absolute loadings (imagine a shadow) have the greatest importance to the new PC. 
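The figures on these slides use the wine data; as a minimal, hedged stand-in, the same diagnostics can be produced in base R with `prcomp()` on any samples-by-features matrix (the simulated matrix and object names below are illustrative):

```r
# Simulated samples-by-features matrix (rows = samples), illustrative only
set.seed(7)
X <- matrix(rnorm(30 * 100), nrow = 30, ncol = 100)

# PCA after centering and unit-variance scaling the features
pca <- prcomp(X, center = TRUE, scale. = TRUE)

summary(pca)                                  # proportion of variance explained per PC
plot(pca$x[, 1], pca$x[, 2],                  # score plot: samples in the new PC1/PC2 basis
     xlab = "PC1", ylab = "PC2")
head(sort(abs(pca$rotation[, 1]), decreasing = TRUE))   # loadings: top contributors to PC1
screeplot(pca, type = "lines")                # scree plot: look for the "elbow"

# De-correlation check: PC scores are uncorrelated (off-diagonal covariances ~ 0)
round(cov(pca$x[, 1:3]), 10)
```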
--- # Principal component analysis (PCA) .pull-left[ * Score plot + Projected observation distribution on the new plane/basis + Similar observations accumulate within the same relative space (dispersion = dissimilar) * **Loading plot (biplot)** + Explains how original variables are linearly combined to form new PCs + Variables with largest absolute loadings have greatest importance + **Direction in score plots corresponds to direction in loading plot** - biplot * Scree plot + How many PCs should we keep? + Plot the variance explained as a function of number of PCs kept + At the "elbow", adding a new PC does not significantly increase the variance explained by PCA ] .pull-right[ <!-- --> .footnote[ [Nguyen LH, Holmes S. PLoS Comput. Biol. 15(6): e1006907 (2019)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006907) ] ] ??? Overlaying the loading plot on the score plot gives the biplot. --- # Principal component analysis (PCA) .pull-left[ * Score plot + Projected observation distribution on the new plane/basis + Similar observations accumulate within the same relative space (dispersion = dissimilar) * Loading plot (biplot) + Explains how original variables are linearly combined to form new PCs + Variables with largest absolute loadings have greatest importance + Direction in score plots corresponds to direction in loading plot - biplot * **Scree plot** + How many PCs should we keep? + **Plot the variance explained as a function of number of PCs kept** + At the "**elbow**", adding a new PC does not significantly increase the variance explained by PCA ] .pull-right[ <!-- --> .footnote[ [Nguyen LH, Holmes S. PLoS Comput. Biol. 15(6): e1006907 (2019)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006907) ] ] ??? There is an elbow point in the scree plot, after which the variance explained by the following PCs flattens. Adding more PCs after the elbow point doesn't significantly increase the total variance explained by the PCA. --- # Power analysis **Power analyses exploit an equation with four variables ( `\(α\)`, power, `\(N\)`, and effect size)**. The statistical power depends on three main factors: .pull-left[ * The magnitude of the effect of interest in the population (**effect size**) + Effect size (es) is usually defined as the difference of two group means divided by the pooled standard deviation. + When all others are equal, a larger effect size will lead to more power. * The **sample size** used to detect the effect + More samples will in general increase power. * The **statistical significance criterion** used in the test + In a single test, this is the p-value. + For high-dimensional metabolomics data, this is the FDR. + When all others are equal, there will be reduced power if we require a very high degree of confidence. ] .pull-right[ <img src="data:image/png;base64,#./img/power.png" width="100%" style="display: block; margin: auto;" /> .footnote[ Available on [MetaboAnalyst](https://www.metaboanalyst.ca/docs/Faqs.xhtml) ] ] ??? **Power analysis often refers to calculating the sample size required to detect an effect of a given size with a given degree of confidence.** The first factor (effect size) is **estimated from the data**. The second factor (sample size) is the **quantity of interest in the power analysis** (i.e. we want to investigate the statistical power given the current sample size and effect size). The third factor, which the user needs to specify, is a significance criterion or **alpha level**. For a single biomarker, this is usually a p-value. For high-dimensional metabolomic data, the common choice is the false discovery rate or FDR. In this case, the FDR is set at 0.1.
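For a single feature, base R's `power.t.test()` shows the four-variable relationship directly: leave exactly one of n, delta (with sd), sig.level, or power unspecified and it is solved for. This is only a hedged single-test sketch; the SSPA/MetaboAnalyst workflow on the following slides extends the idea across all features with an FDR threshold.

```r
# Solve for per-group sample size n: effect size delta/sd = 1, alpha = 0.05, power = 0.8
power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8)

# Solve for power instead: what power do we get with n = 10 per group?
power.t.test(n = 10, delta = 1, sd = 1, sig.level = 0.05)
```

Here delta/sd corresponds to the standardized effect size (Cohen's d) defined on the slide.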
--- # Power analysis .pull-left[ * Step 1, parameter selection. Select the two groups (in one pairwise comparison) on which to perform power analysis. * Step 2, analyze the diagnostic plots which display the distribution of the test statistics and p-values. * Step 3, power analysis. **The ultimate aim of power analysis is to determine the minimum sample size needed to detect an effect size of interest**. ] .pull-right[ <img src="data:image/png;base64,#./img/power_analysis_1.png" width="100%" style="display: block; margin: auto;" /> ] .footnote[ Implemented with the R package [SSPA](https://www.bioconductor.org/packages//2.10/bioc/html/SSPA.html) ] ??? There will be four diagnostic plots which display a visual overview of the test statistics and p-values, providing context for whether or not the normalization was sufficient. **The test statistic should follow a near-normal distribution, and the majority of p-values should be close to zero.** * The test statistics (t-statistics) are expected to follow a near-normal distribution. * The majority of features should be significantly different between the two selected conditions. You should see some p-values (hopefully the majority) close to zero (the distribution of p-values should be right-skewed). --- # Power analysis .pull-left[ * Step 1, parameter selection. Select the two groups (in one pairwise comparison) on which to perform power analysis. * Step 2, analyze the diagnostic plots which display the distribution of the test statistics and p-values. * Step 3, power analysis. **The ultimate aim of power analysis is to determine the minimum sample size needed to detect an effect size of interest**. ] .pull-right[ <img src="data:image/png;base64,#./img/power_analysis_2.png" width="100%" style="display: block; margin: auto;" /> ] .footnote[ Implemented with the R package [SSPA](https://www.bioconductor.org/packages//2.10/bioc/html/SSPA.html) ] ??? Based on the power analysis, a sample size of 10-11 gives reasonable statistical power to detect differences in the two pairwise comparisons. Group 1 vs 2 has 30.3% power to detect differences after multiple testing correction. --- # Power analysis .pull-left[ <img src="data:image/png;base64,#./img/power_analysis_3.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="data:image/png;base64,#./img/power_analysis_4.png" width="100%" style="display: block; margin: auto;" /> ] **Code demo and tutorial:** [Power analysis using `SSPA`](https://github.com/whats-in-the-box/tutorials_and_demos/blob/main/PowerAnalysis.ipynb) --- # Power analysis .pull-left[ ### Type of power * Predicted (*a priori*) power + Power calculated **before data collection**, used for deciding sample number per group in order to observe a particular effect size. + Relationship between power and `\(N\)` after **stipulating `\(α\)` and (population or estimated) effect size**. * Observed (*post-hoc*) power + Power calculated **after the fact, with sample size and effect size constraints**. + Solve for power by stipulating `\(α\)`, `\(N\)`, and (sample) effect size. ] .pull-right[ ### Pro or against post-hoc power depending on your motivation * Against: What chance was there of producing a statistically significant result, assuming that the population effect is **exactly equal to the observed sample effect size**? 
+ Calculating post-hoc power of the **test you have performed** (usually a nonsignificant result), **based on the effect size estimate from your data**. + ['“The claim that a study is ‘underpowered’ with respect to an observed nonsignificant result” is “tautological and uninformative”.'](https://www.tandfonline.com/doi/abs/10.1080/19312450701641375) ] ??? WRONG: People often use post-hoc power analysis to determine the power they had to **detect the effect observed in their study** after **finding a non-significant result**, and then **use the low power to justify why their result was non-significant and why their theory might still be right**. First, **given a nonsignificant result, one already knows that the observed statistical power is low** (the power for detecting a population effect equal to the obtained sample effect). As Hoenig and Heisey (2001) point out, “because of the **one-to-one relationship between p values and observed power**, **nonsignificant p values always correspond to low observed powers**”. Thus, “the claim that a study is ‘underpowered’ with respect to an observed nonsignificant result” is “tautological and uninformative”. The argument would go something like this: "I didn't get a statistically significant result, but for an effect size of x my power was only 50%, so this doesn't really tell me very much." This is **circular logic**. Second, observed power differs from the true power of your test, because the true power depends on the true effect size you are examining, which is unknown. It is tempting to treat post-hoc power as if it is similar to the true power of your study, but it is a **USELESS** statistical concept. --- # Power analysis .pull-left[ ### Type of power * Predicted (*a priori*) power + Power calculated **before data collection**, used for deciding sample number per group in order to observe a particular effect size. + Relationship between power and `\(N\)` after **stipulating `\(α\)` and (population or estimated) effect size**. * Observed (*post-hoc*) power + Power calculated **after the fact, with sample size and effect size constraints**. + Solve for power by stipulating `\(α\)`, `\(N\)`, and (sample) effect size. ] .pull-right[ ### Pro or against post-hoc power depending on your motivation * Pro: What chance was there of producing a statistically significant result, based on **population effect sizes** of independent interest? + ["Where after-the-fact power analyses are based on population effect sizes of independent interest (as opposed to a population effect size exactly equal to whatever happened to be found in the sample at hand), they can potentially be useful."](https://www.tandfonline.com/doi/abs/10.1080/19312450701641375) + Can be a useful supplement to p-values and confidence intervals, but **only when based on population effect magnitudes of independent interest**. Confidence intervals are almost always more informative. ] ??? RIGHT: "where after-the-fact power analyses are based on population effect sizes of independent interest (as opposed to a population effect size exactly equal to whatever happened to be found in the sample at hand), they can potentially be useful." For example, if a researcher knew they would only be able to recruit a certain number of patients with a rare disease, they might want to know the power they could achieve to detect a given clinically significant effect. 
“Previous researchers found effects averaging about r=.40, and we had good power (a good chance of finding statistically significant results) assuming a population effect of .40, so the fact that we didn’t find significant effects is meaningful...” --- # More reading about observed power * [The Abuse of Power](https://www.tandfonline.com/doi/abs/10.1198/000313001300339897) + “Because of the **one-to-one relationship between p values and observed power**, **nonsignificant p values always correspond to low observed powers**.” * [Brief Report: Post Hoc Power, Observed Power, A Priori Power, Retrospective Power, Prospective Power, Achieved Power: Sorting Out Appropriate Uses of Statistical Power Analyses](https://www.tandfonline.com/doi/abs/10.1080/19312450701641375) * [Calculating Observed Power Is Just Transforming Noise](https://lesslikely.com/statistics/observed-power-magic/) * [Observed power, and what to do if your editor asks for post-hoc power analyses](http://daniellakens.blogspot.com/2014/12/observed-power-and-what-to-do-if-your.html) * [With the Ability to Calculate Power Comes Great Responsibility](https://medium.com/geekculture/with-the-ability-to-calculate-power-comes-great-responsibility-8f2792e59e0c) ??? The first paper demonstrated the one-to-one relationship between p-values and observed power given a nonsignificant result. --- class: inverse, center, middle # Next: Functional Analysis Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). .footnote[ [OpenIntro Statistics](https://www.openintro.org/book/os/) ]