class: center, middle, inverse, title-slide # Introduction to Metabolomics ### Max Qiu, PhD Bioinformatician/Computational Biologist
maxqiu@unl.edu
### Bioinformatics Research Core Facility, Center for Biotechnology
Data Life Science Core, NCIBC
### 01-31-2022 ---  Objectives: "By analyzing and comparing the metabolomes of patients at different stages of chronic hepatitis B and comparing them to healthy individuals, we want to determine the metabolic signature of disease progression and develop a more **accurate metabolome-based method for diagnosis of disease progression** ultimately giving a better basis for **treatment decisions**." ---  .pull-left[ Objectives: "We compared the inducible metabolomes of heat-stressed (abiotic) and *C. heterostrophus*-injected (biotic) maize and examined the effects of heat stress on the ability of maize to defend itself against *C. heterostrophus*." ] .pull-right[ Results: heat stress applied before fungal inoculation reduced maize disease resistance against *C. heterostrophus*, and the metabolomes of plants under combined stress separated distinctly from those of the non-heat-stressed infected controls. ] ---  .pull-left[ Objective: investigate the importance of shrimp size in terms of metabolite profile and sensory properties, so that pricing can be based on scientific findings. ] .pull-right[ Results: small shrimp had higher accumulations of amino acids, sugars, and organic acids. Larger shrimp were sweeter, juicier, and crisper. ] --- # Metabolomics [Nature](https://www.nature.com/subjects/metabolomics): Metabolomics refers to the systematic identification and quantification of the **small molecule metabolic products** (the **metabolome**) of a biological system (cell, tissue, organ, biological fluid, or organism) at a **specific point in time**. [EMBL-EBI](https://www.ebi.ac.uk/training/online/courses/metabolomics-introduction/what-is/): Metabolomics is the **large-scale study** of **small molecules**, commonly known as metabolites, within cells, biofluids, tissues or organisms. Collectively, these small molecules and their interactions within a biological system are known as the **metabolome**. 
[ScienceDirect](https://www.sciencedirect.com/topics/medicine-and-dentistry/metabolomics): Metabolomics is defined as the systematic study of all chemical processes concerning metabolites, providing characteristic chemical fingerprints that specific cellular processes yield, by means of the study of their **small-molecule metabolite** profiles. [Science](https://www.science.org/content/article/big-data-big-picture-metabolomics-meets-systems-biology): Metabolomics—the study of the **collection of an organism's metabolites**—provides a molecular measurement of **phenotype**, or the characteristics resulting from the **genotype's interaction with the environment**. ??? EBI: European Bioinformatics Institute. --- # Metabolomics and Metabolome .pull-left[ ### Metabolomics * Attempts to measure all of the small-molecule metabolites (the metabolome) * High-throughput (large-scale) * Reflects phenotype: genotype interaction with the environment * Snapshot (a specific point in time) ] ??? Metabolomics is a non-biased experimental approach that attempts to measure all of the metabolites in a biological sample. It is a **high-throughput** approach to **characterize and quantify the metabolome** present in a system or physiological state, and it reflects the characteristics of **phenotype, which results from the genotype's interaction with the environment**. -- .pull-right[ ### Metabolic profiling (basically the same thing) * No single analytical method can identify all metabolites. + Combine analytical approaches to maximize the number of metabolites detected and increase coverage. * Detection of a wide range of metabolites ] ??? Although the approach is designed to analyze all metabolites, **no single analytical technique, or even combination of analytical techniques, can detect all of the metabolites present in a complex sample**. Therefore, some groups define the approach as **metabolic profiling**, which simply aims to detect a wide range of metabolites. 
(More on this later.) --- # Metabolomics and Metabolome .pull-left[ ### Metabolomics * Attempts to measure all of the small-molecule metabolites (the metabolome) * High-throughput (large-scale) * Reflects phenotype: genotype interaction with the environment * Snapshot ### Metabolites * Low-molecular-weight biochemicals (< 1.5 kDa), including + carbohydrates + amino acids + organic acids + nucleotides + lipids ] .pull-right[ ### Metabolic profiling (basically the same thing) * No single analytical method can identify all metabolites. + Combine analytical approaches to maximize the number of metabolites detected and increase coverage. * Detection of a wide range of metabolites ] ??? Metabolites are low-molecular-weight biochemicals: generally speaking, small molecules of less than 1.5 kDa. They include carbohydrates, amino acids, organic acids, nucleotides, and lipids, and they **inhabit a diverse chemical space**. --- background-image: url("./img/metabolites.PNG") background-size: 75% ??? (Some of the example molecules are shown here.) They are the **intermediates and products** of metabolism. --- background-image: url("./img/metabolites_involved.PNG") background-size: 75% ??? They are the **building blocks** for larger biochemicals, including DNA, RNA and proteins. They serve as **structural components of cells** (e.g., the cell wall) and take part in the **regulation of other biochemical processes**, including the regulation of protein enzyme activity through **allosteric regulation and post-translational modifications**. 
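To put the < 1.5 kDa cut-off in numbers, the sketch below computes average molecular weights from molecular formulas and compares them with that threshold. It is a minimal illustration, not a cheminformatics tool: it assumes simple formulas without parentheses, charges or isotopes, uses rounded standard atomic weights, and the names `molecular_weight`, `is_small_molecule` and `MAX_METABOLITE_DA` are hypothetical.

```python
import re

# Rounded standard atomic weights (Da); only the elements used below.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}

# Hypothetical cut-off mirroring the "< 1.5 kDa" definition on the slide.
MAX_METABOLITE_DA = 1500.0


def molecular_weight(formula):
    """Average molecular weight of a simple formula such as 'C6H12O6'.

    Assumes element symbols followed by optional counts; no parentheses,
    hydrates, isotopes or charges.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def is_small_molecule(formula):
    """True if the formula falls under the assumed small-molecule cut-off."""
    return molecular_weight(formula) < MAX_METABOLITE_DA


examples = {
    "glucose": "C6H12O6",
    "ATP": "C10H16N5O13P3",
    "POPC (a phospholipid)": "C42H82NO8P",
    "insulin (a peptide hormone)": "C257H383N65O77S6",
}
for name, formula in examples.items():
    mw = molecular_weight(formula)
    print(f"{name:28s} {mw:9.2f} Da  small molecule: {is_small_molecule(formula)}")
```

Glucose (about 180 Da), ATP (about 507 Da) and the phospholipid POPC (about 760 Da) fall under the cut-off, while a peptide hormone such as insulin (about 5.8 kDa) does not. The boundary is approximate ("generally speaking" in the notes), so treat it as a guideline rather than a hard rule.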
--- # Metabolomics and Metabolome .pull-left[ ### Metabolomics * Attempts to measure all of the small-molecule metabolites (the metabolome) * High-throughput (large-scale) * Reflects phenotype: genotype interaction with the environment * Snapshot ### Metabolites * Low-molecular-weight biochemicals (< 1.5 kDa), including + carbohydrates + amino acids + organic acids + nucleotides + lipids ] .pull-right[ ### Metabolic profiling (basically the same thing) * No single analytical method can identify all metabolites. + Combine analytical approaches to maximize the number of metabolites detected and increase coverage. * Detection of a wide range of metabolites ### Metabolome * Entire qualitative collection of metabolites in a biological sample ] ??? The entire qualitative collection of metabolites in a biological sample is called the metabolome. The human body contains many different types of metabolomes, **representing different biofluids and tissues**, and each of these metabolomes is unique in which metabolites are present and at what concentrations. --- # Metabolism * **Integration of the chemical and physical processes in which metabolites are broken down or synthesized.** * **Catabolism vs Anabolism** + Catabolism: the breakdown of organic substrates to provide chemical energy and metabolic intermediates + Anabolism: the synthesis of cellular components from metabolic precursors, which requires energy * Metabolic pathway and network .footnote[ University of Birmingham </br> and Birmingham Metabolomics Training Center ] ??? Metabolism is the integration of the chemical and physical processes in which metabolites are broken down or synthesized. **A single metabolic reaction, typically catalyzed by an enzyme, converts one metabolite into another.** Metabolism is a **major source of cellular information** that **integrates intracellular and environmental signals** to jointly coordinate processes such as nutrient utilization, hormone signaling, or cell differentiation. 
-- <img src="./img/Anabolism-and-Catabolism.png" width="50%" style="display: block; margin: auto;" /> ??? Metabolism can be separated into catabolic and anabolic processes. * Catabolism involves the breakdown of organic substrates (typically through **oxidation processes**) to provide chemical energy in the form of ATP. ATP is like a rechargeable battery that is continuously broken down and recharged by the removal and addition of phosphate groups. The removal of a phosphate group releases energy that fuels our cells and provides heat in our bodies. Catabolic reactions also produce **metabolic intermediates** that may be used in subsequent anabolic reactions. * Anabolism results in the synthesis of **cellular components from metabolic precursors** and consumes energy, whereas catabolism releases it. --- # Metabolism .pull-left[ * Integration of the chemical and physical processes in which metabolites are broken down or synthesized. * Catabolism vs Anabolism + Catabolism: the breakdown of organic substrates to provide chemical energy and metabolic intermediates + Anabolism: the synthesis of cellular components from metabolic precursors, which requires energy * **Metabolic pathway and network** .footnote[ [T. Lengauer, C. Hartmann, 3.15 - Bioinformatics, </br> Comprehensive Medicinal Chemistry II, Volume 3, 2007, Pages 315-347, (2021)](https://www.sciencedirect.com/science/article/pii/B008045044X000882)</br> Copyright © 2007 Elsevier Ltd. ] ] .pull-right[ <img src="./img/metabolic_network.jpg" width="80%" style="display: block; margin: auto 0 auto auto;" /> ] ??? **Groups of metabolic reactions** can be integrated into a metabolic pathway, and **the complete set of metabolic reactions** can be visualized as a metabolic network. The networks are similar to a subway map, where stations are metabolites and train lines are the metabolic reactions. 
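The subway-map analogy maps directly onto a graph data structure: metabolites are nodes and enzyme-catalyzed reactions are directed edges. The sketch below is illustrative only. It uses a hand-picked, heavily simplified fragment of glycolysis (lower glycolysis is collapsed into one edge) plus two branch points, and the helper names `build_network` and `shortest_route` are hypothetical. A breadth-first search then finds the shortest route between two metabolites, much like planning a trip across subway lines.

```python
from collections import deque

# Simplified, hand-picked reactions: (enzyme, substrate, product).
# Lower glycolysis is collapsed into a single edge for brevity.
REACTIONS = [
    ("hexokinase", "glucose", "glucose-6-phosphate"),
    ("phosphoglucose isomerase", "glucose-6-phosphate", "fructose-6-phosphate"),
    ("phosphofructokinase-1", "fructose-6-phosphate", "fructose-1,6-bisphosphate"),
    ("lower glycolysis (several steps)", "fructose-1,6-bisphosphate", "pyruvate"),
    ("lactate dehydrogenase", "pyruvate", "lactate"),
    ("pyruvate dehydrogenase complex", "pyruvate", "acetyl-CoA"),
    ("glucose-6-phosphate dehydrogenase", "glucose-6-phosphate", "6-phosphoglucono-delta-lactone"),
]


def build_network(reactions):
    """Adjacency list: metabolite -> list of (enzyme, product)."""
    network = {}
    for enzyme, substrate, product in reactions:
        network.setdefault(substrate, []).append((enzyme, product))
        network.setdefault(product, [])  # make sure end products are nodes too
    return network


def shortest_route(network, start, goal):
    """Breadth-first search; returns the metabolites on the route, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the predecessors to rebuild the route.
            route = []
            while current is not None:
                route.append(current)
                current = previous[current]
            return route[::-1]
        for _enzyme, product in network.get(current, []):
            if product not in previous:
                previous[product] = current
                queue.append(product)
    return None


network = build_network(REACTIONS)
print(" -> ".join(shortest_route(network, "glucose", "lactate")))
```

Running it prints `glucose -> glucose-6-phosphate -> fructose-6-phosphate -> fructose-1,6-bisphosphate -> pyruvate -> lactate`. Edges are directed here for simplicity; many real reactions are reversible, and genome-scale networks are built from curated databases rather than a hand-written list.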
--- # Metabolomics, a younger addition to the omics sciences <img src="./img/omics.png" width="40%" style="display: block; margin: auto;" /> ??? Metabolomics is regarded as the younger sibling of the omics sciences, after genomics, transcriptomics and proteomics. But metabolites and metabolism have actually been studied for more than 100 years. The metabolomics field is **strongly founded within the "realm of biochemistry"**, that is, the **study of chemical processes within living organisms**. Metabolic profiling first appeared in the literature in the 1950s and developed very slowly throughout the next three decades. It has only become an area of major research interest since about 2003-2005, some 20 years later. Between 2003 and the start of September 2005, the number of publications on this topic was more than double the total published in the preceding ~20 years combined. --- # Metabolomics, a younger addition to the omics sciences <img src="./img/development_of_metabolomics.jpg" width="75%" style="display: block; margin: auto;" /> .footnote[ Copyright © University of Birmingham </br> and Birmingham Metabolomics Training Center ] ??? This graph shows the number of publications on genomics, proteomics and metabolomics in PubMed between 1992 and 2019. If publication counts are not convincing enough, the metabolomics market was projected to reach $2.38 billion by 2021. --- # Why study the metabolome? ### The advantages of metabolomics .pull-left[ * **Metabolomics is transferable between different biological systems.** ] .pull-right[ <img src="./img/transferability_1.png" width="70%" style="display: block; margin: auto;" /> ] ??? One advantage of studying the metabolome is **the transferability of the analytical approach across different biological systems**. A metabolite, unlike a gene, transcript or protein, is the same in every organism. 
For example, glucose is the **same metabolite** in humans as it is in worms, plants and sea anemones. --- # Why study the metabolome? ### The advantages of metabolomics .pull-left[ * **Metabolomics is transferable between different biological systems.** <img src="./img/transferability_2.png" width="100%" style="display: block; margin: auto 0 auto auto;" /> ] .pull-right[ <img src="./img/transferability_1.png" width="70%" style="display: block; margin: auto;" /> ] ??? Genes, transcripts and proteins can be **modified to alter their function** (for example, methylation can switch a gene on or off), **while modifying a metabolite produces a different metabolite**. Therefore, assuming your **analytical method (instrument) can measure a specific metabolite** and appropriate sample preparation is performed, you can detect that metabolite regardless of the sample type. --- # Why study the metabolome? ### The advantages of metabolomics .pull-left[ * Metabolomics is transferable between different biological systems. * **Metabolism is highly conserved across biology.** ] .pull-right[ <img src="./img/highly_conserved.png" width="100%" style="display: block; margin: auto;" /> ] ??? Metabolism is **highly conserved** across biology. The **core reactions that are central to life are largely the same** across the microbial, plant, and animal kingdoms. Amino acid, energy, carbohydrate and lipid metabolism have evolved to provide the basic functions of life. The **enzymes** catalyzing these reactions are highly conserved, and their **substrates and products** are common across species. Therefore, much of the knowledge we acquire from **model organisms** in the lab is applicable to metabolic processes in humans. --- # Why study the metabolome? ### The advantages of metabolomics .pull-left[ * Metabolomics is transferable between different biological systems. 
* Metabolism is highly conserved across biology. * **Metabolome provides the closest link to the phenotype of an organism.** + Fast turnover (snapshot) + Easier to detect changes (amplified compared to other upstream omics) ] .pull-right[ <img src="./img/integrated_omics.png" width="75%" style="display: block; margin: auto;" /> ] ??? A third advantage of studying the metabolome is that it provides the **closest link to the phenotype** of an organism. As we discussed, metabolomics reflects the characteristics of phenotype, which results from the interaction of genotype and environment. Similarly, the metabolome is the **downstream product of the interaction between the genome and the environment**. This means two things: * The turnover of metabolites is so **rapid** that measuring the metabolome provides a **dynamic and sensitive indicator** of phenotypic changes in the organism. * **Metabolic changes are amplified** compared to the genome and proteome: small changes in enzyme activity can lead to large changes at the metabolic level. This **amplification** means that **subtle changes are easier to measure in the metabolome**, so metabolomics is well suited to detecting these changes or perturbations to the biological system. --- # Why study the metabolome? ### The advantages of metabolomics .pull-left[ * Metabolomics is transferable between different biological systems. * Metabolism is highly conserved across biology. * Metabolome provides the closest link to the phenotype of an organism. + Fast turnover (snapshot) + Easier to detect changes (amplified compared to other upstream omics) * **Metabolomics is amenable to high-throughput technologies, which keeps cost per sample low.** ] .pull-right[ <img src="./img/high_throughput.jpg" width="100%" style="display: block; margin: auto;" /> ] ??? 
A fourth advantage is the high sample throughput and the associated **relatively low costs** that can be achieved in metabolomics. For example, an analysis time of 10 min per sample allows 144 samples to be analyzed per day (1,440 min / 10 min, with the instrument running around the clock) and more than a thousand samples per week. This means that **large-scale studies** can be performed, and the technique is highly appropriate for **screening large numbers of samples**. --- # Why study the metabolome? ### The advantages of metabolomics .pull-left[ * Metabolomics is transferable between different biological systems. * Metabolism is highly conserved across biology. * Metabolome provides the closest link to the phenotype of an organism. + Fast turnover (snapshot) + Easier to detect changes (amplified compared to other upstream omics) * Metabolomics is amenable to high-throughput technologies, which keeps cost per sample low. ] .pull-right[ #### Other considerations * Often end products of biochemical processes * Sensitive to endogenous and exogenous stimuli * Can reveal transient changes closely aligned with the disease state of a system * Real-time snapshot of the system * **Number of metabolites (~40,000?): an order of magnitude lower than the number of genes/transcripts.** ] ??? There are also other points that count as advantages of metabolomics, some of which overlap with what we have already discussed. But I want to point to the last one here. As of a couple of years ago, the NCBI metabolite database recorded about 40,000 metabolites, an order of magnitude fewer than the number of genes/transcripts we know, which can only be an advantage from a data analysis point of view. 
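One facet of the curse of high dimensionality is easy to simulate: when there are far more features than samples, some pure-noise features will separate two groups perfectly by chance. The sketch below is a hypothetical illustration, not an analysis of real data: it assumes independent standard-normal noise, 5 versus 5 samples, and arbitrary feature counts of 4,000 and 40,000. Under these assumptions a single noise feature separates the groups with probability 2 / C(10, 5) = 2/252, so the expected number of spurious "perfect biomarkers" grows linearly with the number of features (about 32 versus about 317).

```python
import math
import random


def count_perfect_separators(n_features, n_per_group=5, seed=0):
    """Count pure-noise features that perfectly separate two equal-sized groups."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_features):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        # Perfect separation: every value of one group lies above every value of the other.
        if min(group_a) > max(group_b) or min(group_b) > max(group_a):
            count += 1
    return count


def expected_separators(n_features, n_per_group=5):
    """Expected count under the null: each feature separates with probability 2 / C(2n, n)."""
    return n_features * 2 / math.comb(2 * n_per_group, n_per_group)


for n_features in (4_000, 40_000):
    observed = count_perfect_separators(n_features)
    print(f"{n_features:>6} noise features: {observed} perfect separators "
          f"(expected ~{expected_separators(n_features):.1f})")
```

The exact counts depend on the seed, but a tenfold increase in features should give roughly a tenfold increase in spurious separators. Fewer features, more samples, and multiple-testing control all reduce this risk.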
(**The curse of high dimensionality**.) --- # Future of Metabolomics: Integrated Omics Integrated Omics: integrating two or more omics datasets .pull-left[ * Biological interactions between components in a system can be investigated at a single functional level or across different functional levels * Study the components and their interactions with a **holistic, systematic** approach rather than a reductionist approach * Objective: from *individual omics datasets* to a *biologically meaningful context* * Goal: **Systems Biology** ] .pull-right[ <img src="./img/integrated_omics.png" width="75%" style="display: block; margin: auto;" /> ] ??? After 20 years of development, metabolomics has become a mature addition to the omics sciences. The future of metabolomics, or at least one direction of it, is integrated omics: the integration of metabolomics with other omics data. Integrated omics is also called multi-omics or cross-omics. **Metabolomics investigates only one functional level**, as do genomics, transcriptomics, and proteomics. If we can **combine all the different functional levels, we can investigate with a "holistic and systematic" approach**. It's like looking at an elephant from all angles instead of just one. Ultimately, our goal is systems biology. --- background-image: url('./img/challenges_integrated_omics.jpg') background-size: 65% .footnote[ [Biswapriya B Misra et al. </br> J Mol Endocrinol. </br> 2018 Jul 13:JME-18-0055.](https://jme.bioscientifica.com/view/journals/jme/62/1/JME-18-0055.xml)</br> Copyright: © 2018 Society </br> for Endocrinology 2018 ] ??? Data scaling & reduction: an **order of magnitude difference** in dataset dimensions between genomics and metabolomics. Variances among samples across omics: **large and sparse**, rendering cluster analysis uninformative. **The major bottleneck in omics is becoming less about our ability to generate data and more about our ability to process, analyze, interpret and store the data.** Welcome to the age of BIG DATA. --- background-image: url(https://media.springernature.com/full/springer-static/image/art%3A10.1186%2Fs13024-018-0304-2/MediaObjects/13024_2018_304_Fig1_HTML.png?as=webp) background-size: 75% # Metabolomics Workflow .footnote[ [Shao, Y., Le, W. Mol Neurodegeneration 14, 3 (2019)](https://doi.org/10.1186/s13024-018-0304-2)</br> Copyright © 2021 BioMed Central Ltd ] ??? This is a general workflow for a metabolomics study; we will follow it and discuss every step over the next two weeks of this class. In all honesty, if you are sitting here, you should already know the general steps of a scientific process (I would be worried if you don't). This workflow just expands the general scientific process with more detailed steps in the context of a metabolomics study. Hopefully, by the end of these two weeks you'll have a clear picture of how to design and conduct a whole metabolomics study, and what to pay attention to in each step. Today we will talk about the first step: formulating a valid study question. It sounds trivial, and I am not questioning whether you know what your experiments are about. What I want to point out here is the type of your study question: is it a hypothesis-generating (discovery) question or a validation question? That choice changes everything that follows, starting with the experimental design. --- # Study question #### The study question will influence the experimental design. * Define the study question clearly ahead of time * **WILL** influence analytical approaches. .pull-left[ **Hypothesis-generating studies** (Discovery) * **Untargeted**: maximize the number of metabolites detected + Combine multiple analytical methods to increase coverage * Identify the statistically significant metabolites and construct the hypothesis + Metabolite identification ] .pull-right[ <img src="./img/hypothesis_generating.png" width="100%" style="display: block; margin: auto;" /> ] ??? 
The study question will influence the experimental design, so define it clearly ahead of time. A hypothesis-generating study, in the context of metabolomics, measures a **wide diversity of metabolites**, hundreds to thousands of them, so we can investigate **global changes** in the metabolome. **The objective is usually to identify one metabolite or a group of metabolites that change as a result of a perturbation** to a biological system. (This type of study helps us identify targets for drug development, or discover new biomarkers or pathways, for example.) The way to achieve this is through so-called "**untargeted** metabolomics". With an untargeted approach, we can **simultaneously measure hundreds to thousands of metabolites by combining multiple analytical methods to increase metabolite coverage**, analyze **thousands of different samples in a short amount of time**, and then use a combination of **univariate and multivariate statistical analyses** to interrogate the data and determine the significant metabolic changes between experimental conditions. Ultimately, we can integrate the data to draw new biological inferences. --- # Study question #### The study question will influence the experimental design. * Define the study question clearly ahead of time * **WILL** influence analytical approaches. .pull-left[ **Hypothesis-testing studies** (Validation) * Based on **biological context** + Biologically perturb the system: knock out or enhance a metabolic reaction + Different functional levels: post-translational modification; enzyme activity ] .pull-right[ <img src="./img/hypothesis_testing.png" width="100%" style="display: block; margin: auto;" /> ] ??? A hypothesis-generating study will often be the first phase (stage 1 of a study), which is then followed by **one or more hypothesis-testing studies to validate the discoveries** from the first stage. 
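In the discovery phase a univariate test is typically run for every detected metabolite, so hundreds to thousands of p-values are produced at once and some will be small purely by chance. A common safeguard before carrying candidates into validation is false discovery rate (FDR) control. The sketch below implements the Benjamini-Hochberg adjustment in plain Python. It assumes the per-metabolite p-values have already been computed (for example from t-tests), and the metabolite names and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-metabolite p-values from a univariate test.
pvalues = {
    "metabolite_A": 0.001,
    "metabolite_B": 0.008,
    "metabolite_C": 0.039,
    "metabolite_D": 0.041,
    "metabolite_E": 0.27,
}
names = list(pvalues)
qvalues = benjamini_hochberg([pvalues[n] for n in names])
for name, q in zip(names, qvalues):
    flag = "candidate" if q < 0.05 else ""
    print(f"{name}: p = {pvalues[name]:.3f}, q = {q:.3f} {flag}")
```

In this made-up example, metabolites C and D are nominally significant (p < 0.05) but do not pass a 5% FDR threshold (both q-values are 0.05125), so only A and B would move forward as candidates for validation.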
**How we design a validation study depends heavily on the biological context**. For example, what do we do if we **have identified a metabolic reaction** that is important in a biological mechanism? We can test this hypothesis biologically **by perturbing the system**. The perturbation could be to **knock out or enhance the metabolic reaction** and measure changes in the phenotype. We could also measure changes at **different functional levels**, for example, investigate **whether a change in a metabolic reaction is a consequence of changes at the proteomic level**. --- # Study question #### The study question will influence the experimental design. * Define the study question clearly ahead of time * **WILL** influence analytical approaches. .pull-left[ **Hypothesis-testing studies** (Validation) * Based on biological context + Biologically perturb the system: knock out or enhance a metabolic reaction + Different functional levels: post-translational modification; enzyme activity * **Targeted**: precise and accurate in the appropriate biological matrix ] .pull-right[ <img src="./img/hypothesis_testing.png" width="100%" style="display: block; margin: auto;" /> ] ??? In a validation study, the objective is no longer to measure as many metabolites as we can. In the validation phase, the metabolites of interest **are known**, so we can apply a **targeted** analytical method to detect **only the metabolites of interest**, and the **analytical method should be precise and accurate** in measuring those metabolites in the appropriate biological matrix. **One validation step may not be sufficient to fully validate the results**, and the route from discovery to validation may have to be repeated. For example, **the validation of a biomarker for use in clinical practice** will require a greater level of validation and take longer to complete than **the validation of a biological mechanism in yeast**. 
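To make "precise and accurate" concrete: in a targeted assay a metabolite is usually quantified against a calibration curve, and quality-control (QC) samples of known concentration are used to estimate accuracy (mean measured value relative to the nominal value) and precision (coefficient of variation, CV, of replicate measurements). The sketch below assumes a linear calibration of peak-area ratio (analyte / internal standard) versus concentration. All concentrations and responses are invented, and the ±15% acceptance limits are an assumption modeled on common bioanalytical method validation practice, not a rule from this course.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def back_calculate(response, slope, intercept):
    """Convert a measured response back to a concentration via the calibration line."""
    return (response - intercept) / slope


# Hypothetical calibration standards: concentration (uM) vs. peak-area ratio.
cal_conc = [1.0, 2.0, 5.0, 10.0, 20.0]
cal_resp = [0.12, 0.22, 0.52, 1.02, 2.02]
slope, intercept = fit_line(cal_conc, cal_resp)

# Hypothetical QC replicates prepared at a nominal 8 uM.
nominal = 8.0
qc_resp = [0.81, 0.83, 0.80, 0.84, 0.82]
qc_conc = [back_calculate(r, slope, intercept) for r in qc_resp]

accuracy_pct = 100 * mean(qc_conc) / nominal   # closeness to the true value
cv_pct = 100 * stdev(qc_conc) / mean(qc_conc)  # spread between replicates

# Assumed acceptance limits: accuracy within 100 +/- 15 %, CV <= 15 %.
passes = abs(accuracy_pct - 100) <= 15 and cv_pct <= 15
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"accuracy={accuracy_pct:.1f}%, CV={cv_pct:.1f}%, passes={passes}")
```

Accuracy and precision are reported separately on purpose: a method can be precise (tight replicates) yet biased, or unbiased on average yet too variable, and a targeted validation method needs to be acceptable on both.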
--- # Study question #### The study question will influence the experimental design. * Define the study question clearly ahead of time * **WILL** influence analytical approaches. .pull-left[ Hypothesis-testing studies (Validation) * Based on biological context + Biologically perturb the system: knock out or enhance a metabolic reaction + Different functional levels: post-translational modification; enzyme activity * Targeted: precise and accurate in the appropriate biological matrix **Translation** * Translate your discovery to the relevant working environment ] .pull-right[ <img src="./img/hypothesis_testing.png" width="100%" style="display: block; margin: auto;" /> ] ??? A final step of a study, following discovery and validation, is to **translate your discovery to the relevant working environment**. This is the final output of your study, and validation and translation together can take many years to complete. The translation phase is very important: it is how our research benefits the human population or your area of study, and it provides the impact of our research. --- class: inverse, center, middle # Next: Experimental design, Sample Collection and Preparation Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). --- class: inverse # Resources: * [Data Carpentry](https://datacarpentry.org/lessons/) + [Data Analysis and Visualization in R](https://datacarpentry.org/R-genomics/index.html) + [Data Analysis and Visualization in R for Ecologists](https://datacarpentry.org/R-ecology-lesson/index.html) * [Software Carpentry](https://software-carpentry.org/lessons/) + [Programming with R](http://swcarpentry.github.io/r-novice-inflammation/) + [R for Reproducible Scientific Analysis](https://swcarpentry.github.io/r-novice-gapminder/)