Inflammatory expression profiles in bladder exstrophy smooth muscle: normalization over time

Author

Jason E Michaud, Haowen Qiu, Heather N DiCarlo, John P Gearhart

Published

March 13, 2023

Doi
Abstract
Objectives
To test the hypothesis that phenotypes in bladder exstrophy result from alterations in detrusor smooth muscle cell (SMC) gene expression.
Methods
We generated primary human bladder smooth muscle cell lines from patients with classic bladder exstrophy (CBE) undergoing newborn closure (n=6), delayed primary closure (n=5), augmentation cystoplasty (n=6), and non-CBE controls (n=3). Gene expression profiles were then created using RNA sequencing and characterized using gene set enrichment analysis (GSEA).
Results
We identified 308 differentially expressed genes in bladder exstrophy SMC when compared to controls, including 223 upregulated and 85 downregulated genes. Bladder exstrophy muscle cell lines from newborn closure and primary delayed closures shared expression changes in 159 genes. GSEA analysis revealed increased expression in the inflammatory response and alteration of genes for genitourinary development in newborn and delayed closure SMC. However, these changes were absent in SMC from older exstrophy patients after closure.
Conclusions
Bladder exstrophy SMC demonstrate gene expression changes in the inflammatory response and genitourinary development. However, gene expression profiles normalized in exstrophy SMC from older patients after closure, suggesting a normalization of exstrophy SMC over time. Our in vitro findings regarding the normalization of exstrophy SMC gene expression following bladder closure suggest that the development of poor detrusor compliance in bladder exstrophy has a complex multifactorial etiology. Taken together, our findings suggest that alterations in SMC gene expression may explain abnormalities in the exstrophy bladder seen prior to and immediately after closure and suggest that surgical closure may allow exstrophy SMC to normalize over time.
Code
suppressPackageStartupMessages(c(
  library("ggrepel"),
  library("pheatmap"),
  library("data.table"),
  library("DT"),
  library("extrafont"),
  library("RColorBrewer"),
  library("DESeq2"),
  library("readxl"),
  library("knitr"),
  library("knitcitations"),
  library("gridExtra"),
  library("DESeq2"),
  library("tidyverse"),
  library("clusterProfiler"),
  library("org.Hs.eg.db"),
  library("AnnotationDbi")
))
log2fc = 1
fdr = 0.05
heatmap_topN = 100
# get start time
start_time <- Sys.time()

Project and data background

The bladder exstrophy- epispadias- cloacal exstrophy complex (BEEC) is a rare spectrum of congenital anomalies affecting the genitourinary, musculoskeletal, and gastrointestinal systems with varying severity. Exstrophy bladders are known to have reduced smooth muscle content and increased type-III collagen deposition, while exstrophy smooth muscle cells (SMC) have reduced intracellular calcium levels and reduced contractile responses. We hypothesized that exstrophy bladder and SMC phenotypes result from alterations in detrusor SMC gene expression.

To examine potential alterations in exstrophy SMC gene expression, we generated primary SMC lines from bladder biopsies of patients with CBE at the time of surgery. We then measured global gene expression with RNA-seq of RNAs isolated from primary SMCs. Our cell lines included CBE patients undergoing newborn closure (n=6), delayed primary closure (n=5), older CBE patients undergoing bladder augmentation (enterocystoplasty) for poor bladder growth after newborn closure (n=6), and non-CBE controls undergoing ureteral reimplantation for vesicoureteral reflux (n=3).

Code
### data formatting and wrangling
count_dm = read.delim(raw_count_file, row.names=1)
for (i in 1:nrow(count_dm)) {
        count_dm$ensembl_id[i] = unlist(str_split( rownames(count_dm)[i], pattern = "\\." ))[1]
}

# duplication rownames
dup = subset(count_dm, duplicated(count_dm$ensembl_id)) %>% pull(., var = "ensembl_id")
#rownames(dup_df[dup_df$ensembl_id == dup[3],])

# select one row from duplicated rows to keep
dup_df = count_dm %>%
        dplyr::filter(ensembl_id %in% dup) %>%
        rownames_to_column(., var = "rownames") %>%
        group_by(., ensembl_id) %>%
        filter(!grepl("Y$", rownames)) %>%
        column_to_rownames(., var = "rownames")
count_dm = count_dm %>%
        dplyr::filter(!(ensembl_id %in% dup)) %>%
        bind_rows(., dup_df)

count_dm = count_dm %>%
        remove_rownames(.) %>%
        column_to_rownames(., var = "ensembl_id")
Code
sample_data = read.csv(metadata, sep = "") %>%
        mutate(Group = case_when(Group == "A" ~ "Control",
                                        Group == "B" ~ "Newborn",
                                        Group == "C" ~ "Delayed", 
                                        Group == "D"~ "Augment")) %>%
        mutate(Group = as.factor(Group)) %>%
        #mutate(Group = fct_relevel(Group, "Control", "Newborn", "Delayed", "Augment")) %>% # the same as below line
        mutate(Group = factor(Group, levels = c("Control", "Newborn", "Delayed", "Augment"))) %>%
        mutate(Diagnosis = as.factor(Diagnosis)) %>%
        mutate(Surgery = as.factor(Surgery)) %>%
        mutate(Sex = as.factor(Sex)) %>% 
        mutate(Age = as.numeric(Age))

ensembl_annot = read.delim(ensembl_annot) %>%
        dplyr::select(ensembl_gene_id, external_gene_name, description, gene_biotype, go_id ) %>%
        mutate_all(~na_if(., ''))

ensembl_annot1 = aggregate(go_id ~ ensembl_gene_id + external_gene_name + description + gene_biotype, data = ensembl_annot, paste, collapse = ", ")

interested_GO = c("GO:0001655", "GO:0001822", "GO:0005149", "GO:0007517", "GO:0008009", "GO:0030595", "GO:0034612", "GO:0042060", "GO:0042379", "GO:0048568", "GO:0048704", "GO:0050727", "GO:0055074", "GO:0070700", "GO:0071356")
Code
### run deseq2
design_formula = ~Group
ddsMat <- DESeqDataSetFromMatrix(countData = count_dm,
                                 colData = sample_data,
                                 design = design_formula)
# pre-filtering the data set by filtering rows that have no or 1 count
ddsMat <- ddsMat[ rowSums(counts(ddsMat)) > 0, ]
dds = DESeq(ddsMat)
#resultsNames(dds)

RNA-seq data processing and analysis

RNA was isolated from primary SMC, passage P3-5, after serum starving in 0.5% serum for 24hrs. mRNA-seq libraries were prepared from total RNA for sequencing on an Illumina NextSeq500 platform. Major steps of the RNA-seq data processing pipelines include:

  • FASTQC and MULTIQC for quality check

  • TrimGalore for trimming and pre-processing

  • STAR for mapping and align sequences to reference genome

  • SUBREAD for summarizing reads before and after pre-processing and alignment

A summary of alignment is provided in this table:

Code
alignment_summary = read.csv(alignment_summary) %>% janitor::clean_names()
alignment_summary %>%
        left_join(., sample_data[,1:2]) %>%
        relocate(Group, .after = sample) %>%
        knitr::kable(., digits = 3, "html") %>%
        kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
        kableExtra::scroll_box(width = "100%", height = "500px")
sample Group number_of_input_reads average_input_read_length uniquely_mapped_reads_number uniquely_mapped_reads average_mapped_length number_of_splices_total number_of_splices_annotated_sjdb number_of_splices_gt_ag number_of_splices_gc_ag number_of_splices_at_ac number_of_splices_non_canonical mismatch_rate_per_base deletion_rate_per_base deletion_average_length insertion_rate_per_base insertion_average_length number_of_reads_mapped_to_multiple_loci x_of_reads_mapped_to_multiple_loci number_of_reads_mapped_to_too_many_loci x_of_reads_mapped_to_too_many_loci number_of_reads_unmapped_too_many_mismatches x_of_reads_unmapped_too_many_mismatches number_of_reads_unmapped_too_short x_of_reads_unmapped_too_short number_of_reads_unmapped_other x_of_reads_unmapped_other number_of_chimeric_reads x_of_chimeric_reads assigned unassigned_unmapped unassigned_read_type unassigned_singleton unassigned_mapping_quality unassigned_chimera unassigned_fragment_length unassigned_duplicate unassigned_multi_mapping unassigned_secondary unassigned_non_split unassigned_no_features unassigned_overlapping_length unassigned_ambiguity
C100 Delayed 40108024 149 36736941 91.59% 148.87 22189045 22088426 22009343 147426 15877 16399 0.26% 0.01% 1.56 0.00% 1.49 2189334 5.46% 7621 0.02% 0 0.00% 1168138 2.91% 5990 0.01% 0 0.00% 34945670 0 0 0 0 0 0 0 10504889 0 0 683830 0 1107441
C106 Delayed 58861337 149 54212534 92.10% 148.92 31849284 31700065 31601331 200945 22468 24540 0.25% 0.01% 1.59 0.00% 1.49 3173073 5.39% 11084 0.02% 0 0.00% 1457908 2.48% 6738 0.01% 0 0.00% 51337554 0 0 0 0 0 0 0 14675206 0 0 1242344 0 1632636
B121 Newborn 56537095 149 52186377 92.30% 148.90 32730513 32592111 32495533 194374 18926 21680 0.25% 0.01% 1.53 0.00% 1.48 2534057 4.48% 11748 0.02% 0 0.00% 1797830 3.18% 7083 0.01% 0 0.00% 49701272 0 0 0 0 0 0 0 11184629 0 0 1013882 0 1471223
D123 Augment 40575600 148 37801048 93.16% 148.64 23992101 23895557 23818403 143958 14269 15471 0.25% 0.01% 1.56 0.00% 1.51 2278231 5.61% 8818 0.02% 0 0.00% 483994 1.19% 3509 0.01% 0 0.00% 36061134 0 0 0 0 0 0 0 10768369 0 0 683489 0 1056425
B146 Newborn 40468930 149 37258100 92.07% 148.92 21967230 21869739 21789217 143642 16342 18029 0.24% 0.01% 1.52 0.00% 1.53 2367273 5.85% 8028 0.02% 0 0.00% 831373 2.05% 4156 0.01% 0 0.00% 35387507 0 0 0 0 0 0 0 11320658 0 0 717339 0 1153254
D19 Augment 38352444 149 35818651 93.39% 149.01 22966329 22875769 22810308 128033 13191 14797 0.24% 0.01% 1.50 0.00% 1.59 1879674 4.90% 7132 0.02% 0 0.00% 643109 1.68% 3878 0.01% 0 0.00% 34168183 0 0 0 0 0 0 0 8773521 0 0 674064 0 976404
A214 Control 45033129 149 42213288 93.74% 148.98 27784904 27683395 27604042 147476 14783 18603 0.23% 0.01% 1.61 0.00% 1.66 2088147 4.64% 7898 0.02% 0 0.00% 719682 1.60% 4114 0.01% 0 0.00% 40248695 0 0 0 0 0 0 0 9373392 0 0 836303 0 1128290
A28 Control 44021059 149 40729520 92.52% 148.96 26067356 25965272 25885116 149674 15232 17334 0.24% 0.01% 1.57 0.00% 1.51 2405946 5.47% 8658 0.02% 0 0.00% 872886 1.98% 4049 0.01% 0 0.00% 38869185 0 0 0 0 0 0 0 11825136 0 0 721132 0 1139203
C331 Delayed 39424328 149 36749110 93.21% 148.98 23344609 23250484 23175774 139120 14238 15477 0.23% 0.01% 1.52 0.00% 1.63 2063050 5.23% 6323 0.02% 0 0.00% 601980 1.53% 3865 0.01% 0 0.00% 34967465 0 0 0 0 0 0 0 9230532 0 0 720892 0 1060753
D37 Augment 42926789 149 39619768 92.30% 148.95 25580288 25479357 25392695 157117 14674 15802 0.25% 0.01% 1.50 0.00% 1.55 2173461 5.06% 7332 0.02% 0 0.00% 1122295 2.61% 3933 0.01% 0 0.00% 37880425 0 0 0 0 0 0 0 10721122 0 0 618562 0 1120781
B441 Newborn 42828719 149 39587837 92.43% 148.99 21960227 21856960 21757717 169195 14159 19156 0.24% 0.01% 1.54 0.00% 1.50 2665569 6.22% 8145 0.02% 0 0.00% 562695 1.31% 4473 0.01% 0 0.00% 37545239 0 0 0 0 0 0 0 13028556 0 0 816800 0 1225798
B49 Newborn 39832639 149 36535639 91.72% 148.93 22111233 22013385 21937759 143656 13983 15835 0.24% 0.01% 1.50 0.00% 1.61 2373287 5.96% 6320 0.02% 0 0.00% 914257 2.30% 3136 0.01% 0 0.00% 34633743 0 0 0 0 0 0 0 11208078 0 0 713732 0 1188164
D596 Augment 48746410 149 45240426 92.81% 148.98 27854529 27733471 27635496 180455 18235 20343 0.23% 0.01% 1.51 0.00% 1.50 2895532 5.94% 10250 0.02% 0 0.00% 594367 1.22% 5835 0.01% 0 0.00% 42998138 0 0 0 0 0 0 0 13685103 0 0 867058 0 1375230
B61 Newborn 39683496 149 36620583 92.28% 148.93 22731024 22628485 22551115 147136 17142 15631 0.24% 0.01% 1.57 0.00% 1.51 2024489 5.10% 7063 0.02% 0 0.00% 1027236 2.59% 4125 0.01% 0 0.00% 34806194 0 0 0 0 0 0 0 9287826 0 0 720672 0 1093717
C67 Delayed 38966890 149 35594310 91.35% 148.90 21813005 21713152 21631287 150814 14745 16159 0.24% 0.01% 1.52 0.00% 1.48 2309577 5.93% 8271 0.02% 0 0.00% 1051289 2.70% 3443 0.01% 0 0.00% 33688661 0 0 0 0 0 0 0 10613039 0 0 728595 0 1177054
C6 Delayed 35349450 148 32900254 93.07% 148.61 20253503 20168132 20088410 136370 14501 14222 0.24% 0.01% 1.60 0.00% 1.65 1974647 5.59% 6223 0.02% 0 0.00% 464637 1.31% 3689 0.01% 0 0.00% 31421583 0 0 0 0 0 0 0 9687586 0 0 556167 0 922504
D72 Augment 29217505 149 27106405 92.77% 148.94 17579344 17503925 17444770 111072 11544 11958 0.23% 0.01% 1.50 0.00% 1.51 1636578 5.60% 5569 0.02% 0 0.00% 465998 1.59% 2955 0.01% 0 0.00% 25659252 0 0 0 0 0 0 0 6990743 0 0 567193 0 879960
A88 Control 42794860 149 39533000 92.38% 148.95 24499651 24395124 24316793 150943 15123 16792 0.24% 0.01% 1.52 0.00% 1.57 2441503 5.71% 8289 0.02% 0 0.00% 806803 1.89% 5265 0.01% 0 0.00% 37579372 0 0 0 0 0 0 0 11721326 0 0 813186 0 1140442
D997 Augment 36491170 149 33892776 92.88% 148.96 20894778 20807999 20744930 123221 11992 14635 0.24% 0.01% 1.52 0.00% 1.60 1963371 5.38% 7084 0.02% 0 0.00% 622866 1.71% 5073 0.01% 0 0.00% 32151125 0 0 0 0 0 0 0 8867662 0 0 750804 0 990847

We performed two downstream analysis, differential gene expression (DGE) analysis using DESeq2 and GO enrichment analysis using clusterProfiler.

Code
### generate transformed count matrix
# blind = TRUE for QA, blind = FALSE for downstream analysis
transformed_count <- rlog(ddsMat, blind = FALSE)
colnames(transformed_count) <- sample_data$sample

# change transformed_count into a matrix
transformed_count <- assay(transformed_count)

# annotation for heatmaps
anno = sample_data %>% 
  dplyr::select(Group, sample) %>% 
  column_to_rownames(var = "sample")

deseq2_dist(transformed_count, anno)

This plot shows the euclidean distance of samples and how they are hierarchically clustered using the normalized (regularized log transformed) count data generated with DESeq2.

PCA by group

Code
pca_group <- deseq2_pca(sample_data, transformed_count, 
           PC1 = "PC1", PC2 = "PC2", color_by = "Group", save = FALSE)
print(pca_group[[1]])

The PCA plot shows the first two principal components of the regularized log transformed count data for all samples (there is two few data points to calculate ellipse for Control group). There was no obvious separation between groups.

Pairwise comparisons between newborn closure, delayed closure, and augmentation groups were compared to the control group, respectively. Differentially expressed genes (DEG) were defined as having a Benjamini-Hochberg adjusted P value <0.05 and log2FC >1.0. DEG were further enrolled in GO enrichment analysis in R using clusterProfiler version 4.4.4. For gene set enrichment analysis (GSEA), the ranked gene list by log2FC was entered in a pre-ranked GSEA function in clusterProfiler, using human gene sets database (org.Hs.eg.db v3.15.0). Dot plots from GO enrichment analysis and GSEA were generated in R by enrichplot version 1.16.2. Only annotated RNAs are presented. All experimental data are available through the NCBI GEO repository.

Newborn vs Control

Code
contrast = c("Group", "Newborn", "Control")
res_unshrunken <- results(dds, contrast = contrast, alpha = 0.05)
#summary(res_unshrunken)
res <- lfcShrink(dds, contrast = contrast, res = res_unshrunken, type = "ashr")
#summary(res)

MA plot

MA plot compares the log fold change against the mean of the normalized counts. Each point shows a feature. The points in blue are those that have an adjusted p-value smaller than alpha.

Code
deseq2_plotMA(res)

Significant features

Note

Genes with a Benjamini-Hochberg (FDR) adjusted P value smaller than 0.05 and an absolute log2 fold change larger than 1 were considered differentially expressed.

Code
full_stat <- as.data.frame(res) %>%
        mutate(across(everything(), ~as.numeric(.x))) %>%
        mutate(abs_log2FC = abs(log2FoldChange) )


full_stat_BA = full_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames")
Code
sig_stat <- full_stat %>%
  #dplyr::rename(., gene = Gene_name) %>%
  dplyr::filter(padj < fdr & abs(log2FoldChange) > log2fc) %>%
  mutate(status = if_else(log2FoldChange < 0, "Down", "Up"))

sig_stat_BA = sig_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames") %>%
        dplyr::filter(!is.na(external_gene_name) )

Volcano plot is another way to visualize the DGE results. Each point represents a feature. The colored points are differentially expressed genes with alpha = 0.05 and log2FC = 1 (blue : down-regulated, red : up-regulated). Top 10 up- and down-regulated genes are labeled.

Code
deseq2_volcano_2(full_stat, sig_stat_BA, fdr = fdr, log2fc = log2fc, label_col = "external_gene_name")

Heatmaps are a great way to visualize the clustering of samples using normalized count data. This heatmap shows the row-wise Z-scores of regularized log transformed count data for all differentially expressed genes.

Code
hm = transformed_count[, rownames(anno)[which(anno$Group %in% c("Control", "Newborn"))]]

deseq2_hm_2(hm, sig_stat_BA, anno, save = FALSE)

Code
deseq2_hm_2(hm, sig_stat_BA, anno, rowlabel = "external_gene_name", top_n = heatmap_topN, save = FALSE)

Pathway enrichment analysis is a way to summarize your gene list into pathways to ease biological interpretation of the data. The analysis uses the overlap between your gene list and a pathway to calculate pathway enrichment score.

There are two types of gene lists, corresponding to two tools for enrichment analysis:

  • Defined gene list (e.g., expression change > 2-fold, significantly differentiated genes only): over-representation analysis
    • Answers the question: Are any pathways (gene sets) surprisingly enriched in my gene list?
    • Statistical test: Fisher’s exact test
  • Ranked gene list (e.g., by differential gene expression): GSEA
    • Answers the question: Are any pathways (gene sets) ranked surprisingly high or low in my ranked list of genes?
    • Statistical test: GSEA
Note

Ranked gene list is preferred and recommended so as to avoid arbitrary threshold.

There are many pathway databases for enrichment analysis, such as GO, KEGG, WikiPathways, Reactome, etc. In this section, we are using GO (gene ontology) database.

Note

Over representation analysis needs two things as input:

  • A defined gene list
  • Gene sets (pathways) or annotations: e.g. GO database.

The question ORA is trying to answer is: are any of the gene sets (pathways) surprisingly enriched in the gene list?

Code
genes_to_test = rownames(sig_stat)

GO_results <- clusterProfiler::enrichGO(gene = genes_to_test, OrgDb = "org.Hs.eg.db", keyType = "ENSEMBL", ont = "ALL", pAdjustMethod = "BH", pvalueCutoff  = 0.01,qvalueCutoff = 0.05, readable=TRUE)
Code
#as.data.frame(GO_results) %>% head()
GO_results_tidy = GO_results %>% 
        slot("result") %>% 
        tibble::as_tibble() %>%
        mutate(GeneRatio_num = DOSE::parse_ratio(GeneRatio)) %>%
        dplyr::arrange(desc(GeneRatio_num)) 
#GO_results_tidy %>% create_dt()

Dotplot

Code
GO_results = GO_results %>% 
        mutate(GeneRatio_num = DOSE::parse_ratio(GeneRatio)) %>%
        dplyr::arrange(desc(GeneRatio_num))
dotplot(GO_results, showCategory = 20)

Note

Dotplot is the most widely used method to visualize enriched terms. It depicts the enrichment scores (e.g. p values) and gene count or ratio as dot size and color.

Above graph is showing top 20 enriched terms.

EnrichmentMap

Code
ORA = enrichplot::pairwise_termsim(GO_results) 

ORA %>% enrichplot::emapplot()

Note

Enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional module.

Default is showing 30 enriched terms.

Note

GSEA needs two things as input:

  • A ranked gene list

  • Gene Sets (pathways) or annotations: e.g. GO database.

The analysis answers the question: are any pathways (gene sets) ranked surprisingly high or low in my ranked list of genes?

Code
geneList = res_unshrunken$log2FoldChange
names(geneList) = rownames(res_unshrunken)

geneList = na.omit(geneList) %>% sort(., decreasing = TRUE)

GSE_results = gseGO(gene = geneList, OrgDb = "org.Hs.eg.db", keyType = "ENSEMBL", ont = "ALL", pAdjustMethod = "BH", pvalueCutoff = 0.05)
Code
GSE_results_tidy = GSE_results %>% 
        slot("result") %>% 
        tibble::as_tibble() %>%
        dplyr::arrange(desc(NES))
        
#GSE_results_tidy %>% create_dt()

Dotplot

Top 20 enriched terms in both activated and suppressed categories

Code
df <- ggplot2::fortify(GSE_results, showCategory = nrow(GSE_results_tidy), split=".sign") %>%
        group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                          levels=rev(unique(df$Description[idx])))

ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
            guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) + facet_grid(.~.sign, scales = "free_x")

Code
#dotplot(GSE_results, split=".sign") + facet_grid(.~.sign)
Code
df <- ggplot2::fortify(GSE_results, showCategory = nrow(GSE_results_tidy), split=".sign") %>%
        #group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                          levels=rev(unique(df$Description[idx])))

ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
            guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) #+ facet_grid(.~.sign)
#dotplot(GSE_results, split=".sign", order = T) + facet_grid(.~.sign)
Code
# This section is for making dotplot for selected GO terms
selected_GSE_results = GSE_results %>%
        dplyr::filter(ID %in% interested_GO)

selected_GSE_results_tidy = GSE_results_tidy %>%
        dplyr::filter(ID %in% interested_GO)
# newborn
df <- ggplot2::fortify(selected_GSE_results, showCategory = nrow(selected_GSE_results_tidy), split=".sign") %>%
        group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx_newborn <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                         levels=rev(unique(df$Description[idx_newborn])))

newborn = ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
                               guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) + xlim(1.5, 2.4) + facet_grid(.~.sign) + theme(legend.position = "none")

newborn_order = df$ID

EnrichmentMap

Code
GSE = enrichplot::pairwise_termsim(GSE_results) 

GSE %>% emapplot()

Delayed vs Control

Code
contrast = c("Group", "Delayed", "Control")
res_unshrunken <- results(dds, contrast = contrast, alpha = 0.05)
#summary(res_unshrunken)
res <- lfcShrink(dds, contrast = contrast, res = res_unshrunken, type = "ashr")
#summary(res)

MA plot

MA plot compares the log fold change against the mean of the normalized counts. Each point shows a feature. The points in blue are those that have an adjusted p-value smaller than alpha.

Code
deseq2_plotMA(res)

Significant features

Note

Genes with a Benjamini-Hochberg (FDR) adjusted P value smaller than 0.05 and an absolute log2 fold change larger than 1 were considered differentially expressed.

Code
full_stat <- as.data.frame(res) %>%
        mutate(across(everything(), ~as.numeric(.x))) %>%
        mutate(abs_log2FC = abs(log2FoldChange) )

full_stat_CA = full_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames") 
Code
sig_stat <- full_stat %>%
  #dplyr::rename(., gene = Gene_name) %>%
  dplyr::filter(padj < fdr & abs(log2FoldChange) > log2fc) %>%
  mutate(status = if_else(log2FoldChange < 0, "Down", "Up"))

sig_stat_CA = sig_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames") %>%
        dplyr::filter(!is.na(external_gene_name) )

Volcano plot is another way to visualize the DGE results. Each point represents a feature. The colored points are differentially expressed genes with alpha = 0.05 and log2FC = 1 (blue : down-regulated, red : up-regulated). Top 10 up- and down-regulated genes are labeled.

Code
deseq2_volcano_2(full_stat, sig_stat_CA, fdr = fdr, log2fc = log2fc, label_col = "external_gene_name")

Heatmaps are a great way to visualize the clustering of samples using normalized count data. This heatmap shows the row-wise Z-scores of regularized log transformed count data for all differentially expressed genes.

Code
hm = transformed_count[, rownames(anno)[which(anno$Group %in% c("Control", "Delayed"))]]
deseq2_hm_2(hm, sig_stat_CA, anno, save = FALSE)

Code
deseq2_hm_2(hm, sig_stat_CA, anno, rowlabel = "external_gene_name", top_n = heatmap_topN, save = FALSE)
Note

Over representation analysis needs two things as input:

  • A defined gene list
  • Gene sets (pathways) or annotations: e.g. GO database.

The question ORA is trying to answer is: are any of the gene sets (pathways) surprisingly enriched in the gene list?

Code
genes_to_test = rownames(sig_stat)
GO_results <- enrichGO(gene = genes_to_test, OrgDb = "org.Hs.eg.db", keyType = "ENSEMBL", ont = "ALL", pAdjustMethod = "BH", pvalueCutoff  = 0.01,qvalueCutoff = 0.05, readable=TRUE)
Code
#as.data.frame(GO_results) %>% head()
GO_results_tidy = GO_results %>% 
        slot("result") %>% 
        tibble::as_tibble() %>%
        mutate(GeneRatio_num = DOSE::parse_ratio(GeneRatio)) %>%
        dplyr::arrange(desc(GeneRatio_num)) 
#GO_results_tidy %>% create_dt()

Dotplot

Code
GO_results = GO_results %>% 
        mutate(GeneRatio_num = DOSE::parse_ratio(GeneRatio)) %>%
        dplyr::arrange(desc(GeneRatio_num))
dotplot(GO_results, showCategory = 20)

EnrichmentMap

Code
ORA = enrichplot::pairwise_termsim(GO_results) 

ORA %>% enrichplot::emapplot()

Note

GSEA needs two things as input:

  • A ranked gene list

  • Gene Sets (pathways) or annotations: e.g. GO database.

The analysis answers the question: are any pathways (gene sets) ranked surprisingly high or low in my ranked list of genes?

Code
geneList = res_unshrunken$log2FoldChange
names(geneList) = rownames(res_unshrunken)

geneList = na.omit(geneList) %>% sort(., decreasing = TRUE)

GSE_results = gseGO(gene = geneList, OrgDb = "org.Hs.eg.db", keyType = "ENSEMBL", ont = "ALL", pAdjustMethod = "BH", pvalueCutoff = 0.05)
Code
GSE_results_tidy = GSE_results %>% 
        slot("result") %>% 
        tibble::as_tibble() %>%
        dplyr::arrange(desc(NES))
        
#GSE_results_tidy %>% create_dt()

Dotplot

Top 20 enriched terms in both activated and suppressed categories

Code
df <- ggplot2::fortify(GSE_results, showCategory = nrow(GSE_results_tidy), split=".sign") %>%
        group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                          levels=rev(unique(df$Description[idx])))

ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
            guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) + facet_grid(.~.sign, scales = "free_x")

Code
#dotplot(GSE_results, split=".sign") + facet_grid(.~.sign)
Code
df <- ggplot2::fortify(GSE_results, showCategory = nrow(GSE_results_tidy), split=".sign") %>%
        #group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                          levels=rev(unique(df$Description[idx])))

ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
            guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) #+ facet_grid(.~.sign)
#dotplot(GSE_results, split=".sign") + facet_grid(.~.sign)
Code
# This section is for making dotplot for selected GO terms
selected_GSE_results = GSE_results %>%
        dplyr::filter(ID %in% interested_GO)

selected_GSE_results_tidy = GSE_results_tidy %>%
        dplyr::filter(ID %in% interested_GO)
# delayed
df <- ggplot2::fortify(selected_GSE_results, showCategory = nrow(selected_GSE_results_tidy), split=".sign") %>%
        group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(match(ID, newborn_order))


df$Description <- factor(df$Description,
                         levels=rev(unique(df$Description[idx_newborn])))


delayed = ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
                               guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) + xlim(1.5, 2.4) + facet_grid(.~.sign)

EnrichmentMap

Code
GSE = enrichplot::pairwise_termsim(GSE_results) 

GSE %>% emapplot()

Augumentation vs Control

Code
contrast = c("Group", "Augment", "Control")
res_unshrunken <- results(dds, contrast = contrast, alpha = 0.05)
#summary(res_unshrunken)
res <- lfcShrink(dds, contrast = contrast, res = res_unshrunken, type = "ashr")
#summary(res)

MA plot

MA plot compares the log fold change against the mean of the normalized counts. Each point shows a feature. The points in blue are those that have an adjusted p-value smaller than alpha.

Code
deseq2_plotMA(res)

Significant features

Code
full_stat <- as.data.frame(res) %>%
        mutate(across(everything(), ~as.numeric(.x))) %>%
        mutate(abs_log2FC = abs(log2FoldChange) )

full_stat_DA = full_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames")
Code
sig_stat <- full_stat %>%
  #dplyr::rename(., gene = Gene_name) %>%
  dplyr::filter(padj < fdr & abs(log2FoldChange) > log2fc) %>%
  mutate(status = if_else(log2FoldChange < 0, "Down", "Up"))

sig_stat_DA = sig_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames") %>%
        dplyr::filter(!is.na(external_gene_name) )

Volcano plot is another way to visualize the DGE results. Each point represents a feature. The colored points are differentially expressed genes with alpha = 0.05 and log2FC = 1 (blue : down-regulated, red : up-regulated). Top 10 up- and down-regulated genes are labeled.

Code
deseq2_volcano_2(full_stat, sig_stat_DA, fdr = fdr, log2fc = log2fc, label_col = "external_gene_name")

Heatmaps are a great way to visualize the clustering of samples using normalized count data. This heatmap shows the row-wise Z-scores of regularized log transformed count data for all differentially expressed genes.

Code
hm = transformed_count[, rownames(anno)[which(anno$Group %in% c("Control", "Augment"))]]
deseq2_hm_2(hm, sig_stat_DA, anno, rowlabel = "external_gene_name", top_n = 3, save = FALSE)

Due to the small number of genes significantly differentiated between groups, no enrichment analysis can be performed.

Conclusion

In our expression analysis, we identified significant alterations in exstrophy SMC gene expression, with 309 DEG in exstrophy SMC undergoing newborn closure compared to controls, including 223 upregulated and 86 downregulated genes. We then examined gene expression profiles of exstrophy SMC from patients born with small bladder templates who underwent delayed closure. We identified 297 DEG in delayed closure SMC compared to controls, including 229 upregulated and 68 downregulated genes. Despite being born with smaller templates and undergoing later exstrophy closure, the gene expression profiles of delayed closure SMC were not significantly different from newborn closure SMC. In fact, the gene expression profiles of newborn and delayed closures shared 159 DEG.

Code
sig_list_BA = rownames(sig_stat_BA)
sig_list_CA = rownames(sig_stat_CA)
sig_list_DA = rownames(sig_stat_DA)
common = intersect(sig_list_BA, sig_list_CA)

common_anno = ensembl_annot1 %>%
        dplyr::filter(ensembl_gene_id %in% common) %>%
        #remove_rownames(.,) %>%
        column_to_rownames(., var = "ensembl_gene_id")

An overall heatmap using these 159 common genes:

Code
deseq2_hm_3(transformed_count, common_anno, anno, rowlabel = "external_gene_name", rowname_switch = TRUE, save = FALSE)

We next analyzed gene expression in SMC from older CBE patients, biopsied at the time of bladder augmentation (enterocystoplasty). Interestingly, we identified only 3 DEG among SMC expression profiles from older CBE patients undergoing augmentation, as they were not significantly different from controls. DEG shared among all exstrophy SMC included upregulation of the chemokine CXCL8, upregulation of the endoplasmic reticulum protein reticulon 1 (RTN1), and downregulation of C1q and TNF-related 3 (C1QTNF3) genes. Clustering of expression profiles revealed augment SMC were most like controls, whereas newborn SMC were similar to delayed SMC. Together, these data suggest a shift of SMC expression to a non-exstrophy phenotype after closure, despite persistence of bladder pathology with lack of capacity.

Venn diagram of common genes of three contrasts

Code
make_venn_3(list("Newborn" = sig_list_BA, "Delayed" = sig_list_CA, "Augment" = sig_list_DA))

GSEA dotplot for selected GO terms in “Newborn” and “Delayed” contrasts

To better understand the implications of altered gene expression in exstrophy SMC, we performed gene set enrichment analysis (GSEA) organizing DEG into gene ontology (GO) pathways representative of cellular processes. GSEA confirmed that relevant GO pathways were altered in exstrophy SMC from newborn and delayed closures. Importantly, GSEA revealed that exstrophy SMC predominantly over-expressed inflammatory pathways including genes involved in chemokine signaling and the acute phase response. The inflammatory gene profile was present in both newborn and delayed closure SMC.

Code
egg::ggarrange(newborn, delayed, labels = c("Newborn", "Delayed"), nrow = 1,
               label.args = list(gp = grid::gpar(fontface = 2)) )

Reproducibility

The amount of time took to generate the report:

Code
Sys.time() - start_time
Time difference of 5.730587 mins

R session information:

Code
sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] plotly_4.10.1               VennDiagram_1.7.3          
 [3] futile.logger_1.4.3         wesanderson_0.3.6          
 [5] org.Hs.eg.db_3.15.0         AnnotationDbi_1.58.0       
 [7] clusterProfiler_4.4.4       lubridate_1.9.2            
 [9] forcats_1.0.0               stringr_1.5.0              
[11] dplyr_1.1.2                 purrr_1.0.1                
[13] readr_2.1.4                 tidyr_1.3.0                
[15] tibble_3.2.1                tidyverse_2.0.0            
[17] gridExtra_2.3               knitcitations_1.0.12       
[19] knitr_1.43.1                readxl_1.4.2               
[21] DESeq2_1.36.0               SummarizedExperiment_1.26.1
[23] Biobase_2.56.0              MatrixGenerics_1.8.1       
[25] matrixStats_0.63.0          GenomicRanges_1.48.0       
[27] GenomeInfoDb_1.32.4         IRanges_2.30.1             
[29] S4Vectors_0.34.0            BiocGenerics_0.42.0        
[31] RColorBrewer_1.1-3          extrafont_0.19             
[33] DT_0.28                     data.table_1.14.8          
[35] pheatmap_1.0.12             ggrepel_0.9.3              
[37] ggplot2_3.4.2              

loaded via a namespace (and not attached):
  [1] snow_0.4-4             shadowtext_0.1.2       backports_1.4.1       
  [4] fastmatch_1.1-3        systemfonts_1.0.4      plyr_1.8.8            
  [7] igraph_1.5.1           lazyeval_0.2.2         splines_4.2.1         
 [10] BiocParallel_1.30.4    digest_0.6.31          invgamma_1.1          
 [13] yulab.utils_0.0.6      htmltools_0.5.5        GOSemSim_2.22.0       
 [16] viridis_0.6.3          GO.db_3.15.0           SQUAREM_2021.1        
 [19] fansi_1.0.4            magrittr_2.0.3         memoise_2.0.1         
 [22] tzdb_0.4.0             Biostrings_2.64.1      annotate_1.74.0       
 [25] graphlayouts_1.0.0     svglite_2.1.1          extrafontdb_1.0       
 [28] timechange_0.2.0       enrichplot_1.16.2      colorspace_2.1-0      
 [31] rvest_1.0.3            blob_1.2.4             xfun_0.39             
 [34] crayon_1.5.2           RCurl_1.98-1.12        jsonlite_1.8.4        
 [37] scatterpie_0.2.0       genefilter_1.78.0      survival_3.5-5        
 [40] ape_5.7-1              glue_1.6.2             kableExtra_1.3.4      
 [43] polyclip_1.10-4        gtable_0.3.3           zlibbioc_1.42.0       
 [46] XVector_0.36.0         webshot_0.5.4          DelayedArray_0.22.0   
 [49] Rttf2pt1_1.3.12        scales_1.2.1           DOSE_3.22.1           
 [52] futile.options_1.0.1   DBI_1.1.3              bibtex_0.5.1          
 [55] Rcpp_1.0.10            viridisLite_0.4.2      xtable_1.8-4          
 [58] gridGraphics_0.5-1     tidytree_0.4.2         bit_4.0.5             
 [61] truncnorm_1.0-9        htmlwidgets_1.6.2      httr_1.4.6            
 [64] fgsea_1.22.0           pkgconfig_2.0.3        XML_3.99-0.14         
 [67] farver_2.1.1           janitor_2.2.0          locfit_1.5-9.7        
 [70] utf8_1.2.3             labeling_0.4.2         ggplotify_0.1.0       
 [73] tidyselect_1.2.0       rlang_1.1.1            reshape2_1.4.4        
 [76] munsell_0.5.0          cellranger_1.1.0       tools_4.2.1           
 [79] cachem_1.0.8           downloader_0.4         cli_3.6.1             
 [82] generics_0.1.3         RSQLite_2.3.1          evaluate_0.21         
 [85] fastmap_1.1.1          yaml_2.3.7             RefManageR_1.4.0      
 [88] ggtree_3.4.4           bit64_4.0.5            tidygraph_1.2.3       
 [91] KEGGREST_1.36.3        ggraph_2.1.0           nlme_3.1-162          
 [94] formatR_1.14           aplot_0.1.10           DO.db_2.9             
 [97] xml2_1.3.4             compiler_4.2.1         rstudioapi_0.14       
[100] png_0.1-8              treeio_1.20.2          tweenr_2.0.2          
[103] geneplotter_1.74.0     stringi_1.7.12         highr_0.10            
[106] lattice_0.21-8         Matrix_1.5-4           vctrs_0.6.2           
[109] pillar_1.9.0           lifecycle_1.0.3        irlba_2.3.5.1         
[112] bitops_1.0-7           patchwork_1.1.2        qvalue_2.28.0         
[115] R6_2.5.1               codetools_0.2-19       lambda.r_1.2.4        
[118] MASS_7.3-60            withr_2.5.0            GenomeInfoDbData_1.2.8
[121] parallel_4.2.1         hms_1.1.3              ggfun_0.0.9           
[124] snakecase_0.11.0       rmarkdown_2.21         ashr_2.2-54           
[127] mixsqp_0.3-48          ggforce_0.4.1