Inflammatory expression profiles in bladder exstrophy smooth muscle: normalization over time

Author

Jason E Michaud, Haowen Qiu, Heather N DiCarlo, John P Gearhart

Published

March 13, 2023

Doi

https://doi.org/10.1016/j.urology.2023.02.042

Abstract

Objectives
To test the hypothesis that phenotypes in bladder exstrophy result from alterations in detrusor smooth muscle cell (SMC) gene expression.
Methods
We generated primary human bladder smooth muscle cell lines from patients with classic bladder exstrophy (CBE) undergoing newborn closure (n=6), delayed primary closure (n=5), augmentation cystoplasty (n=6), and non-CBE controls (n=3). Gene expression profiles were then created using RNA sequencing and characterized using gene set enrichment analysis (GSEA).
Results
We identified 308 differentially expressed genes in bladder exstrophy SMC when compared to controls, including 223 upregulated and 85 downregulated genes. Bladder exstrophy muscle cell lines from newborn closure and primary delayed closures shared expression changes in 159 genes. GSEA analysis revealed increased expression in the inflammatory response and alteration of genes for genitourinary development in newborn and delayed closure SMC. However, these changes were absent in SMC from older exstrophy patients after closure.
Conclusions
Bladder exstrophy SMC demonstrate gene expression changes in the inflammatory response and genitourinary development. However, gene expression profiles normalized in exstrophy SMC from older patients after closure, suggesting a normalization of exstrophy SMC over time. Our in vitro findings regarding the normalization of exstrophy SMC gene expression following bladder closure suggest that the development of poor detrusor compliance in bladder exstrophy has a complex multifactorial etiology. Taken together, our findings suggest that alterations in SMC gene expression may explain abnormalities in the exstrophy bladder seen prior to and immediately after closure and suggest that surgical closure may allow exstrophy SMC to normalize over time.

Code

suppressPackageStartupMessages(c(
  library("ggrepel"),
  library("pheatmap"),
  library("data.table"),
  library("DT"),
  library("extrafont"),
  library("RColorBrewer"),
  library("DESeq2"),
  library("readxl"),
  library("knitr"),
  library("knitcitations"),
  library("gridExtra"),
  library("DESeq2"),
  library("tidyverse"),
  library("clusterProfiler"),
  library("org.Hs.eg.db"),
  library("AnnotationDbi")
))
log2fc = 1
fdr = 0.05
heatmap_topN = 100
# get start time
start_time <- Sys.time()

Project and data background

The bladder exstrophy- epispadias- cloacal exstrophy complex (BEEC) is a rare spectrum of congenital anomalies affecting the genitourinary, musculoskeletal, and gastrointestinal systems with varying severity. Exstrophy bladders are known to have reduced smooth muscle content and increased type-III collagen deposition, while exstrophy smooth muscle cells (SMC) have reduced intracellular calcium levels and reduced contractile responses. We hypothesized that exstrophy bladder and SMC phenotypes result from alterations in detrusor SMC gene expression.

To examine potential alterations in exstrophy SMC gene expression, we generated primary SMC lines from bladder biopsies of patients with CBE at the time of surgery. We then measured global gene expression with RNA-seq of RNAs isolated from primary SMCs. Our cell lines included CBE patients undergoing newborn closure (n=6), delayed primary closure (n=5), older CBE patients undergoing bladder augmentation (enterocystoplasty) for poor bladder growth after newborn closure (n=6), and non-CBE controls undergoing ureteral reimplantation for vesicoureteral reflux (n=3).

Code

### data formatting and wrangling
count_dm = read.delim(raw_count_file, row.names=1)
for (i in 1:nrow(count_dm)) {
        count_dm$ensembl_id[i] = unlist(str_split( rownames(count_dm)[i], pattern = "\\." ))[1]
}

# duplication rownames
dup = subset(count_dm, duplicated(count_dm$ensembl_id)) %>% pull(., var = "ensembl_id")
#rownames(dup_df[dup_df$ensembl_id == dup[3],])

# select one row from duplicated rows to keep
dup_df = count_dm %>%
        dplyr::filter(ensembl_id %in% dup) %>%
        rownames_to_column(., var = "rownames") %>%
        group_by(., ensembl_id) %>%
        filter(!grepl("Y$", rownames)) %>%
        column_to_rownames(., var = "rownames")
count_dm = count_dm %>%
        dplyr::filter(!(ensembl_id %in% dup)) %>%
        bind_rows(., dup_df)

count_dm = count_dm %>%
        remove_rownames(.) %>%
        column_to_rownames(., var = "ensembl_id")

Code

sample_data = read.csv(metadata, sep = "") %>%
        mutate(Group = case_when(Group == "A" ~ "Control",
                                        Group == "B" ~ "Newborn",
                                        Group == "C" ~ "Delayed", 
                                        Group == "D"~ "Augment")) %>%
        mutate(Group = as.factor(Group)) %>%
        #mutate(Group = fct_relevel(Group, "Control", "Newborn", "Delayed", "Augment")) %>% # the same as below line
        mutate(Group = factor(Group, levels = c("Control", "Newborn", "Delayed", "Augment"))) %>%
        mutate(Diagnosis = as.factor(Diagnosis)) %>%
        mutate(Surgery = as.factor(Surgery)) %>%
        mutate(Sex = as.factor(Sex)) %>% 
        mutate(Age = as.numeric(Age))

ensembl_annot = read.delim(ensembl_annot) %>%
        dplyr::select(ensembl_gene_id, external_gene_name, description, gene_biotype, go_id ) %>%
        mutate_all(~na_if(., ''))

ensembl_annot1 = aggregate(go_id ~ ensembl_gene_id + external_gene_name + description + gene_biotype, data = ensembl_annot, paste, collapse = ", ")

interested_GO = c("GO:0001655", "GO:0001822", "GO:0005149", "GO:0007517", "GO:0008009", "GO:0030595", "GO:0034612", "GO:0042060", "GO:0042379", "GO:0048568", "GO:0048704", "GO:0050727", "GO:0055074", "GO:0070700", "GO:0071356")

Code

### run deseq2
design_formula = ~Group
ddsMat <- DESeqDataSetFromMatrix(countData = count_dm,
                                 colData = sample_data,
                                 design = design_formula)
# pre-filtering the data set by filtering rows that have no or 1 count
ddsMat <- ddsMat[ rowSums(counts(ddsMat)) > 0, ]
dds = DESeq(ddsMat)
#resultsNames(dds)

RNA-seq data processing and analysis

RNA was isolated from primary SMC, passage P3-5, after serum starving in 0.5% serum for 24hrs. mRNA-seq libraries were prepared from total RNA for sequencing on an Illumina NextSeq500 platform. Major steps of the RNA-seq data processing pipelines include:

FASTQC and MULTIQC for quality check
TrimGalore for trimming and pre-processing
STAR for mapping and align sequences to reference genome
SUBREAD for summarizing reads before and after pre-processing and alignment

A summary of alignment is provided in this table:

Code

alignment_summary = read.csv(alignment_summary) %>% janitor::clean_names()
alignment_summary %>%
        left_join(., sample_data[,1:2]) %>%
        relocate(Group, .after = sample) %>%
        knitr::kable(., digits = 3, "html") %>%
        kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
        kableExtra::scroll_box(width = "100%", height = "500px")

sample	Group	number_of_input_reads	average_input_read_length	uniquely_mapped_reads_number	uniquely_mapped_reads	average_mapped_length	number_of_splices_total	number_of_splices_annotated_sjdb	number_of_splices_gt_ag	number_of_splices_gc_ag	number_of_splices_at_ac	number_of_splices_non_canonical	mismatch_rate_per_base	deletion_rate_per_base	deletion_average_length	insertion_rate_per_base	insertion_average_length	number_of_reads_mapped_to_multiple_loci	x_of_reads_mapped_to_multiple_loci	number_of_reads_mapped_to_too_many_loci	x_of_reads_mapped_to_too_many_loci	x_of_reads_unmapped_too_many_mismatches	number_of_reads_unmapped_too_short	x_of_reads_unmapped_too_short	number_of_reads_unmapped_other	x_of_reads_unmapped_other	x_of_chimeric_reads	assigned	unassigned_multi_mapping	unassigned_no_features	unassigned_ambiguity
C100	Delayed	40108024	149	36736941	91.59%	148.87	22189045	22088426	22009343	147426	15877	16399	0.26%	0.01%	1.56	0.00%	1.49	2189334	5.46%	7621	0.02%	0.00%	1168138	2.91%	5990	0.01%	0.00%	34945670	10504889	683830	1107441
C106	Delayed	58861337	149	54212534	92.10%	148.92	31849284	31700065	31601331	200945	22468	24540	0.25%	0.01%	1.59	0.00%	1.49	3173073	5.39%	11084	0.02%	0.00%	1457908	2.48%	6738	0.01%	0.00%	51337554	14675206	1242344	1632636
B121	Newborn	56537095	149	52186377	92.30%	148.90	32730513	32592111	32495533	194374	18926	21680	0.25%	0.01%	1.53	0.00%	1.48	2534057	4.48%	11748	0.02%	0.00%	1797830	3.18%	7083	0.01%	0.00%	49701272	11184629	1013882	1471223
D123	Augment	40575600	148	37801048	93.16%	148.64	23992101	23895557	23818403	143958	14269	15471	0.25%	0.01%	1.56	0.00%	1.51	2278231	5.61%	8818	0.02%	0.00%	483994	1.19%	3509	0.01%	0.00%	36061134	10768369	683489	1056425
B146	Newborn	40468930	149	37258100	92.07%	148.92	21967230	21869739	21789217	143642	16342	18029	0.24%	0.01%	1.52	0.00%	1.53	2367273	5.85%	8028	0.02%	0.00%	831373	2.05%	4156	0.01%	0.00%	35387507	11320658	717339	1153254
D19	Augment	38352444	149	35818651	93.39%	149.01	22966329	22875769	22810308	128033	13191	14797	0.24%	0.01%	1.50	0.00%	1.59	1879674	4.90%	7132	0.02%	0.00%	643109	1.68%	3878	0.01%	0.00%	34168183	8773521	674064	976404
A214	Control	45033129	149	42213288	93.74%	148.98	27784904	27683395	27604042	147476	14783	18603	0.23%	0.01%	1.61	0.00%	1.66	2088147	4.64%	7898	0.02%	0.00%	719682	1.60%	4114	0.01%	0.00%	40248695	9373392	836303	1128290
A28	Control	44021059	149	40729520	92.52%	148.96	26067356	25965272	25885116	149674	15232	17334	0.24%	0.01%	1.57	0.00%	1.51	2405946	5.47%	8658	0.02%	0.00%	872886	1.98%	4049	0.01%	0.00%	38869185	11825136	721132	1139203
C331	Delayed	39424328	149	36749110	93.21%	148.98	23344609	23250484	23175774	139120	14238	15477	0.23%	0.01%	1.52	0.00%	1.63	2063050	5.23%	6323	0.02%	0.00%	601980	1.53%	3865	0.01%	0.00%	34967465	9230532	720892	1060753
D37	Augment	42926789	149	39619768	92.30%	148.95	25580288	25479357	25392695	157117	14674	15802	0.25%	0.01%	1.50	0.00%	1.55	2173461	5.06%	7332	0.02%	0.00%	1122295	2.61%	3933	0.01%	0.00%	37880425	10721122	618562	1120781
B441	Newborn	42828719	149	39587837	92.43%	148.99	21960227	21856960	21757717	169195	14159	19156	0.24%	0.01%	1.54	0.00%	1.50	2665569	6.22%	8145	0.02%	0.00%	562695	1.31%	4473	0.01%	0.00%	37545239	13028556	816800	1225798
B49	Newborn	39832639	149	36535639	91.72%	148.93	22111233	22013385	21937759	143656	13983	15835	0.24%	0.01%	1.50	0.00%	1.61	2373287	5.96%	6320	0.02%	0.00%	914257	2.30%	3136	0.01%	0.00%	34633743	11208078	713732	1188164
D596	Augment	48746410	149	45240426	92.81%	148.98	27854529	27733471	27635496	180455	18235	20343	0.23%	0.01%	1.51	0.00%	1.50	2895532	5.94%	10250	0.02%	0.00%	594367	1.22%	5835	0.01%	0.00%	42998138	13685103	867058	1375230
B61	Newborn	39683496	149	36620583	92.28%	148.93	22731024	22628485	22551115	147136	17142	15631	0.24%	0.01%	1.57	0.00%	1.51	2024489	5.10%	7063	0.02%	0.00%	1027236	2.59%	4125	0.01%	0.00%	34806194	9287826	720672	1093717
C67	Delayed	38966890	149	35594310	91.35%	148.90	21813005	21713152	21631287	150814	14745	16159	0.24%	0.01%	1.52	0.00%	1.48	2309577	5.93%	8271	0.02%	0.00%	1051289	2.70%	3443	0.01%	0.00%	33688661	10613039	728595	1177054
C6	Delayed	35349450	148	32900254	93.07%	148.61	20253503	20168132	20088410	136370	14501	14222	0.24%	0.01%	1.60	0.00%	1.65	1974647	5.59%	6223	0.02%	0.00%	464637	1.31%	3689	0.01%	0.00%	31421583	9687586	556167	922504
D72	Augment	29217505	149	27106405	92.77%	148.94	17579344	17503925	17444770	111072	11544	11958	0.23%	0.01%	1.50	0.00%	1.51	1636578	5.60%	5569	0.02%	0.00%	465998	1.59%	2955	0.01%	0.00%	25659252	6990743	567193	879960
A88	Control	42794860	149	39533000	92.38%	148.95	24499651	24395124	24316793	150943	15123	16792	0.24%	0.01%	1.52	0.00%	1.57	2441503	5.71%	8289	0.02%	0.00%	806803	1.89%	5265	0.01%	0.00%	37579372	11721326	813186	1140442
D997	Augment	36491170	149	33892776	92.88%	148.96	20894778	20807999	20744930	123221	11992	14635	0.24%	0.01%	1.52	0.00%	1.60	1963371	5.38%	7084	0.02%	0.00%	622866	1.71%	5073	0.01%	0.00%	32151125	8867662	750804	990847

We performed two downstream analysis, differential gene expression (DGE) analysis using DESeq2 and GO enrichment analysis using clusterProfiler.

Sample-to-sample distances
PCA

Code

### generate transformed count matrix
# blind = TRUE for QA, blind = FALSE for downstream analysis
transformed_count <- rlog(ddsMat, blind = FALSE)
colnames(transformed_count) <- sample_data$sample

# change transformed_count into a matrix
transformed_count <- assay(transformed_count)

# annotation for heatmaps
anno = sample_data %>% 
  dplyr::select(Group, sample) %>% 
  column_to_rownames(var = "sample")

deseq2_dist(transformed_count, anno)

This plot shows the euclidean distance of samples and how they are hierarchically clustered using the normalized (regularized log transformed) count data generated with DESeq2.

PCA by group

Code

pca_group <- deseq2_pca(sample_data, transformed_count, 
           PC1 = "PC1", PC2 = "PC2", color_by = "Group", save = FALSE)
print(pca_group[[1]])

The PCA plot shows the first two principal components of the regularized log transformed count data for all samples (there is two few data points to calculate ellipse for Control group). There was no obvious separation between groups.

Pairwise comparisons between newborn closure, delayed closure, and augmentation groups were compared to the control group, respectively. Differentially expressed genes (DEG) were defined as having a Benjamini-Hochberg adjusted P value <0.05 and log2FC >1.0. DEG were further enrolled in GO enrichment analysis in R using clusterProfiler version 4.4.4. For gene set enrichment analysis (GSEA), the ranked gene list by log2FC was entered in a pre-ranked GSEA function in clusterProfiler, using human gene sets database (org.Hs.eg.db v3.15.0). Dot plots from GO enrichment analysis and GSEA were generated in R by enrichplot version 1.16.2. Only annotated RNAs are presented. All experimental data are available through the NCBI GEO repository.

Newborn vs Control

Differential gene expression analysis using DESeq2
GO enrichment analysis

Code

contrast = c("Group", "Newborn", "Control")
res_unshrunken <- results(dds, contrast = contrast, alpha = 0.05)
#summary(res_unshrunken)
res <- lfcShrink(dds, contrast = contrast, res = res_unshrunken, type = "ashr")
#summary(res)

MA plot

MA plot compares the log fold change against the mean of the normalized counts. Each point shows a feature. The points in blue are those that have an adjusted p-value smaller than alpha.

Code

deseq2_plotMA(res)

Significant features

Note

Genes with a Benjamini-Hochberg (FDR) adjusted P value smaller than 0.05 and an absolute log2 fold change larger than 1 were considered differentially expressed.

Code

full_stat <- as.data.frame(res) %>%
        mutate(across(everything(), ~as.numeric(.x))) %>%
        mutate(abs_log2FC = abs(log2FoldChange) )


full_stat_BA = full_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames")

Code

sig_stat <- full_stat %>%
  #dplyr::rename(., gene = Gene_name) %>%
  dplyr::filter(padj < fdr & abs(log2FoldChange) > log2fc) %>%
  mutate(status = if_else(log2FoldChange < 0, "Down", "Up"))

sig_stat_BA = sig_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames") %>%
        dplyr::filter(!is.na(external_gene_name) )

Volcano plot
Heatmap

Volcano plot is another way to visualize the DGE results. Each point represents a feature. The colored points are differentially expressed genes with alpha = 0.05 and log2FC = 1 (blue : down-regulated, red : up-regulated). Top 10 up- and down-regulated genes are labeled.

Code

deseq2_volcano_2(full_stat, sig_stat_BA, fdr = fdr, log2fc = log2fc, label_col = "external_gene_name")

Heatmaps are a great way to visualize the clustering of samples using normalized count data. This heatmap shows the row-wise Z-scores of regularized log transformed count data for all differentially expressed genes.

Code

hm = transformed_count[, rownames(anno)[which(anno$Group %in% c("Control", "Newborn"))]]

deseq2_hm_2(hm, sig_stat_BA, anno, save = FALSE)

Code

deseq2_hm_2(hm, sig_stat_BA, anno, rowlabel = "external_gene_name", top_n = heatmap_topN, save = FALSE)

Pathway enrichment analysis is a way to summarize your gene list into pathways to ease biological interpretation of the data. The analysis uses the overlap between your gene list and a pathway to calculate pathway enrichment score.

There are two types of gene lists, corresponding to two tools for enrichment analysis:

Defined gene list (e.g., expression change > 2-fold, significantly differentiated genes only): over-representation analysis
- Answers the question: Are any pathways (gene sets) surprisingly enriched in my gene list?
- Statistical test: Fisher’s exact test
Ranked gene list (e.g., by differential gene expression): GSEA
- Answers the question: Are any pathways (gene sets) ranked surprisingly high or low in my ranked list of genes?
- Statistical test: GSEA

Note

Ranked gene list is preferred and recommended so as to avoid arbitrary threshold.

There are many pathway databases for enrichment analysis, such as GO, KEGG, WikiPathways, Reactome, etc. In this section, we are using GO (gene ontology) database.

Over-representation analysis (ORA)
Gene set enrichment analysis (GSEA)

Note

Over representation analysis needs two things as input:

A defined gene list
Gene sets (pathways) or annotations: e.g. GO database.

The question ORA is trying to answer is: are any of the gene sets (pathways) surprisingly enriched in the gene list?

Code

genes_to_test = rownames(sig_stat)

GO_results <- clusterProfiler::enrichGO(gene = genes_to_test, OrgDb = "org.Hs.eg.db", keyType = "ENSEMBL", ont = "ALL", pAdjustMethod = "BH", pvalueCutoff  = 0.01,qvalueCutoff = 0.05, readable=TRUE)

Code

#as.data.frame(GO_results) %>% head()
GO_results_tidy = GO_results %>% 
        slot("result") %>% 
        tibble::as_tibble() %>%
        mutate(GeneRatio_num = DOSE::parse_ratio(GeneRatio)) %>%
        dplyr::arrange(desc(GeneRatio_num)) 
#GO_results_tidy %>% create_dt()

Dotplot

Code

GO_results = GO_results %>% 
        mutate(GeneRatio_num = DOSE::parse_ratio(GeneRatio)) %>%
        dplyr::arrange(desc(GeneRatio_num))
dotplot(GO_results, showCategory = 20)

Note

Dotplot is the most widely used method to visualize enriched terms. It depicts the enrichment scores (e.g. p values) and gene count or ratio as dot size and color.

Above graph is showing top 20 enriched terms.

EnrichmentMap

Code

ORA = enrichplot::pairwise_termsim(GO_results) 

ORA %>% enrichplot::emapplot()

Note

Enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional module.

Default is showing 30 enriched terms.

Note

GSEA needs two things as input:

A ranked gene list
Gene Sets (pathways) or annotations: e.g. GO database.

The analysis answers the question: are any pathways (gene sets) ranked surprisingly high or low in my ranked list of genes?

Code

geneList = res_unshrunken$log2FoldChange
names(geneList) = rownames(res_unshrunken)

geneList = na.omit(geneList) %>% sort(., decreasing = TRUE)

GSE_results = gseGO(gene = geneList, OrgDb = "org.Hs.eg.db", keyType = "ENSEMBL", ont = "ALL", pAdjustMethod = "BH", pvalueCutoff = 0.05)

Code

GSE_results_tidy = GSE_results %>% 
        slot("result") %>% 
        tibble::as_tibble() %>%
        dplyr::arrange(desc(NES))
        
#GSE_results_tidy %>% create_dt()

Dotplot

Top 20 enriched terms in both activated and suppressed categories

Code

df <- ggplot2::fortify(GSE_results, showCategory = nrow(GSE_results_tidy), split=".sign") %>%
        group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                          levels=rev(unique(df$Description[idx])))

ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
            guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) + facet_grid(.~.sign, scales = "free_x")

Code

#dotplot(GSE_results, split=".sign") + facet_grid(.~.sign)

Code

df <- ggplot2::fortify(GSE_results, showCategory = nrow(GSE_results_tidy), split=".sign") %>%
        #group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                          levels=rev(unique(df$Description[idx])))

ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
            guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) #+ facet_grid(.~.sign)
#dotplot(GSE_results, split=".sign", order = T) + facet_grid(.~.sign)

Code

# This section is for making dotplot for selected GO terms
selected_GSE_results = GSE_results %>%
        dplyr::filter(ID %in% interested_GO)

selected_GSE_results_tidy = GSE_results_tidy %>%
        dplyr::filter(ID %in% interested_GO)
# newborn
df <- ggplot2::fortify(selected_GSE_results, showCategory = nrow(selected_GSE_results_tidy), split=".sign") %>%
        group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx_newborn <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                         levels=rev(unique(df$Description[idx_newborn])))

newborn = ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
                               guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) + xlim(1.5, 2.4) + facet_grid(.~.sign) + theme(legend.position = "none")

newborn_order = df$ID

EnrichmentMap

Code

GSE = enrichplot::pairwise_termsim(GSE_results) 

GSE %>% emapplot()

Delayed vs Control

Differential gene expression analysis using DESeq2
GO enrichment analysis

Code

contrast = c("Group", "Delayed", "Control")
res_unshrunken <- results(dds, contrast = contrast, alpha = 0.05)
#summary(res_unshrunken)
res <- lfcShrink(dds, contrast = contrast, res = res_unshrunken, type = "ashr")
#summary(res)

MA plot

MA plot compares the log fold change against the mean of the normalized counts. Each point shows a feature. The points in blue are those that have an adjusted p-value smaller than alpha.

Code

deseq2_plotMA(res)

Significant features

Note

Genes with a Benjamini-Hochberg (FDR) adjusted P value smaller than 0.05 and an absolute log2 fold change larger than 1 were considered differentially expressed.

Code

full_stat <- as.data.frame(res) %>%
        mutate(across(everything(), ~as.numeric(.x))) %>%
        mutate(abs_log2FC = abs(log2FoldChange) )

full_stat_CA = full_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames")

Code

sig_stat <- full_stat %>%
  #dplyr::rename(., gene = Gene_name) %>%
  dplyr::filter(padj < fdr & abs(log2FoldChange) > log2fc) %>%
  mutate(status = if_else(log2FoldChange < 0, "Down", "Up"))

sig_stat_CA = sig_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames") %>%
        dplyr::filter(!is.na(external_gene_name) )

Volcano plot
Heatmap

Code

deseq2_volcano_2(full_stat, sig_stat_CA, fdr = fdr, log2fc = log2fc, label_col = "external_gene_name")

Code

hm = transformed_count[, rownames(anno)[which(anno$Group %in% c("Control", "Delayed"))]]
deseq2_hm_2(hm, sig_stat_CA, anno, save = FALSE)

Code

deseq2_hm_2(hm, sig_stat_CA, anno, rowlabel = "external_gene_name", top_n = heatmap_topN, save = FALSE)

Over-representation analysis (ORA)
Gene set enrichment analysis (GSEA)

Note

Over representation analysis needs two things as input:

A defined gene list
Gene sets (pathways) or annotations: e.g. GO database.

The question ORA is trying to answer is: are any of the gene sets (pathways) surprisingly enriched in the gene list?

Code

genes_to_test = rownames(sig_stat)
GO_results <- enrichGO(gene = genes_to_test, OrgDb = "org.Hs.eg.db", keyType = "ENSEMBL", ont = "ALL", pAdjustMethod = "BH", pvalueCutoff  = 0.01,qvalueCutoff = 0.05, readable=TRUE)

Code

#as.data.frame(GO_results) %>% head()
GO_results_tidy = GO_results %>% 
        slot("result") %>% 
        tibble::as_tibble() %>%
        mutate(GeneRatio_num = DOSE::parse_ratio(GeneRatio)) %>%
        dplyr::arrange(desc(GeneRatio_num)) 
#GO_results_tidy %>% create_dt()

Dotplot

Code

GO_results = GO_results %>% 
        mutate(GeneRatio_num = DOSE::parse_ratio(GeneRatio)) %>%
        dplyr::arrange(desc(GeneRatio_num))
dotplot(GO_results, showCategory = 20)

EnrichmentMap

Code

ORA = enrichplot::pairwise_termsim(GO_results) 

ORA %>% enrichplot::emapplot()

Note

GSEA needs two things as input:

A ranked gene list
Gene Sets (pathways) or annotations: e.g. GO database.

The analysis answers the question: are any pathways (gene sets) ranked surprisingly high or low in my ranked list of genes?

Code

geneList = res_unshrunken$log2FoldChange
names(geneList) = rownames(res_unshrunken)

geneList = na.omit(geneList) %>% sort(., decreasing = TRUE)

GSE_results = gseGO(gene = geneList, OrgDb = "org.Hs.eg.db", keyType = "ENSEMBL", ont = "ALL", pAdjustMethod = "BH", pvalueCutoff = 0.05)

Code

GSE_results_tidy = GSE_results %>% 
        slot("result") %>% 
        tibble::as_tibble() %>%
        dplyr::arrange(desc(NES))
        
#GSE_results_tidy %>% create_dt()

Dotplot

Top 20 enriched terms in both activated and suppressed categories

Code

df <- ggplot2::fortify(GSE_results, showCategory = nrow(GSE_results_tidy), split=".sign") %>%
        group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                          levels=rev(unique(df$Description[idx])))

ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
            guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) + facet_grid(.~.sign, scales = "free_x")

Code

#dotplot(GSE_results, split=".sign") + facet_grid(.~.sign)

Code

df <- ggplot2::fortify(GSE_results, showCategory = nrow(GSE_results_tidy), split=".sign") %>%
        #group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(desc(NES))

idx <- order(df[["NES"]], decreasing = TRUE)
df$Description <- factor(df$Description,
                          levels=rev(unique(df$Description[idx])))

ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
            guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) #+ facet_grid(.~.sign)
#dotplot(GSE_results, split=".sign") + facet_grid(.~.sign)

Code

# This section is for making dotplot for selected GO terms
selected_GSE_results = GSE_results %>%
        dplyr::filter(ID %in% interested_GO)

selected_GSE_results_tidy = GSE_results_tidy %>%
        dplyr::filter(ID %in% interested_GO)
# delayed
df <- ggplot2::fortify(selected_GSE_results, showCategory = nrow(selected_GSE_results_tidy), split=".sign") %>%
        group_by(., .sign) %>%
        top_n(., 20, NES) %>%
        dplyr::arrange(match(ID, newborn_order))


df$Description <- factor(df$Description,
                         levels=rev(unique(df$Description[idx_newborn])))


delayed = ggplot(df, aes_string(x="NES", y="Description", size= "GeneRatio", color="p.adjust")) +
        geom_point() +
        scale_color_continuous(low="red", high="blue", name = "p.adjust",
                               guide=guide_colorbar(reverse=TRUE)) +
        scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
        ylab(NULL) + ggtitle("") + DOSE::theme_dose(12) +
        scale_size(range=c(3, 8)) + xlim(1.5, 2.4) + facet_grid(.~.sign)

EnrichmentMap

Code

GSE = enrichplot::pairwise_termsim(GSE_results) 

GSE %>% emapplot()

Augumentation vs Control

Differential gene expression analysis using DESeq2
GO enrichment analysis

Code

contrast = c("Group", "Augment", "Control")
res_unshrunken <- results(dds, contrast = contrast, alpha = 0.05)
#summary(res_unshrunken)
res <- lfcShrink(dds, contrast = contrast, res = res_unshrunken, type = "ashr")
#summary(res)

MA plot

MA plot compares the log fold change against the mean of the normalized counts. Each point shows a feature. The points in blue are those that have an adjusted p-value smaller than alpha.

Code

deseq2_plotMA(res)

Significant features

Code

full_stat <- as.data.frame(res) %>%
        mutate(across(everything(), ~as.numeric(.x))) %>%
        mutate(abs_log2FC = abs(log2FoldChange) )

full_stat_DA = full_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames")

Code

sig_stat <- full_stat %>%
  #dplyr::rename(., gene = Gene_name) %>%
  dplyr::filter(padj < fdr & abs(log2FoldChange) > log2fc) %>%
  mutate(status = if_else(log2FoldChange < 0, "Down", "Up"))

sig_stat_DA = sig_stat %>% 
        rownames_to_column(., var = "rownames") %>%
        left_join(., ensembl_annot1, by = c("rownames" = "ensembl_gene_id")) %>%
        column_to_rownames(., var = "rownames") %>%
        dplyr::filter(!is.na(external_gene_name) )

Volcano plot
Heatmap

Code

deseq2_volcano_2(full_stat, sig_stat_DA, fdr = fdr, log2fc = log2fc, label_col = "external_gene_name")

Code

hm = transformed_count[, rownames(anno)[which(anno$Group %in% c("Control", "Augment"))]]
deseq2_hm_2(hm, sig_stat_DA, anno, rowlabel = "external_gene_name", top_n = 3, save = FALSE)

Due to the small number of genes significantly differentiated between groups, no enrichment analysis can be performed.

Conclusion

In our expression analysis, we identified significant alterations in exstrophy SMC gene expression, with 309 DEG in exstrophy SMC undergoing newborn closure compared to controls, including 223 upregulated and 86 downregulated genes. We then examined gene expression profiles of exstrophy SMC from patients born with small bladder templates who underwent delayed closure. We identified 297 DEG in delayed closure SMC compared to controls, including 229 upregulated and 68 downregulated genes. Despite being born with smaller templates and undergoing later exstrophy closure, the gene expression profiles of delayed closure SMC were not significantly different from newborn closure SMC. In fact, the gene expression profiles of newborn and delayed closures shared 159 DEG.

Code

sig_list_BA = rownames(sig_stat_BA)
sig_list_CA = rownames(sig_stat_CA)
sig_list_DA = rownames(sig_stat_DA)
common = intersect(sig_list_BA, sig_list_CA)

common_anno = ensembl_annot1 %>%
        dplyr::filter(ensembl_gene_id %in% common) %>%
        #remove_rownames(.,) %>%
        column_to_rownames(., var = "ensembl_gene_id")

An overall heatmap using these 159 common genes:

Code

deseq2_hm_3(transformed_count, common_anno, anno, rowlabel = "external_gene_name", rowname_switch = TRUE, save = FALSE)

We next analyzed gene expression in SMC from older CBE patients, biopsied at the time of bladder augmentation (enterocystoplasty). Interestingly, we identified only 3 DEG among SMC expression profiles from older CBE patients undergoing augmentation, as they were not significantly different from controls. DEG shared among all exstrophy SMC included upregulation of the chemokine CXCL8, upregulation of the endoplasmic reticulum protein reticulon 1 (RTN1), and downregulation of C1q and TNF-related 3 (C1QTNF3) genes. Clustering of expression profiles revealed augment SMC were most like controls, whereas newborn SMC were similar to delayed SMC. Together, these data suggest a shift of SMC expression to a non-exstrophy phenotype after closure, despite persistence of bladder pathology with lack of capacity.

Venn diagram of common genes of three contrasts

Code

make_venn_3(list("Newborn" = sig_list_BA, "Delayed" = sig_list_CA, "Augment" = sig_list_DA))

GSEA dotplot for selected GO terms in “Newborn” and “Delayed” contrasts

To better understand the implications of altered gene expression in exstrophy SMC, we performed gene set enrichment analysis (GSEA) organizing DEG into gene ontology (GO) pathways representative of cellular processes. GSEA confirmed that relevant GO pathways were altered in exstrophy SMC from newborn and delayed closures. Importantly, GSEA revealed that exstrophy SMC predominantly over-expressed inflammatory pathways including genes involved in chemokine signaling and the acute phase response. The inflammatory gene profile was present in both newborn and delayed closure SMC.

Code

egg::ggarrange(newborn, delayed, labels = c("Newborn", "Delayed"), nrow = 1,
               label.args = list(gp = grid::gpar(fontface = 2)) )

Reproducibility

The amount of time took to generate the report:

Code

Sys.time() - start_time

Time difference of 5.730587 mins

R session information:

Code

sessionInfo()

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] plotly_4.10.1               VennDiagram_1.7.3          
 [3] futile.logger_1.4.3         wesanderson_0.3.6          
 [5] org.Hs.eg.db_3.15.0         AnnotationDbi_1.58.0       
 [7] clusterProfiler_4.4.4       lubridate_1.9.2            
 [9] forcats_1.0.0               stringr_1.5.0              
[11] dplyr_1.1.2                 purrr_1.0.1                
[13] readr_2.1.4                 tidyr_1.3.0                
[15] tibble_3.2.1                tidyverse_2.0.0            
[17] gridExtra_2.3               knitcitations_1.0.12       
[19] knitr_1.43.1                readxl_1.4.2               
[21] DESeq2_1.36.0               SummarizedExperiment_1.26.1
[23] Biobase_2.56.0              MatrixGenerics_1.8.1       
[25] matrixStats_0.63.0          GenomicRanges_1.48.0       
[27] GenomeInfoDb_1.32.4         IRanges_2.30.1             
[29] S4Vectors_0.34.0            BiocGenerics_0.42.0        
[31] RColorBrewer_1.1-3          extrafont_0.19             
[33] DT_0.28                     data.table_1.14.8          
[35] pheatmap_1.0.12             ggrepel_0.9.3              
[37] ggplot2_3.4.2              

loaded via a namespace (and not attached):
  [1] snow_0.4-4             shadowtext_0.1.2       backports_1.4.1       
  [4] fastmatch_1.1-3        systemfonts_1.0.4      plyr_1.8.8            
  [7] igraph_1.5.1           lazyeval_0.2.2         splines_4.2.1         
 [10] BiocParallel_1.30.4    digest_0.6.31          invgamma_1.1          
 [13] yulab.utils_0.0.6      htmltools_0.5.5        GOSemSim_2.22.0       
 [16] viridis_0.6.3          GO.db_3.15.0           SQUAREM_2021.1        
 [19] fansi_1.0.4            magrittr_2.0.3         memoise_2.0.1         
 [22] tzdb_0.4.0             Biostrings_2.64.1      annotate_1.74.0       
 [25] graphlayouts_1.0.0     svglite_2.1.1          extrafontdb_1.0       
 [28] timechange_0.2.0       enrichplot_1.16.2      colorspace_2.1-0      
 [31] rvest_1.0.3            blob_1.2.4             xfun_0.39             
 [34] crayon_1.5.2           RCurl_1.98-1.12        jsonlite_1.8.4        
 [37] scatterpie_0.2.0       genefilter_1.78.0      survival_3.5-5        
 [40] ape_5.7-1              glue_1.6.2             kableExtra_1.3.4      
 [43] polyclip_1.10-4        gtable_0.3.3           zlibbioc_1.42.0       
 [46] XVector_0.36.0         webshot_0.5.4          DelayedArray_0.22.0   
 [49] Rttf2pt1_1.3.12        scales_1.2.1           DOSE_3.22.1           
 [52] futile.options_1.0.1   DBI_1.1.3              bibtex_0.5.1          
 [55] Rcpp_1.0.10            viridisLite_0.4.2      xtable_1.8-4          
 [58] gridGraphics_0.5-1     tidytree_0.4.2         bit_4.0.5             
 [61] truncnorm_1.0-9        htmlwidgets_1.6.2      httr_1.4.6            
 [64] fgsea_1.22.0           pkgconfig_2.0.3        XML_3.99-0.14         
 [67] farver_2.1.1           janitor_2.2.0          locfit_1.5-9.7        
 [70] utf8_1.2.3             labeling_0.4.2         ggplotify_0.1.0       
 [73] tidyselect_1.2.0       rlang_1.1.1            reshape2_1.4.4        
 [76] munsell_0.5.0          cellranger_1.1.0       tools_4.2.1           
 [79] cachem_1.0.8           downloader_0.4         cli_3.6.1             
 [82] generics_0.1.3         RSQLite_2.3.1          evaluate_0.21         
 [85] fastmap_1.1.1          yaml_2.3.7             RefManageR_1.4.0      
 [88] ggtree_3.4.4           bit64_4.0.5            tidygraph_1.2.3       
 [91] KEGGREST_1.36.3        ggraph_2.1.0           nlme_3.1-162          
 [94] formatR_1.14           aplot_0.1.10           DO.db_2.9             
 [97] xml2_1.3.4             compiler_4.2.1         rstudioapi_0.14       
[100] png_0.1-8              treeio_1.20.2          tweenr_2.0.2          
[103] geneplotter_1.74.0     stringi_1.7.12         highr_0.10            
[106] lattice_0.21-8         Matrix_1.5-4           vctrs_0.6.2           
[109] pillar_1.9.0           lifecycle_1.0.3        irlba_2.3.5.1         
[112] bitops_1.0-7           patchwork_1.1.2        qvalue_2.28.0         
[115] R6_2.5.1               codetools_0.2-19       lambda.r_1.2.4        
[118] MASS_7.3-60            withr_2.5.0            GenomeInfoDbData_1.2.8
[121] parallel_4.2.1         hms_1.1.3              ggfun_0.0.9           
[124] snakecase_0.11.0       rmarkdown_2.21         ashr_2.2-54           
[127] mixsqp_0.3-48          ggforce_0.4.1