Objectives To test the hypothesis that phenotypes in bladder exstrophy result from alterations in detrusor smooth muscle cell (SMC) gene expression. Methods We generated primary human bladder smooth muscle cell lines from patients with classic bladder exstrophy (CBE) undergoing newborn closure (n=6), delayed primary closure (n=5), augmentation cystoplasty (n=6), and non-CBE controls (n=3). Gene expression profiles were then created using RNA sequencing and characterized using gene set enrichment analysis (GSEA). Results We identified 308 differentially expressed genes in bladder exstrophy SMC when compared to controls, including 223 upregulated and 85 downregulated genes. Bladder exstrophy muscle cell lines from newborn closure and primary delayed closures shared expression changes in 159 genes. GSEA analysis revealed increased expression in the inflammatory response and alteration of genes for genitourinary development in newborn and delayed closure SMC. However, these changes were absent in SMC from older exstrophy patients after closure. Conclusions Bladder exstrophy SMC demonstrate gene expression changes in the inflammatory response and genitourinary development. However, gene expression profiles normalized in exstrophy SMC from older patients after closure, suggesting a normalization of exstrophy SMC over time. Our in vitro findings regarding the normalization of exstrophy SMC gene expression following bladder closure suggest that the development of poor detrusor compliance in bladder exstrophy has a complex multifactorial etiology. Taken together, our findings suggest that alterations in SMC gene expression may explain abnormalities in the exstrophy bladder seen prior to and immediately after closure and suggest that surgical closure may allow exstrophy SMC to normalize over time.
Code
suppressPackageStartupMessages(c(library("ggrepel"),library("pheatmap"),library("data.table"),library("DT"),library("extrafont"),library("RColorBrewer"),library("DESeq2"),library("readxl"),library("knitr"),library("knitcitations"),library("gridExtra"),library("DESeq2"),library("tidyverse"),library("clusterProfiler"),library("org.Hs.eg.db"),library("AnnotationDbi")))log2fc =1fdr =0.05heatmap_topN =100# get start timestart_time <-Sys.time()
Project and data background
The bladder exstrophy- epispadias- cloacal exstrophy complex (BEEC) is a rare spectrum of congenital anomalies affecting the genitourinary, musculoskeletal, and gastrointestinal systems with varying severity. Exstrophy bladders are known to have reduced smooth muscle content and increased type-III collagen deposition, while exstrophy smooth muscle cells (SMC) have reduced intracellular calcium levels and reduced contractile responses. We hypothesized that exstrophy bladder and SMC phenotypes result from alterations in detrusor SMC gene expression.
To examine potential alterations in exstrophy SMC gene expression, we generated primary SMC lines from bladder biopsies of patients with CBE at the time of surgery. We then measured global gene expression with RNA-seq of RNAs isolated from primary SMCs. Our cell lines included CBE patients undergoing newborn closure (n=6), delayed primary closure (n=5), older CBE patients undergoing bladder augmentation (enterocystoplasty) for poor bladder growth after newborn closure (n=6), and non-CBE controls undergoing ureteral reimplantation for vesicoureteral reflux (n=3).
Code
### data formatting and wranglingcount_dm =read.delim(raw_count_file, row.names=1)for (i in1:nrow(count_dm)) { count_dm$ensembl_id[i] =unlist(str_split( rownames(count_dm)[i], pattern ="\\." ))[1]}# duplication rownamesdup =subset(count_dm, duplicated(count_dm$ensembl_id)) %>%pull(., var ="ensembl_id")#rownames(dup_df[dup_df$ensembl_id == dup[3],])# select one row from duplicated rows to keepdup_df = count_dm %>% dplyr::filter(ensembl_id %in% dup) %>%rownames_to_column(., var ="rownames") %>%group_by(., ensembl_id) %>%filter(!grepl("Y$", rownames)) %>%column_to_rownames(., var ="rownames")count_dm = count_dm %>% dplyr::filter(!(ensembl_id %in% dup)) %>%bind_rows(., dup_df)count_dm = count_dm %>%remove_rownames(.) %>%column_to_rownames(., var ="ensembl_id")
### run deseq2design_formula =~GroupddsMat <-DESeqDataSetFromMatrix(countData = count_dm,colData = sample_data,design = design_formula)# pre-filtering the data set by filtering rows that have no or 1 countddsMat <- ddsMat[ rowSums(counts(ddsMat)) >0, ]dds =DESeq(ddsMat)#resultsNames(dds)
RNA-seq data processing and analysis
RNA was isolated from primary SMC, passage P3-5, after serum starving in 0.5% serum for 24hrs. mRNA-seq libraries were prepared from total RNA for sequencing on an Illumina NextSeq500 platform. Major steps of the RNA-seq data processing pipelines include:
FASTQC and MULTIQC for quality check
TrimGalore for trimming and pre-processing
STAR for mapping and align sequences to reference genome
SUBREAD for summarizing reads before and after pre-processing and alignment
### generate transformed count matrix# blind = TRUE for QA, blind = FALSE for downstream analysistransformed_count <-rlog(ddsMat, blind =FALSE)colnames(transformed_count) <- sample_data$sample# change transformed_count into a matrixtransformed_count <-assay(transformed_count)# annotation for heatmapsanno = sample_data %>% dplyr::select(Group, sample) %>%column_to_rownames(var ="sample")deseq2_dist(transformed_count, anno)
This plot shows the euclidean distance of samples and how they are hierarchically clustered using the normalized (regularized log transformed) count data generated with DESeq2.
The PCA plot shows the first two principal components of the regularized log transformed count data for all samples (there is two few data points to calculate ellipse for Control group). There was no obvious separation between groups.
Pairwise comparisons between newborn closure, delayed closure, and augmentation groups were compared to the control group, respectively. Differentially expressed genes (DEG) were defined as having a Benjamini-Hochberg adjusted P value <0.05 and log2FC >1.0. DEG were further enrolled in GO enrichment analysis in R using clusterProfiler version 4.4.4. For gene set enrichment analysis (GSEA), the ranked gene list by log2FC was entered in a pre-ranked GSEA function in clusterProfiler, using human gene sets database (org.Hs.eg.db v3.15.0). Dot plots from GO enrichment analysis and GSEA were generated in R by enrichplot version 1.16.2. Only annotated RNAs are presented. All experimental data are available through the NCBI GEO repository.
contrast =c("Group", "Newborn", "Control")res_unshrunken <-results(dds, contrast = contrast, alpha =0.05)#summary(res_unshrunken)res <-lfcShrink(dds, contrast = contrast, res = res_unshrunken, type ="ashr")#summary(res)
MA plot
MA plot compares the log fold change against the mean of the normalized counts. Each point shows a feature. The points in blue are those that have an adjusted p-value smaller than alpha.
Code
deseq2_plotMA(res)
Significant features
Note
Genes with a Benjamini-Hochberg (FDR) adjusted P value smaller than 0.05 and an absolute log2 fold change larger than 1 were considered differentially expressed.
Code
full_stat <-as.data.frame(res) %>%mutate(across(everything(), ~as.numeric(.x))) %>%mutate(abs_log2FC =abs(log2FoldChange) )full_stat_BA = full_stat %>%rownames_to_column(., var ="rownames") %>%left_join(., ensembl_annot1, by =c("rownames"="ensembl_gene_id")) %>%column_to_rownames(., var ="rownames")
Volcano plot is another way to visualize the DGE results. Each point represents a feature. The colored points are differentially expressed genes with alpha = 0.05 and log2FC = 1 (blue : down-regulated, red : up-regulated). Top 10 up- and down-regulated genes are labeled.
Heatmaps are a great way to visualize the clustering of samples using normalized count data. This heatmap shows the row-wise Z-scores of regularized log transformed count data for all differentially expressed genes.
Code
hm = transformed_count[, rownames(anno)[which(anno$Group %in%c("Control", "Newborn"))]]deseq2_hm_2(hm, sig_stat_BA, anno, save =FALSE)
Code
deseq2_hm_2(hm, sig_stat_BA, anno, rowlabel ="external_gene_name", top_n = heatmap_topN, save =FALSE)
Pathway enrichment analysis is a way to summarize your gene list into pathways to ease biological interpretation of the data. The analysis uses the overlap between your gene list and a pathway to calculate pathway enrichment score.
There are two types of gene lists, corresponding to two tools for enrichment analysis:
Defined gene list (e.g., expression change > 2-fold, significantly differentiated genes only): over-representation analysis
Answers the question: Are any pathways (gene sets) surprisingly enriched in my gene list?
Statistical test: Fisher’s exact test
Ranked gene list (e.g., by differential gene expression): GSEA
Answers the question: Are any pathways (gene sets) ranked surprisingly high or low in my ranked list of genes?
Statistical test: GSEA
Note
Ranked gene list is preferred and recommended so as to avoid arbitrary threshold.
There are many pathway databases for enrichment analysis, such as GO, KEGG, WikiPathways, Reactome, etc. In this section, we are using GO (gene ontology) database.
Dotplot is the most widely used method to visualize enriched terms. It depicts the enrichment scores (e.g. p values) and gene count or ratio as dot size and color.
Above graph is showing top 20 enriched terms.
EnrichmentMap
Code
ORA = enrichplot::pairwise_termsim(GO_results) ORA %>% enrichplot::emapplot()
Note
Enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional module.
Default is showing 30 enriched terms.
Note
GSEA needs two things as input:
A ranked gene list
Gene Sets (pathways) or annotations: e.g. GO database.
The analysis answers the question: are any pathways (gene sets) ranked surprisingly high or low in my ranked list of genes?
contrast =c("Group", "Delayed", "Control")res_unshrunken <-results(dds, contrast = contrast, alpha =0.05)#summary(res_unshrunken)res <-lfcShrink(dds, contrast = contrast, res = res_unshrunken, type ="ashr")#summary(res)
MA plot
MA plot compares the log fold change against the mean of the normalized counts. Each point shows a feature. The points in blue are those that have an adjusted p-value smaller than alpha.
Code
deseq2_plotMA(res)
Significant features
Note
Genes with a Benjamini-Hochberg (FDR) adjusted P value smaller than 0.05 and an absolute log2 fold change larger than 1 were considered differentially expressed.
Code
full_stat <-as.data.frame(res) %>%mutate(across(everything(), ~as.numeric(.x))) %>%mutate(abs_log2FC =abs(log2FoldChange) )full_stat_CA = full_stat %>%rownames_to_column(., var ="rownames") %>%left_join(., ensembl_annot1, by =c("rownames"="ensembl_gene_id")) %>%column_to_rownames(., var ="rownames")
Volcano plot is another way to visualize the DGE results. Each point represents a feature. The colored points are differentially expressed genes with alpha = 0.05 and log2FC = 1 (blue : down-regulated, red : up-regulated). Top 10 up- and down-regulated genes are labeled.
Heatmaps are a great way to visualize the clustering of samples using normalized count data. This heatmap shows the row-wise Z-scores of regularized log transformed count data for all differentially expressed genes.
Code
hm = transformed_count[, rownames(anno)[which(anno$Group %in%c("Control", "Delayed"))]]deseq2_hm_2(hm, sig_stat_CA, anno, save =FALSE)
Code
deseq2_hm_2(hm, sig_stat_CA, anno, rowlabel ="external_gene_name", top_n = heatmap_topN, save =FALSE)
contrast =c("Group", "Augment", "Control")res_unshrunken <-results(dds, contrast = contrast, alpha =0.05)#summary(res_unshrunken)res <-lfcShrink(dds, contrast = contrast, res = res_unshrunken, type ="ashr")#summary(res)
MA plot
MA plot compares the log fold change against the mean of the normalized counts. Each point shows a feature. The points in blue are those that have an adjusted p-value smaller than alpha.
Code
deseq2_plotMA(res)
Significant features
Code
full_stat <-as.data.frame(res) %>%mutate(across(everything(), ~as.numeric(.x))) %>%mutate(abs_log2FC =abs(log2FoldChange) )full_stat_DA = full_stat %>%rownames_to_column(., var ="rownames") %>%left_join(., ensembl_annot1, by =c("rownames"="ensembl_gene_id")) %>%column_to_rownames(., var ="rownames")
Volcano plot is another way to visualize the DGE results. Each point represents a feature. The colored points are differentially expressed genes with alpha = 0.05 and log2FC = 1 (blue : down-regulated, red : up-regulated). Top 10 up- and down-regulated genes are labeled.
Heatmaps are a great way to visualize the clustering of samples using normalized count data. This heatmap shows the row-wise Z-scores of regularized log transformed count data for all differentially expressed genes.
Code
hm = transformed_count[, rownames(anno)[which(anno$Group %in%c("Control", "Augment"))]]deseq2_hm_2(hm, sig_stat_DA, anno, rowlabel ="external_gene_name", top_n =3, save =FALSE)
Due to the small number of genes significantly differentiated between groups, no enrichment analysis can be performed.
Conclusion
In our expression analysis, we identified significant alterations in exstrophy SMC gene expression, with 309 DEG in exstrophy SMC undergoing newborn closure compared to controls, including 223 upregulated and 86 downregulated genes. We then examined gene expression profiles of exstrophy SMC from patients born with small bladder templates who underwent delayed closure. We identified 297 DEG in delayed closure SMC compared to controls, including 229 upregulated and 68 downregulated genes. Despite being born with smaller templates and undergoing later exstrophy closure, the gene expression profiles of delayed closure SMC were not significantly different from newborn closure SMC. In fact, the gene expression profiles of newborn and delayed closures shared 159 DEG.
deseq2_hm_3(transformed_count, common_anno, anno, rowlabel ="external_gene_name", rowname_switch =TRUE, save =FALSE)
We next analyzed gene expression in SMC from older CBE patients, biopsied at the time of bladder augmentation (enterocystoplasty). Interestingly, we identified only 3 DEG among SMC expression profiles from older CBE patients undergoing augmentation, as they were not significantly different from controls. DEG shared among all exstrophy SMC included upregulation of the chemokine CXCL8, upregulation of the endoplasmic reticulum protein reticulon 1 (RTN1), and downregulation of C1q and TNF-related 3 (C1QTNF3) genes. Clustering of expression profiles revealed augment SMC were most like controls, whereas newborn SMC were similar to delayed SMC. Together, these data suggest a shift of SMC expression to a non-exstrophy phenotype after closure, despite persistence of bladder pathology with lack of capacity.
GSEA dotplot for selected GO terms in “Newborn” and “Delayed” contrasts
To better understand the implications of altered gene expression in exstrophy SMC, we performed gene set enrichment analysis (GSEA) organizing DEG into gene ontology (GO) pathways representative of cellular processes. GSEA confirmed that relevant GO pathways were altered in exstrophy SMC from newborn and delayed closures. Importantly, GSEA revealed that exstrophy SMC predominantly over-expressed inflammatory pathways including genes involved in chemokine signaling and the acute phase response. The inflammatory gene profile was present in both newborn and delayed closure SMC.