火山圖大家應(yīng)該是也基本上都沒有問題,,下面的MA圖其實(shí)跟火山圖非常的類似,,兩者都是log2FC信息,不同的是火山圖展現(xiàn)P值,,而MA圖展現(xiàn)的是表達(dá)量情況,!
- 火山圖是為了說明log2FC比較大的一般來說具有統(tǒng)計(jì)學(xué)顯著性
- 而MA圖是為了說明log2FC無論大小,都不應(yīng)該與表達(dá)量有相關(guān)性,。
我們通常呢,,挑選差異基因,會(huì)選擇那些log2FC比較大而且具有統(tǒng)計(jì)學(xué)顯著性的上下調(diào)基因,,不過加上MA圖,,就可以進(jìn)一步挑選那些表達(dá)量也比較高的,因?yàn)檫@樣的基因呢,,容易去實(shí)驗(yàn)驗(yàn)證,。而且呢,,通常情況下常識(shí)會(huì)告訴我們高表達(dá)量基因更容易發(fā)揮作用。 MA圖加上GO數(shù)據(jù)庫富集圖例: - (C) Differential gene expression between primary polyp calicoblasts and adult colony calicoblasts. Significant genes are shown in purple (adjusted p value <1e5, Fisher exact test).
- (D) Gene Ontology analysis for differentially expressed genes between polyp and adult calicoblasts.
當(dāng)然了,,也可以對kegg進(jìn)行同樣的分析,,上下調(diào)基因獨(dú)立進(jìn)行富集注釋: 文獻(xiàn)超級漂亮的kegg圖 來源文獻(xiàn):Liu et al. J Hematol Oncol (2021) 14:6 https:///10.1186/s13045-020-01021-x ,不過要畫出上面的圖還是有一點(diǎn)難度的,, 如果你一直看的是我的代碼,,你會(huì)發(fā)現(xiàn)出圖簡直是不忍直視: 我的kegg圖我的代碼是: ## KEGG pathway analysis ### 做KEGG數(shù)據(jù)集超幾何分布檢驗(yàn)分析,重點(diǎn)在結(jié)果的可視化及生物學(xué)意義的理解,。 if(T){ ### over-representation test kk.up <- enrichKEGG(gene = gene_up, organism = 'hsa', universe = gene_all, pvalueCutoff = 0.9,#根據(jù)實(shí)際情況調(diào)整 qvalueCutoff =0.9)#根據(jù)實(shí)際情況調(diào)整 head(kk.up)[,1:6] dotplot(kk.up );ggsave('kk.up.dotplot.png') kk.down <- enrichKEGG(gene = gene_down, organism = 'hsa', universe = gene_all, pvalueCutoff = 0.9,#根據(jù)實(shí)際情況調(diào)整 qvalueCutoff =0.9)#根據(jù)實(shí)際情況調(diào)整 head(kk.down)[,1:6] dotplot(kk.down );ggsave('kk.down.dotplot.png') kk.diff <- enrichKEGG(gene = gene_diff, organism = 'hsa', pvalueCutoff = 0.05)#根據(jù)實(shí)際情況調(diào)整 head(kk.diff)[,1:6] dotplot(kk.diff );ggsave('kk.diff.dotplot.png') kegg_diff_dt <- as.data.frame(kk.diff) kegg_down_dt <- as.data.frame(kk.down) kegg_up_dt <- as.data.frame(kk.up) down_kegg<-kegg_down_dt[kegg_down_dt$pvalue<0.05,];down_kegg$group=-1 up_kegg<-kegg_up_dt[kegg_up_dt$pvalue<0.05,];up_kegg$group=1 ## 原來的畫圖函數(shù) source('functions.R') g_kegg=kegg_plot(up_kegg,down_kegg) print(g_kegg) ggsave(g_kegg,filename = 'kegg_up_down.png') ### GSEA kk_gse <- gseKEGG(geneList = geneList, organism = 'hsa', nPerm = 1000,#根據(jù)實(shí)際情況調(diào)整 minGSSize = 10,#根據(jù)實(shí)際情況調(diào)整 pvalueCutoff = 0.9,##根據(jù)實(shí)際情況調(diào)整 verbose = FALSE) head(kk_gse)[,1:6] gseaplot(kk_gse, geneSetID = rownames(kk_gse[1,])) down_kegg<-kk_gse[kk_gse$pvalue<0.05 & kk_gse$enrichmentScore < 0,];down_kegg$group=-1 up_kegg<-kk_gse[kk_gse$pvalue<0.05 & kk_gse$enrichmentScore > 0,];up_kegg$group=1 g_kegg=kegg_plot(up_kegg,down_kegg) print(g_kegg) ggsave(g_kegg,filename = 'kegg_up_down_gsea.png') }
然后很久以前的一個(gè)學(xué)徒: ### Update: Xiaojie Liang ### Date: 2021-07-12 23:38:07 ### Email: [email protected] ### --------------- library(ggthemes) library(ggplot2) go.kegg_plot <- function(up.data,down.data){ # up.data <- up.data # down.data <- down.data dat=rbind(up.data,down.data) colnames(dat) dat$p.adjust = -log10(dat$p.adjust) dat$p.adjust=dat$p.adjust*dat$group dat=dat[order(dat$p.adjust,decreasing = F),] gk_plot <- ggplot(dat,aes(reorder(Description, p.adjust), y=p.adjust)) + geom_bar(aes(fill=factor((p.adjust>0)+1)),stat="identity", width=0.7, position=position_dodge(0.7)) + coord_flip() + scale_fill_manual(values=c("#0072B2", "#B20072"), guide=FALSE) + labs(x="", y="" ) + theme_pander() + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #axis.ticks.x = element_blank(), axis.line.x = element_line(size = 0.3, colour = "black"),#x軸連線 axis.ticks.length.x = unit(-0.20, "cm"),#修改x軸刻度的高度,,負(fù)號(hào)表示向上 axis.text.x = element_text(margin = margin(t = 0.3, unit = "cm")),##線與數(shù)字不要重疊 axis.ticks.x = element_line(colour = "black",size = 0.3) ,#修改x軸刻度的線 axis.ticks.y = element_blank(), axis.text.y = element_text(hjust=0), panel.background = element_rect(fill=NULL, colour = 'white') ) }
效果如下: 學(xué)徒的kegg圖啊,扯遠(yuǎn)了,,我們這個(gè)教程其實(shí)是想征集大家的代碼,,關(guān)于上下調(diào)基因各自獨(dú)立進(jìn)行GO數(shù)據(jù)庫的3分類富集后的可視化: 首先進(jìn)行上下調(diào)基因的ID轉(zhuǎn)換前面的表達(dá)量矩陣差異分析代碼我這里就不再贅述啦,我們分析已經(jīng)拿到了 gene_up 和 gene_down 這兩個(gè)變量(簡單的向量而已)代表的上下調(diào)差異基因哦,! library(org.Hs.eg.db) #把SYMBOL轉(zhuǎn)換ENTREZID,,這里會(huì)損失一部分無法匹配到的 gene_up=as.character(na.omit(AnnotationDbi::select(org.Hs.eg.db, keys = gene_up, columns = 'ENTREZID', keytype = 'SYMBOL')[,2])) gene_down=as.character(na.omit(AnnotationDbi::select(org.Hs.eg.db, keys = gene_down, columns = 'ENTREZID', keytype = 'SYMBOL')[,2]))
通常使用clusterProfiler包做GO和KEGG數(shù)據(jù)庫富集需要的是 ENTREZID的ID系統(tǒng),所以轉(zhuǎn)換一下,。 然后上下調(diào)基因獨(dú)立進(jìn)行GO數(shù)據(jù)庫富集代碼仍然是很簡單: library(clusterProfiler) #對正相關(guān)基因進(jìn)行富集分析 go <- enrichGO(gene_up, OrgDb = "org.Hs.eg.db", ont="all") library(ggplot2) library(stringr) barplot(go, split="ONTOLOGY")+ facet_grid(ONTOLOGY~., scale="free")+ ggsave('gene_up_GO_all_barplot.png') go=DOSE::setReadable(go, OrgDb='org.Hs.eg.db',keyType='ENTREZID') write.csv( go@result,file = 'gene_up_GO_all_barplot.csv')
#對負(fù)相關(guān)基因進(jìn)行富集分析 go <- enrichGO(gene_down, OrgDb = "org.Hs.eg.db", ont="all") barplot(go, split="ONTOLOGY")+ facet_grid(ONTOLOGY~., scale="free")+ ggsave('gene_down_GO_all_barplot.png') go=DOSE::setReadable(go, OrgDb='org.Hs.eg.db',keyType='ENTREZID') write.csv( go@result,file = 'gene_down_GO_all_barplot.csv')
這樣有了最原始的圖,,如何繪制成為文章開頭的美圖呢? 學(xué)徒作業(yè)看我六年前的表達(dá)芯片的公共數(shù)據(jù)庫挖掘系列推文 : 完成數(shù)據(jù)集:https://www.ncbi.nlm./geo/query/acc.cgi?acc=GSE97251 的表達(dá)量矩陣差異分析 : GSM2560200 plasmaRNA_health H13 GSM2560201 plasmaRNA_health H5 GSM2560202 plasmaRNA_health H8 GSM2560203 plasmaRNA_OSCC X474 GSM2560204 plasmaRNA_OSCC X574 GSM2560205 plasmaRNA_OSCC X602
這個(gè)數(shù)據(jù)集是很簡單的兩個(gè)分組,,芯片平臺(tái)是 Agilent-079487 Arraystar Human LncRNA microarray V4 如果確實(shí)感興趣也可以去和原文對比:Screening and validation of plasma long non-coding RNAs as biomarkers for the early diagnosis and staging of oral squamous cell carcinoma. Oncol Lett 2021 Feb;21(2):172. PMID: 33552289 我需要大家完成文章開頭的美圖,并且給出來代碼,!
|