這部分開始進(jìn)行基本的富集分析,,兩類
############################################################ A:差異基因富集分析(不需要表達(dá)值,只需要gene name)############################################################ -----------先說富集什么-----------
-----------什么是富集(原理)-----------富集的統(tǒng)計(jì)學(xué)基礎(chǔ)是超幾何分布,,簡單來說根據(jù)下面的Fisher精確檢驗(yàn)(Fisher exact test)公式,對每個(gè)GO或KEGG term計(jì)算一個(gè)p值 p=(M/K)[(N-M)/(n-k)]/(N/n),,其中 N:所有g(shù)ene總數(shù) n:N中差異表達(dá)gene的總數(shù) M:N中屬于某個(gè)GO term的gene個(gè)數(shù) k: n中屬于某個(gè)GO term的gene個(gè)數(shù) p:表示差異表達(dá)gene富集到這個(gè)GO term上的可信程度
---------------拿什么來富集---------------得到的差異表達(dá)基因列表就可以,,也就是說不需要其他的值 ---------------用什么工具富集--------------只能說實(shí)在是太多太多了,。。,。,。但是用的時(shí)候要小心,因?yàn)槟愣嘤脦讉€(gè)工具,即使設(shè)定同樣的p值也會發(fā)現(xiàn)結(jié)果有出入,,有時(shí)還差異挺大,。 1 按使用方式來說(簡單度)有3種
B: 基因集(gene set)富集分析(不管有無差異,,需要全部genes表達(dá)值)##########################################################
-------------------具體操作---------------------#enrichment analysis using clusterprofiler package created by yuguangchuang library(clusterProfiler) library(DOSE) library(org.Mm.eg.db) #get the ENTREZID for the next analysis setwd("F:/rna_seq/data/matrix") sig.gene<-read.csv(file="DEG_treat_vs_control.csv") head(sig.gene) gene<-sig.gene[,1] head(gene) gene.df<-bitr(gene, fromType = "ENSEMBL", toType = c("SYMBOL","ENTREZID"), OrgDb = org.Mm.eg.db) head(gene.df) GO富集分析: #Go classification #Go enrichment ego_cc<-enrichGO(gene = gene.df$ENSEMBL, OrgDb = org.Mm.eg.db, keyType = 'ENSEMBL', ont = "CC", pAdjustMethod = "BH", pvalueCutoff = 0.01, qvalueCutoff = 0.05) ego_bp<-enrichGO(gene = gene.df$ENSEMBL, OrgDb = org.Mm.eg.db, keyType = 'ENSEMBL', ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.01, qvalueCutoff = 0.05) barplot(ego_bp,showCategory = 18,title="The GO_BP enrichment analysis of all DEGs ")+ scale_size(range=c(2, 12))+ scale_x_discrete(labels=function(ego_bp) str_wrap(ego_bp,width = 25)) gobp.jpeg #KEGG enrichment install.packages("stringr") library(stringr) kk<-enrichKEGG(gene =gene.df$ENTREZID, organism = 'mmu', pvalueCutoff = 0.05) kk[1:30] barplot(kk,showCategory = 25, title="The KEGG enrichment analysis of all DEGs")+ scale_size(range=c(2, 12))+ scale_x_discrete(labels=function(kk) str_wrap(kk,width = 25)) dotplot(kk,showCategory = 25, title="The KEGG enrichment analysis of all DEGs")+ scale_size(range=c(2, 12))+ scale_x_discrete(labels=function(kk) str_wrap(kk,width = 25)) kegg.jpeg keggdot.jpeg # Gene Set Enrichment Analysis(GSEA) # 獲取按照log2FC大小來排序的基因列表 genelist <- sig.gene$log2FoldChange names(genelist) <- sig.gene[,1] genelist <- sort(genelist, decreasing = TRUE) # GSEA分析 gsemf <- gseGO(genelist, OrgDb = org.Mm.eg.db, keyType = "ENSEMBL", ont="BP" ) # 查看信息 head(gsemf) # 畫出GSEA圖 gseaplot(gsemf, geneSetID="GO:0001819") gsea.jpeg 后記:做完這部分富集分析,,接著按我的流程進(jìn)入下一部分分析RNA-seq(10):KEGG通路可視化,因?yàn)橹苯佑玫竭@部分?jǐn)?shù)據(jù),, |
|