?詳情請聯(lián)系作者: ? 往期scATAC內(nèi)容:完整版已提前發(fā)布在微信VIP群,! 1,、ArchR包單細(xì)胞ATAC分析(1): 上游分析
2、ArchR包單細(xì)胞ATAC分析(2): 數(shù)據(jù)讀入,、構(gòu)建ArchRProject、數(shù)據(jù)質(zhì)控
3,、ArchR包單細(xì)胞ATAC分析(3): 數(shù)據(jù)降維,、去批次,、聚類 4、ArchR包單細(xì)胞ATAC分析(4): 基因評分和marker基因鑒定 除了前面一節(jié)說的利用基因評分定義細(xì)胞類型之外,,我們還可以借助scRNA的細(xì)胞注釋進(jìn)行ATAC的細(xì)胞注釋,。事實上,我們在測scATAC的同時往往會對scRNA進(jìn)行測序,,scRNA做好細(xì)胞注釋后,,就可以將scATAC-seq的細(xì)胞比對到scRNA-seq
的細(xì)胞,實現(xiàn)兩種數(shù)據(jù)的整合,。整合的原理其實就是尋找scATAC與scRNA的相似細(xì)胞,,然后將scRNA-seq中對應(yīng)的細(xì)胞表
達(dá)量賦值給scATAC-seq細(xì)胞。最后,,scATAC-seq中每個細(xì)胞都會有基因表達(dá)量特征,,能被用于下游分析。 ArchR提到有兩種整合方式,,一種是“無約束整合”,兩一種是“有約束整合”,。無約束整合其實簡單的理解就是直接
addGeneIntegrationMatrix()函數(shù)對scATAC-seq和scRNA-seq數(shù)據(jù)進(jìn)行整合,。這種方式對scATAC-seq中的celltype不做
假設(shè),直接嘗試將這些細(xì)胞和scRNA-seq實驗里的任意細(xì)胞進(jìn)行配對,。有約束整合顧名思義就是做一些限定,,根據(jù)先驗知
識進(jìn)行整合。 我們示例數(shù)據(jù)也是有相應(yīng)的scRNA,,并已經(jīng)注釋好了,,讀入scRNA數(shù)據(jù)。 scRNA <- readRDS("~/data_analysis/scATAC/scATAC/ATAC_analysis/scRNA.rds") unique(scRNA$celltype)
1,、無約束整合,,直接整合注釋 set.seed(10) proj.filter <- addGeneIntegrationMatrix(ArchRProj = proj.filter, #ATAC proj對象 useMatrix = "GeneScoreMatrix", matrixName = "GeneIntegrationMatrix", reducedDims = "Harmony", seRNA = scRNA, #單細(xì)胞scRNA對象 addToArrow = TRUE, groupRNA = "celltype",#scRNA seurat細(xì)胞類型列名 nameCell = "predictedCell", nameGroup = "predictedGroup", nameScore = "predictedScore", force=TRUE) #plot plotEmbedding(proj.filter, colorBy = "cellColData", name = "predictedGroup",size = 0.05)
我們創(chuàng)建一個confusionMatrix,看看Cluster與celltype得對應(yīng)。 #我們創(chuàng)建一個confusionMatrix,看看Cluster與celltype得對應(yīng) cM <- confusionMatrix(proj.filter$Clusters, proj.filter$predictedGroup) cM <- cM / Matrix::rowSums(cM) labelNew <- colnames(cM)[apply(cM, 1, which.max)] mapLabs <- cbind(rownames(cM), labelNew)
pheatmap::pheatmap(mat = as.matrix(cM), color = paletteContinuous("whiteBlue"), border_color = "black")
將注釋標(biāo)記到scATAC,。 #label scATC proj.filter$Clusters_sub <- mapLabs[match(proj.filter$Clusters, mapLabs[,1]),2] table(proj.filter$Clusters_sub, proj.filter$Sample)#最終整合9種celltype p1 <- plotEmbedding(proj.filter, colorBy = "cellColData", name = "Clusters_sub",embedding = "UMAP") plotPDF(p1, name = "Plot-UMAP-RNA-Clusters_sub.pdf", ArchRProj = proj.filter, addDOC = FALSE, width = 5, height = 5)
我們利用整合矩陣重新做一下marker基因的表達(dá)圖,,發(fā)現(xiàn)marker基因的可視化更加特異。 getAvailableMatrices(proj.filter) # [1] "GeneIntegrationMatrix" "GeneScoreMatrix" "TileMatrix"
sc_marker <- c("CUX2", "RORB", "THEMIS", "TLE4", "VIP", "ID2", "SST","PVALB", "AQP4", "OLIG2","PDGFRA","PTPRC")
p1 <- plotEmbedding(ArchRProj = proj.filter, colorBy = "GeneIntegrationMatrix", name = sc_marker, embedding = "UMAP", imputeWeights = getImputeWeights(proj.filter))
有約束整合,,是在約束整合的基礎(chǔ)上進(jìn)行的,我們對整合有大致的了解,接下來進(jìn)行調(diào)整即可,。大概的思路可以這樣理解,,
例如在做完scATAC的降維聚類后,鑒定了marker基因,,也初步進(jìn)行了無約束整合,,確定了scATAC某些cluster或者全部的
cluster celltype,例如scATAC一共有C1-10 10個cluster,, 我們確定C1-3是A cell,,C4-5是B cell等等,那么就可以構(gòu)建
一個list,,讓scATAC的cluster與scRNA對應(yīng),,進(jìn)行整合。proj.sc <- proj.filter#(無約束整合前的數(shù)據(jù))
cM <- as.matrix(confusionMatrix(proj.sc$Clusters, proj.sc$predictedGroup)) preClust <- colnames(cM)[apply(cM, 1 , which.max)] cbind(preClust, rownames(cM)) #Assignments # preClust # [1,] "PN_CUX" "C17" # [2,] "PN_RO" "C18" # [3,] "IN_SST" "C20" # [4,] "PN_CUX" "C14" # [5,] "PN_RO" "C16" # [6,] "IN_VIP" "C21" # [7,] "MC" "C24" # [8,] "ODC" "C11" # [9,] "PN_RO" "C15" # [10,] "OPC" "C12" # [11,] "OPC" "C13" # [12,] "Astro" "C9" # [13,] "Astro" "C8" # [14,] "Astro" "C6" # [15,] "Astro" "C10" # [16,] "LAMA" "C22" # [17,] "ODC" "C19" # [18,] "LAMA" "C23" # [19,] "Astro" "C7" # [20,] "MC" "C1" # [21,] "PN_RO" "C4" # [22,] "MC" "C25" # [23,] "OPC" "C5" # [24,] "LAMA" "C2" # [25,] "MC" "C3"
#其實從無約束整合結(jié)果可以看出C6-13,,24具有很好得整合效果,,我們就將其作為約束整合得條件 groupList <- SimpleList( Astro = SimpleList( ATAC = proj.sc$cellNames[proj.sc$Clusters %in% c("C6", "C7","C8","C9","C10")], RNA = rownames([email protected][[email protected]$celltype %in% c("Astro"),]) ),
ODC = SimpleList( ATAC = proj.sc$cellNames[proj.sc$Clusters %in% c("C11")], RNA = rownames([email protected][[email protected]$celltype %in% c("ODC"),]) ), OPC = SimpleList( ATAC = proj.sc$cellNames[proj.sc$Clusters %in% c("C12", "C13")], RNA = rownames([email protected][[email protected]$celltype %in% c("OPC"),]) ), IN_PV_SST = SimpleList( ATAC = proj.sc$cellNames[proj.sc$Clusters %in% c("C20")], RNA = rownames([email protected][[email protected]$celltype %in% c("IN_PV","IN_SST"),]) ), IN_ID_VIP = SimpleList( ATAC = proj.sc$cellNames[proj.sc$Clusters %in% c("C21")], RNA = rownames([email protected][[email protected]$celltype %in% c("IN_ID","IN_VIP"),]) ), Non_anno = SimpleList( ATAC = proj.sc$cellNames[proj.sc$Clusters %in% c("C17","C18","C14","C16","C24","C15","C22","C19","C23","C1","C4","C25","C5","C2", "C3")], RNA = rownames([email protected][[email protected]$celltype %in% c("Astro","IN_PV","IN_SST","IN_ID","IN_VIP","ODC","OPC"),]) ) )
sc <- addGeneIntegrationMatrix( ArchRProj = proj.sc, useMatrix = "GeneScoreMatrix", matrixName = "GeneIntegrationMatrix", reducedDims = "Harmony", seRNA = scRNA, addToArrow = T, force= TRUE, groupList = groupList, groupRNA = "celltype", nameCell = "predictedCell_ArchR", nameGroup = "predictedGroup_ArchR", nameScore = "predictedScore_ArchR", transferParams = list(dims = 1:50) )
plotEmbedding( sc, colorBy = "cellColData", name = "predictedGroup_ArchR")
其實我們從結(jié)果可以看出,有約束整合的效果更好一點,,有約束和無約束的選擇按照自己實際的數(shù)據(jù)情況,。至此,我們scATAC分析的前期工作就終于完成了,,有了注釋好的對象,,我們接下來就進(jìn)行下游的一些分析。覺得分享有用的點個贊,、分享一下再走唄,!水平有限,有問題歡迎指出,,批評指正,!
|