monocle3與PAGA有點(diǎn)類似,,在UMAP圖上顯示軌跡圖,,沒有了樹狀的結(jié)構(gòu)。
原理,、圖的理解,,可以參考Reference中的鏈接
安裝 sudo apt install libudunits2-dev libgdal-dev
speedglm包違反CRAN規(guī)定被刪除了,,所以要從rstudio下載
using(pak) pkgs <- c('BiocGenerics' , 'DelayedArray' , 'DelayedMatrixStats' , 'limma' , 'lme4' , 'S4Vectors' , 'SingleCellExperiment' , 'SummarizedExperiment' , 'batchelor' , 'HDF5Array' , 'terra' , 'ggrastr' ,"hhoeflin/hdf5r" ,"mojaveazure/loomR@develop" , "url::https://packagemanager./cran/2023-03-31/src/contrib/speedglm_0.3-4.tar.gz" ) pak::pkg_install(pkgs)
0. 加載R包,、定義函數(shù) using(monocle3, tidyverse, magrittr, Matrix, Seurat, SeuratObject, sp) ps <- function (filename, plot = FALSE , w = 9 , h = 6 ) { if (is.object(plot)) { print(plot) } plot <- recordPlot() pdf(file = paste0(filename,".pdf" ), onefile = T , width = w, height = h) replayPlot(plot) dev.off() } n_jobs <- 8 # 使用的線程數(shù)
1.創(chuàng)建monocle對象 cell_type:已經(jīng)注釋好了細(xì)胞類型 sobj_embed:UMAP降維信息,是數(shù)據(jù)框,,行名是細(xì)胞,,有兩列分別對應(yīng)兩個維度 expression_matrix
, a numeric matrix of expression values, where rows are genes, and columns are cellscell_metadata
, a data frame, where rows are cells, and columns are cell attributes (such as cell type, culture condition, day captured, etc.)gene_metadata
, an data frame, where rows are features (e.g. genes), and columns are gene attributes, such as biotype, gc content, etc. one of the columns of the gene_metadata
should be named "gene_short_name", which represents the gene symbol or simple name (generally used for plotting) for each gene.sobj <- readRDS("final.rds" ) gene_metadata <- sobj@[email protected] gene_metadata$gene_short_name <- rownames(gene_metadata) cds <- new_cell_data_set(as(sobj@assays$RNA@counts,"sparseMatrix" ), cell_metadata = [email protected] , gene_metadata = gene_metadata ) hvg <- seob@[email protected] sobj_embed <- Seurat::Embeddings(seob, reduction = "umap" ) rm(sobj, gene_metadata)
2.pre-process 標(biāo)準(zhǔn)化、預(yù)處理,、取批次
## Normalize and pre-process the data cds %<>% preprocess_cds(num_dim = 30 ,method="PCA" ,norm_method="log" ,use_genes=hvg)## Remove batch effects with cell alignment cds %<>% align_cds(alignment_group = "orig.ident" ,preprocess_method="PCA" )
3.Reduce dimensions and Cluster cells 降維,、聚類、分群,、分partition
這里使用UMAP作為降維算法,,再使用軌跡分區(qū)算法,把所有細(xì)胞分為兩個partitio,,不同分區(qū)的細(xì)胞會進(jìn)行單獨(dú)的軌跡分析,。
plot_pc_variance_explained(cds) ps("PCA.pdf" , w = 9 , h = 6 )## Reduce the dimensions using UMAP cds %<>% reduce_dimension(umap.fast_sgd = TRUE , cores = n_jobs, reduction_method = "UMAP" , preprocess_method = "PCA" )## 從seurat導(dǎo)入整合過的umap坐標(biāo) cds_embed <- cds@int_colData$reducedDims$UMAP cds@int_colData$reducedDims$UMAP <- sobj_embed[rownames(cds_embed),] plot_cells(cds, reduction_method = "UMAP" , color_cells_by = "cell_type" ,group_label_size=6 ) ps("UMAP_CellType.pdf" )## Cluster the cells cds %<>% rcluster_cells(reduction_method = "UMAP" , cluster_method = "leiden" ,random_seed=1314 ) plot_cells(cds, color_cells_by = "partition" ,group_label_size=6 ) ps("UMAP_partition.pdf" )
4.Order cells in pseudotime along a trajectory 手動選擇root需要根據(jù)自己的生物學(xué)背景知識
## Learn a graph cds %<>% learn_graph() plot_cells(cds,trajectory_graph_segment_size = 1.5 , group_label_size = 6 ,group_cells_by="cluster" , color_cells_by = "cell_type" , label_groups_by_cluster = FALSE , label_leaves = FALSE , label_branch_points = FALSE ) ps("UMAP_graph.pdf" ) cds <- order_cells(cds) plot_cells(cds, group_label_size = 6 , color_cells_by = "pseudotime" , label_cell_groups = FALSE , label_leaves = FALSE , label_branch_points = FALSE , graph_label_size = 1.5 ) ps("Trajectory_Pseudotime.pdf" )
The black lines show the structure of the graph. 手動選擇root The black lines show the structure of the graph. Note that the graph is not fully connected: cells in different partitions are in distinct components of the graph. The circles with numbers in them denote special points within the graph. Each leaf, denoted by light gray circles, corresponds to a different outcome (i.e. cell fate) of the trajectory. Black circles indicate branch nodes, in which cells can travel to one of several outcomes. 5.Perform differential expression analysis 這是空間差異分析常用的方法,在空間轉(zhuǎn)錄組中也有,。
morans_I
column, which ranges from -1 to +1. A value of 0 indicates no effect, while +1 indicates perfect positive autocorrelation and suggests that nearby cells have very similar values of a gene's expression. Significant values much less than zero are generally rare.
dea_res <- graph_test(cds, neighbor_graph = "principal_graph" , cores = n_jobs) %>% dplyr::filter(q_value < 0.05 ) fwrite(dea_res, "Trajectory_genes.csv" )
6.Finding genes that change as a function of pseudotime 尋找擬時(shí)軌跡差異基因,,使用graph_test分析最重要的結(jié)果是莫蘭指數(shù)(morans_I),其值在-1至1之間,,0代表此基因沒有,。空間共表達(dá)效應(yīng),,1代表此基因在空間距離相近的細(xì)胞中表達(dá)值高度相似,。根據(jù)莫蘭指數(shù)挑選前10個基因用于可視化,。
degs <- dea_res$gene_short_name top_genes <- dea_res %>% dplyr::top_n(n = 10 , morans_I) %>% dplyr::pull(gene_short_name) %>% as.character() plot_genes_in_pseudotime(cds[top_genes, ], color_cells_by = "cell_type" , min_expr = 0.5 , ncol = 2 ) ps("Top_Genes_Jitterplot.pdf" , w = 9 , h = 6 ) p <- plot_cells(cds, genes = top_genes, show_trajectory_graph = FALSE , label_cell_groups = FALSE , label_leaves = FALSE ) p$facet$params$ncol <- 5 ps("Top_Genes_Featureplot.pdf" , plot = p)
挑選top10畫圖展示,基因表達(dá)隨時(shí)間變化趨勢圖 7.Finding modules of co-regulated genes 尋找共表達(dá)基因模塊,根據(jù)上邊的差異分析結(jié)果,,按照UMAP和Louvain 聚類,,將這些基因分在不同的模塊中,有些模塊在某些細(xì)胞中特異高表達(dá),。
Once we have a set of genes that vary in some interesting way across the clusters, Monocle provides a means of grouping them into modules. We can call find_gene_modules(), which essentially runs UMAP on the genes (as opposed to the cells) and then groups them into modules using Louvain community analysis
gene_module_df <- find_gene_modules(cds[degs, ], resolution = 1e-2 , cores = n_jobs) fwrite(gene_module_df, "Genes_Module.csv" ) cell_group_df <- tibble::tibble( cell = row.names(colData(cds)), cell_group = colData(cds)$cell_type ) agg_mat <- aggregate_gene_expression(cds, gene_module_df, cell_group_df) row.names(agg_mat) <- stringr::str_c("Module " , row.names(agg_mat)) pheatmap::pheatmap(agg_mat, cluster_rows = TRUE , cluster_cols = TRUE , scale = "column" , clustering_method = "ward.D2" , fontsize = 6 ) ps("heatmap.pdf" , w = 5 , h = 16 ) plot_cells(cds, genes = gene_module_df %>% dplyr::filter(module %in % c(200 , 169 , 138 , 22 ,17 ,91 )), group_cells_by = "partition" , color_cells_by = "partition" , show_trajectory_graph = FALSE ) ps("module.pdf" ) saveRDS(cds,file="cds.rds" )
Reference https://zhuanlan.zhihu.com/p/451727080 https://cole-trapnell-lab./monocle3/docs https://www.jianshu.com/p/c402b6588e17