簡單地說,,比較兩組或多組人群隨著時間的延續(xù),存活個體的比例變化趨勢,?;钪膫€體越少的組危險性越大,對應(yīng)的基因?qū)膊∮绊懺酱?,對?yīng)的藥物治療效果越差,。 生存分析適合于處理時間-事件數(shù)據(jù),如下 生存時間數(shù)據(jù)有兩種類型:
生存概率 (survival probability)指某段時間開始時存活的個體至該時間結(jié)束時仍然存活的可能性大小,。
生存率 (Survival rate),,用 生存分析一個常用的方法是壽命表法。 壽命表是描述一段時間內(nèi)生存狀況,、終點事件和生存概率的表格,,需計算累積生存概率即每一步生存概率的乘積 (也可能是原始生存概率),可完成對病例隨訪資料在任意指定時點的生存狀況評價,。 R做生存分析R中做生存分析需要用到包 讀入數(shù)據(jù) library(survival)BRCA <- read.table('brca.tsv',="" sep='\t' ,="" header=""> ID SampleType PAM50Call_RNAseq Days.survival pathologic_stage1 TCGA-E9-A2JT-01 Tumor_type LumA 288 stage iia2 TCGA-BH-A0W4-01 Tumor_type LumA 759 stage iia3 TCGA-BH-A0B5-01 Tumor_type LumA 2136 stage iiia4 TCGA-AC-A3TM-01 Tumor_type Unknown 762 stage iiia5 TCGA-E9-A5FL-01 Tumor_type Unknown 24 stage iib6 TCGA-AC-A3TN-01 Tumor_type Unknown 456 stage iib vital_status1 02 03 04 05 06 0 簡單地看下每一列都有什么內(nèi)容,,方便對數(shù)據(jù)整體有個了解,比如有無特殊值,。 summary(BRCA) ID SampleType PAM50Call_RNAseq Days.survival TCGA-3C-AAAU-01: 1 Tumor_type:1090 Basal :138 Min. : 0.0 TCGA-3C-AALI-01: 1 Her2 : 65 1st Qu.: 450.2 TCGA-3C-AALJ-01: 1 LumA :415 Median : 848.0 TCGA-3C-AALK-01: 1 LumB :194 Mean :1247.0 TCGA-4H-AAAK-01: 1 Normal : 24 3rd Qu.:1682.8 TCGA-5L-AAT0-01: 1 Unknown:254 Max. :8605.0 (Other) :1084 pathologic_stage vital_status stage iia :359 Min. :0.0000 stage iib :259 1st Qu.:0.0000 stage iiia:156 Median :0.0000 stage i : 90 Mean :0.1394 stage ia : 85 3rd Qu.:0.0000 stage iiic: 67 Max. :1.0000 (Other) : 74 計算壽命表 # Days.survival:跟蹤到的存活時間# vital_status: 跟蹤到的存活狀態(tài)# ~1表示不進(jìn)行分組fit <- survfit(surv(days.survival,="" vital_status)~1,="" data="BRCA)#" 獲得的survial列就是生存率="" summary(fit)call:="" survfit(formula="Surv(Days.survival," vital_status)="" ~="" 1,="" data="BRCA)" time="" n.risk="" n.event="" survival="" std.err="" lower="" 95%="" ci="" upper="" 95%="" ci="" 116="" ="" 1021="" ="" ="" ="" 1="" ="" 0.999="" 0.000979="" ="" ="" ="" 0.997="" ="" ="" ="" 1.000="" 158="" ="" 1017="" ="" ="" ="" 1="" ="" 0.998="" 0.001386="" ="" ="" ="" 0.995="" ="" ="" ="" 1.000="" 160="" ="" 1016="" ="" ="" ="" 1="" ="" 0.997="" 0.001697="" ="" ="" ="" 0.994="" ="" ="" ="" 1.000="" 172="" ="" 1010="" ="" ="" ="" 1="" ="" 0.996="" 0.001962="" ="" ="" ="" 0.992="" ="" ="" ="" 1.000="" 174="" ="" 1008="" ="" ="" ="" 1="" ="" 0.995="" 0.002195="" ="" ="" ="" 0.991="" ="" ="" ="" 0.999="" 197="" ="" 1003="" ="" ="" ="" 1="" ="" 0.994="" 0.002406="" ="" ="" ="" 0.989="" ="" ="" ="" 0.999="" 224="" ="" 993="" ="" ="" ="" 1="" ="" 0.993="" 0.002604="" ="" ="" ="" 0.988="" ="" ="" ="" 0.998="" 227="" ="" 990="" ="" ="" ="" 1="" ="" 0.992="" 0.002788="" ="" ="" ="" 0.987="" ="" ="" ="" 0.998="" 239="" ="" 987="" ="" ="" ="" 1="" ="" 0.991="" 0.002961="" ="" ="" ="" 0.985="" ="" ="" ="" 0.997="" 255="" ="" 981="" ="" ="" ="" 1="" ="" 0.990="" 0.003125="" ="" ="" ="" 0.984="" ="" ="" ="" 0.996="" 266="" ="" 978="" ="" ="" ="" 1="" ="" 0.989="" 0.003282="" ="" ="" ="" 0.983="" ="" ="" ="" 0.996="" 295="" ="" 965="" ="" ="" ="" 1="" ="" 0.988="" 0.003435="" ="" ="" ="" 0.981="" ="" ="" ="" 0.995="" 302="" ="" 962="" ="" ="" ="" 1="" ="" 0.987="" 0.003581="" ="" ="" ="" 0.980="" ="" ="" ="" 0.994="" 304="" ="" 958="" ="" ="" ="" 1="" ="" 0.986="" 0.003723="" ="" ="" ="" 0.979="" ="" ="" ="" 0.993="" 320="" ="" 948="" ="" ="" ="" 1="" ="" 0.985="" 0.003862="" ="" ="" ="" 0.977="" ="" ="" ="" 0.993="" 322="" ="" 946="" ="" ="" ="" 1="" ="" 0.984="" 0.003995="" ="" ="" ="" 0.976="" ="" ="" =""> 繪制生存曲線,,橫軸表示生存時間,縱軸表示生存概率,,為一條梯形下降的曲線,。下降幅度越大,表示生存率越低或生存時間越短,。 library(survminer)# conf.int:是否顯示置信區(qū)間# risk.table: 對應(yīng)時間存活個體總結(jié)表格ggsurvplot(fit, conf.int=T,risk.table=T)
# 這三步不是必須的,,只是為了方便,選擇其中的4個確定了的分組進(jìn)行分析# 同時為了簡化圖例,,給列重命名一下,,使得列名不那么長BRCA_PAM50 <- brca[grepl('basal|her2|luma|lumb',brca$pam50call_rnaseq),]brca_pam50=""><- droplevels(brca_pam50)colnames(brca_pam50)[colnames(brca_pam50)="='PAM50Call_RNAseq']"><- 'pam50'#="" 按pam50分組fit=""><- survfit(surv(days.survival,="" vital_status)~pam50,="" data="BRCA_PAM50)#" 繪制曲線ggsurvplot(fit,="" conf.int="F,risk.table=T," risk.table.col='strata' ,="" pval=""> 參考資料
|
|