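Thanks to Youyi (有邑) for this contribution. Youyi has been awarded a free seat in the offline course "No-Code Bioinformatics Papers: Genespring + Cytoscape Step-by-Step Reproduction | TCGA, GEO, SEER Database Mining Series".

I downloaded the clinical data from the cBioPortal database, which works fine for downloading and using clinical data. For RNAseq counts data I recommend downloading with the 生信人 TCGA tool; here is the link: https://www./article/95. Now, back to the topic: cBioPortal.

Start from the cBioPortal home page. Select the tumor type and data set you need; the latest TCGA data set contains 530 head and neck tumor cases. Click download to switch to the download page, click the icon in the green box to open the overview page, then click Download to fetch the data in the browser. Once the download finishes, unzip the archive. It contains several files along with their descriptions, so I will not go into detail here.

    #================ Clinical data preparation ============
    # 1.1 Read in the data
    # read.table is assumed here; the function name is missing in the source.
    patientMatch <- read.table('hnsc_tcga/hnsc_tcga/data_bcr_clinical_data_patient.txt',
                               header = TRUE, row.names = 1, comment.char = '#',
                               sep = '\t',  # the file is tab-delimited
                               na.strings = '[Not Available]')
    colnames(patientMatch)

There are 69 columns in total, i.e. 69 variables, for 527 samples. Of these, we first need the patient ID, which becomes the row names:

    rownames(patientMatch) <- patientMatch$PATIENT_ID
    patientMatch <- patientMatch[, -1]

For the other variables, a regular expression was used to tidy the list of names:

    'PRIMARY_SITE','LATERALITY'
    'SEX'
    'RACE'
    'HISTORY_OTHER_MALIGNANCY'
    'LYMPH_NODE_NECK_DISSECTION_INDICATOR'
    'LYMPH_NODE_DISSECTION_METHOD'
    'LYMPH_NODE_EXAMINED_COUNT','LYMPH_NODES_EXAMINED_HE_COUNT'
    'LYMPH_NODES_EXAMINED_IHC_COUNT','PATH_MARGIN'
    'VITAL_STATUS','DAYS_TO_LAST_FOLLOWUP'
    'DAYS_TO_DEATH','TUMOR_STATUS'
    'AJCC_TUMOR_PATHOLOGIC_PT'
    'AJCC_NODES_PATHOLOGIC_PN','AJCC_METASTASIS_PATHOLOGIC_PM'
    'AJCC_PATHOLOGIC_TUMOR_STAGE','EXTRACAPSULAR_SPREAD_PATHOLOGIC'
    'GRADE','ANGIOLYMPHATIC_INVASION'
    'PERINEURAL_INVASION','HPV_STATUS_P16'
    'HPV_STATUS_ISH','TOBACCO_SMOKING_HISTORY_INDICATOR'
    'ALCOHOL_HISTORY_DOCUMENTED'
    'PHARMACEUTICAL_TX_ADJUVANT'
    'TREATMENT_OUTCOME_FIRST_COURSE','NEW_TUMOR_EVENT_AFTER_INITIAL_TREATMENT'
    'AGE','CLIN_M_STAGE'
    'CLIN_N_STAGE','CLIN_T_STAGE'
    'CLINICAL_STAGE'
    'TISSUE_SOURCE_SITE','TUMOR_TISSUE_SITE'
    'OS_STATUS','OS_MONTHS'
    'DFS_STATUS'

This is the original text. The line feeds and carriage returns have to be removed, and a comma inserted between any two quotation marks that have no letter or digit between them, using a regex find-and-replace. After the replacement, the list looks like this:
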
    'PRIMARY_SITE','LATERALITY','SEX','RACE','HISTORY_OTHER_MALIGNANCY','LYMPH_NODE_NECK_DISSECTION_INDICATOR','LYMPH_NODE_DISSECTION_METHOD','LYMPH_NODE_EXAMINED_COUNT','LYMPH_NODES_EXAMINED_HE_COUNT','LYMPH_NODES_EXAMINED_IHC_COUNT','PATH_MARGIN','VITAL_STATUS','DAYS_TO_LAST_FOLLOWUP','DAYS_TO_DEATH','TUMOR_STATUS','AJCC_TUMOR_PATHOLOGIC_PT','AJCC_NODES_PATHOLOGIC_PN','AJCC_METASTASIS_PATHOLOGIC_PM','AJCC_PATHOLOGIC_TUMOR_STAGE','EXTRACAPSULAR_SPREAD_PATHOLOGIC','GRADE','ANGIOLYMPHATIC_INVASION','PERINEURAL_INVASION','HPV_STATUS_P16','HPV_STATUS_ISH','TOBACCO_SMOKING_HISTORY_INDICATOR','ALCOHOL_HISTORY_DOCUMENTED','PHARMACEUTICAL_TX_ADJUVANT','TREATMENT_OUTCOME_FIRST_COURSE','NEW_TUMOR_EVENT_AFTER_INITIAL_TREATMENT','AGE','CLIN_M_STAGE','CLIN_N_STAGE','CLIN_T_STAGE','CLINICAL_STAGE','TISSUE_SOURCE_SITE','TUMOR_TISSUE_SITE','OS_STATUS','OS_MONTHS','DFS_STATUS'

This step can also be done in RStudio. We then select these columns:

    # 1.2 Select the required variables
    patientMatch <- patientMatch[, c(
      'PRIMARY_SITE','LATERALITY','SEX','RACE','HISTORY_OTHER_MALIGNANCY',
      'LYMPH_NODE_NECK_DISSECTION_INDICATOR','LYMPH_NODE_DISSECTION_METHOD',
      'LYMPH_NODE_EXAMINED_COUNT','LYMPH_NODES_EXAMINED_HE_COUNT',
      'LYMPH_NODES_EXAMINED_IHC_COUNT','PATH_MARGIN','VITAL_STATUS',
      'DAYS_TO_LAST_FOLLOWUP','DAYS_TO_DEATH','TUMOR_STATUS','AJCC_TUMOR_PATHOLOGIC_PT',
      'AJCC_NODES_PATHOLOGIC_PN','AJCC_METASTASIS_PATHOLOGIC_PM',
      'AJCC_PATHOLOGIC_TUMOR_STAGE','EXTRACAPSULAR_SPREAD_PATHOLOGIC',
      'GRADE','ANGIOLYMPHATIC_INVASION','PERINEURAL_INVASION','HPV_STATUS_P16',
      'HPV_STATUS_ISH','TOBACCO_SMOKING_HISTORY_INDICATOR','ALCOHOL_HISTORY_DOCUMENTED',
      'PHARMACEUTICAL_TX_ADJUVANT','TREATMENT_OUTCOME_FIRST_COURSE',
      'NEW_TUMOR_EVENT_AFTER_INITIAL_TREATMENT',
      'AGE','CLIN_M_STAGE','CLIN_N_STAGE','CLIN_T_STAGE','CLINICAL_STAGE',
      'TISSUE_SOURCE_SITE','TUMOR_TISSUE_SITE','OS_STATUS','OS_MONTHS','DFS_STATUS')]
    colnames(patientMatch)  # names of the selected variables
    # [1] 'PRIMARY_SITE' 'LATERALITY'
    # [3] 'SEX' 'RACE'
    # [5] 'HISTORY_OTHER_MALIGNANCY' 'LYMPH_NODE_NECK_DISSECTION_INDICATOR'
    # [7] 'LYMPH_NODE_DISSECTION_METHOD' 'LYMPH_NODE_EXAMINED_COUNT'
    # [9] 'LYMPH_NODES_EXAMINED_HE_COUNT' 'LYMPH_NODES_EXAMINED_IHC_COUNT'
    # [11] 'PATH_MARGIN' 'VITAL_STATUS'
    # [13] 'DAYS_TO_LAST_FOLLOWUP' 'DAYS_TO_DEATH'
    # [15] 'TUMOR_STATUS' 'AJCC_TUMOR_PATHOLOGIC_PT'
    # [17] 'AJCC_NODES_PATHOLOGIC_PN' 'AJCC_METASTASIS_PATHOLOGIC_PM'
    # [19] 'AJCC_PATHOLOGIC_TUMOR_STAGE' 'EXTRACAPSULAR_SPREAD_PATHOLOGIC'
    # [21] 'GRADE' 'ANGIOLYMPHATIC_INVASION'
    # [23] 'PERINEURAL_INVASION' 'HPV_STATUS_P16'
    # [25] 'HPV_STATUS_ISH' 'TOBACCO_SMOKING_HISTORY_INDICATOR'
    # [27] 'ALCOHOL_HISTORY_DOCUMENTED' 'PHARMACEUTICAL_TX_ADJUVANT'
    # [29] 'TREATMENT_OUTCOME_FIRST_COURSE' 'NEW_TUMOR_EVENT_AFTER_INITIAL_TREATMENT'
    # [31] 'AGE' 'CLIN_M_STAGE'
    # [33] 'CLIN_N_STAGE' 'CLIN_T_STAGE'
    # [35] 'CLINICAL_STAGE' 'TISSUE_SOURCE_SITE'
    # [37] 'TUMOR_TISSUE_SITE' 'OS_STATUS'
    # [39] 'OS_MONTHS' 'DFS_STATUS'
    dim(patientMatch)
    # [1] 527 40

Next we create a new object to store and tidy the variables and values we need:

    # 1.3 New object baseLine to store the variable values
    baseLine <- patientMatch[, ...]  # the column selection is truncated in the source

HPV infection is assessed with two detection methods. We classify a patient as HPV-positive if either method is positive, and use a function to make that call:

    # 1.5 Determine HPV infection status
    HPVlog <- function(x, y){
      if('Positive' %in% c(x, y)){hpv = 'Positive'}
      else if('Negative' %in% c(x, y)){hpv = 'Negative'}
      else {hpv = NA}
      return(hpv)
    }
    baseLine$HPV <- mapply(HPVlog, patientMatch$HPV_STATUS_ISH, patientMatch$HPV_STATUS_P16)

Next come the outcome event and time:

    # 1.6 Survival time and event

Export the tidied clinical data:

    # 1.7 Export the results

Build the table and export it:

    # 1.8 Build the table

This is the final exported table:
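
To check how the HPV rule from step 1.5 behaves, here is a minimal sketch that runs it on a small made-up data frame. The `toy` values are invented for illustration and do not come from the TCGA file; `USE.NAMES = FALSE` and the `ifelse` variant are additions that the original code does not use.

```r
# Toy data only: these values are made up to exercise every branch of the rule.
toy <- data.frame(
  HPV_STATUS_ISH = c('Positive', 'Negative', NA, 'Negative', NA),
  HPV_STATUS_P16 = c('Negative', 'Negative', 'Positive', NA, NA),
  stringsAsFactors = FALSE
)

# Same rule as HPVlog in step 1.5: Positive wins, then Negative, otherwise NA.
HPVlog <- function(x, y) {
  if ('Positive' %in% c(x, y)) {
    'Positive'
  } else if ('Negative' %in% c(x, y)) {
    'Negative'
  } else {
    NA_character_
  }
}

# USE.NAMES = FALSE keeps mapply from naming the result after the first argument.
toy$HPV <- mapply(HPVlog, toy$HPV_STATUS_ISH, toy$HPV_STATUS_P16,
                  USE.NAMES = FALSE)

# Vectorised variant of the same rule; %in% treats NA as "no match".
ish <- toy$HPV_STATUS_ISH
p16 <- toy$HPV_STATUS_P16
hpv_vec <- ifelse(ish %in% 'Positive' | p16 %in% 'Positive', 'Positive',
           ifelse(ish %in% 'Negative' | p16 %in% 'Negative', 'Negative',
                  NA_character_))

print(toy)
```

A patient with one missing test and one negative test is classified as Negative under this rule, which may or may not be what an analysis wants; that choice is inherited from the function above.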
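
The variable-name cleanup used before step 1.2 (strip line breaks, put a comma between adjacent quoted names) can also be done directly in R. The following is a rough sketch under assumptions: `raw` is a shortened, hand-typed stand-in for the pasted list, and the pattern is one possible regex, not necessarily the one used in the original replacement.

```r
# Shortened stand-in for the pasted list: names separated by \n or \r\n
# instead of commas in some places.
raw <- "'PRIMARY_SITE','LATERALITY'\n'SEX'\r\n'RACE'\n'HISTORY_OTHER_MALIGNANCY'"

# A closing quote, a run of whitespace (including \r and \n), then an
# opening quote becomes quote-comma-quote.
clean <- gsub("'\\s+'", "','", raw)

# Split on commas and drop the quotes to get a character vector that can
# be used for column selection, e.g. patientMatch[, vars].
vars <- gsub("'", "", strsplit(clean, ",", fixed = TRUE)[[1]])

print(clean)
print(vars)
```

Building `vars` as a character vector avoids pasting a long quoted list into the script by hand.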
From: 醫學院的石頭 > 《R》