久久国产成人av_抖音国产毛片_a片网站免费观看_A片无码播放手机在线观看,色五月在线观看,亚洲精品m在线观看,女人自慰的免费网址,悠悠在线观看精品视频,一级日本片免费的,亚洲精品久,国产精品成人久久久久久久

分享

基因組重測序的unmapped reads assembly探究 【直播】我的基因組86

 健明 2021-07-14

在前面的直播基因組系列,,我們講解過那些比對不少我們?nèi)祟惖膮⒖蓟蚪M序列的數(shù)據(jù),,其實(shí)可以細(xì)致的進(jìn)行探究。

直播】我的基因組(十五):提取未比對的測序數(shù)據(jù)

這里主要參考這篇文章的圖4:http://www./ng/journal/v42/n11/figtab/ng.691F4.html

組裝的contig注釋到物種

這是2010年發(fā)表于nature genetics雜志的Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing 雖然文章選擇的是SOAPdenovo,ABySS,Velvet這3款軟件來進(jìn)行組裝,,但畢竟是2010年的文章了,現(xiàn)在其實(shí)有更好的選擇,,比如Minia

選擇Minia工具來組裝

Minia軟件也是基于de Bruijn圖原理的短序列組裝工具,,優(yōu)于以前的ABySS和SOAPdenovo,關(guān)鍵是速度非???,十幾分鐘就OK了,不消耗計(jì)算機(jī)資源,,所以這里就選擇它啦,。

下載安裝Minia

安裝官網(wǎng)的指導(dǎo)說明書下載二進(jìn)制版本即可,代碼如下:

  1. ## Download and install Minia

  2. # http://minia./

  3. cd ~/biosoft

  4. mkdir Minia &&  cd Minia

  5. wget https://github.com/GATB/minia/releases/download/v2.0.7/minia-v2.0.7-bin-Linux.tar.gz

  6. tar -zxvf minia-v2.0.7-bin-Linux.tar.gz

  7. ~/biosoft/Minia/minia-v2.0.7-bin-Linux/bin/minia --help

  8. ## eg: ./minia -in reads.fa -kmer-size 31 -abundance-min 3 -out output_prefix

軟件使用方法也非常簡單,,就一行命令,,其中最佳 -kmer-size需要用KmerGenie來確定。

使用

step1:提取比對失敗的reads

  1. samtools view -f4 jmzeng_recal.bam |perl -alne '{print "\@$F[0]\n$F[9]\n+\n$F[10]" }' >unmapped.fq

  2. perl ~/biosoft/PRINSEQ/prinseq-lite-0.20.4/prinseq-lite.pl -verbose -fastq unmapped.fq -graph_data unmapped.gd -out_good null -out_bad null

  3. perl ~/biosoft/PRINSEQ/prinseq-lite-0.20.4/prinseq-graphs.pl -i unmapped.gd -png_all -o unmapped

  4. perl ~/biosoft/PRINSEQ/prinseq-lite-0.20.4/prinseq-graphs.pl -i unmapped.gd -html_all -o unmapped

  5. cd ~/data/project/myGenome/gatk/jmzeng/unmapped

共31481084/4=7870271,,僅僅是7.8M的reads

Input Information

Input file(s):unmapped.fq
Input format(s):FASTQ
# Sequences:7,870,271
Total bases:1,180,540,650

step2: 用KmerGenie確定kmer值

KmerGenie estimates the best k-mer length for genome de novo assembly.

KmerGenie predictions can be applied to single-k genome assemblers (e.g. Velvet, SOAPdenovo 2, ABySS, Minia).

  1. ## http://kmergenie.bx./

  2. cd ~/biosoft

  3. mkdir KmerGenie &&  cd KmerGenie

  4. wget http://kmergenie.bx./kmergenie-1.7044.tar.gz

  5. tar zxvf kmergenie-1.7044.tar.gz

  6. cd kmergenie-1.7044

  7. make

  8. python setup.py install --user

  9. ~/.local/bin/kmergenie --help

  10. cd ~/data/project/myGenome/gatk/jmzeng/unmapped

  11. ~/.local/bin/kmergenie unmapped.fq

step3: 運(yùn)行Minia

  1. cd ~/data/project/myGenome/gatk/jmzeng/unmapped

  2. ~/biosoft/Minia/minia-v2.0.7-bin-Linux/bin/minia  -in unmapped.fq -kmer-size 31 -abundance-min 3 -out output_prefix

7.8M的reads組裝之后有272007條contigs

組裝之后:

Prinseq v0.20.4 was used to calculate assembly statistics, including N50 contig size, GC content

  1. cd ~/data/project/myGenome/gatk/jmzeng/unmapped

  2. perl ~/biosoft/PRINSEQ/prinseq-lite-0.20.4/prinseq-lite.pl -verbose -fasta output_prefix.contigs.fa  -graph_data contigs.gd -out_good null -out_bad null

  3. perl ~/biosoft/PRINSEQ/prinseq-lite-0.20.4/prinseq-graphs.pl -i contigs.gd -png_all -o contigs

  4. perl ~/biosoft/PRINSEQ/prinseq-lite-0.20.4/prinseq-graphs.pl -i contigs.gd -html_all -o contigs

  5. perl ~/biosoft/PRINSEQ/prinseq-lite-0.20.4/prinseq-lite.pl -verbose -fasta output_prefix.contigs.fa  -stats_assembly

就是給出一些指標(biāo),,如下;

  1. stats_assembly    N50 176

  2. stats_assembly    N75 113

  3. stats_assembly    N90 78

  4. stats_assembly    N95 70

Input Information

Input file(s):output_prefix.contigs.fa
Input format(s):FASTA
# Sequences:272,007
Total bases:44,868,011

Length Distribution

Mean sequence length:164.95 ± 204.44 bp
Minimum length:63 bp
Maximum length:10,187 bp
Length range:10,125 bp
Mode length:150 bp with 16,461 sequences

然后用RNA-SEQ數(shù)據(jù)來比對驗(yàn)證,! 以后再講

把組裝好的contigs拿去NCBI做blast看看物種分布,Distribution of top nucleotide BLAST hits by species from the NCBI nr database for 1000 random contigs in the assembly,!其實(shí)上面的prinseq軟件也簡單的給出了一個(gè)污染物種分布情況表,但是這個(gè)原理不一樣,。以后再講

    轉(zhuǎn)藏 分享 獻(xiàn)花(0

    0條評論

    發(fā)表

    請遵守用戶 評論公約

    類似文章 更多