20200315

GO

对各个基因组之间分类好的基因进行GO功能富集分析

在A基因组中AS完全保守的基因对

awk '$3==1{print $0}' A2_At_conserveRate.txt|wc -l

总共249对

在FDR校正后，没有显著性富集的代谢通路，于是选择不对p.value进行校正

提取对应的GO编号进行GO富集分析

awk '$3==1{print $0}' A2_At_conserveRate.txt |cut -f1 |xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3_At.annot >GO/A2_vs_At_completeConserve.GO

在D基因组中AS完全保守的基因对

awk '$3==1{print $0}' D5_Dt_conserveRate.txt|wc -l

总共315对

在两个基因组中都完全保守的基因对

awk '$3==1{print $0}' D5_Dt_conserveRate.txt|cut -f1|xargs -I {} grep {} ../../../all_homole_ES_IR.txt|cut -f5|xargs -I {} grep {} A2_At_conserveRate.txt|awk '$3==1{print $0}'|wc -l

总共13对

部分保守的基因

多倍化之后基因获得新的IR事件

A2_vs _At

awk '$3==2&&$1~/^e/{print $1}' A2_At_conserveRate.txt|wc -l

提取对应的GO编号

awk '$3==2&&$1~/^e/{print $1}' ../A2_At_conserveRate.txt|xargs -I {} grep {} ../../../../all_homole_ES_IR.txt|cut -f5|xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3_At.annot
 >A2_vs_AtIncreaseAs.GO

D5_vs_Dt

awk '$3==2&&$1~/^Go/{print $1}' D5_Dt_conserveRate.txt|wc -l

提取对应的GO编号

awk '$3==2&&$1~/^Go/{print $1}' ../D5_Dt_conserveRate.txt|xargs -I {} grep {} ../../../../all_homole_ES_IR.txt|cut -f1|xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3_Dt.annot >D5_vs_DtIncreaseAs.GO

多倍化之后基因丧失部分IR事件

A2_vs_At

awk '$3==2&&$1~/^Gh/{print $1}' A2_At_conserveRate.txt|wc -l

提取对应的GO编号

awk '$3==2&&$1~/^Gh/{print $1}'  ../A2_At_conserveRate.txt|xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3_At.annot >A2_vs_AtDecreaseAs.GO

D5_vs_Dt

awk '$3==2&&$1~/^Gh/{print $1}' D5_Dt_conserveRate.txt|wc -l

提取对应的GO编号

awk '$3==2&&$1~/^Gh/{print $1}' ../D5_Dt_conserveRate.txt|xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3_Dt.annot >D5_vs_DtDecreaseAs.GO

在A基因组中部分保守的基因

awk '$3==3{print $0}' A2_At_conserveRate.txt|wc -l

awk '$3==3{print $1}' ../A2_At_conserveRate.txt|xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3_At.annot >A2_vs_Atintersect.GO

在D基因组中部分保守的基因

awk '$3==3{print $0}' D5_Dt_conserveRate.txt|wc -l

awk '$3==3{print $1}' ../D5_Dt_conserveRate.txt|xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3_Dt.annot >D5_vs_Dtintersect.GO

在A、D基因组都不保守的AS

首先在At与A2的比较中不保守，同时对应的亚同源基因在D5与Dt的比较中同样不保守

awk '$3==4{print $1}' At_Dt_conserveRate.txt|cut -f1|xargs -I {} grep {} ../../../all_homole_ES_IR.txt|cut -f1|xargs -I {} grep {} D5_Dt_conserveRate.txt|awk '$3==4{print $0}'
## 提取GO编号
awk '$3==4{print $1}' At_Dt_conserveRate.txt|cut -f1|xargs -I {} grep {} ../../../all_homole_ES_IR.txt|cut -f1|xargs -I {} grep {} D5_Dt_conserveRate.txt|awk '$3==4{print $1}'|xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3_Dt.annot >GO/At_DtallNoconserve.GO

提取对应的GO结果进行画图

根据p-value筛选GO item,由于我在做GO富集的时候，没有进行FDR的校正；所以最后两列的信息是一样的

##文件内容
GO_acc    term_type    Term    pvalue    FDR
GO:0043687    P    post-translational protein modification    0.0037    0.0037
GO:0006796    P    phosphate metabolic process    0.0044    0.0044
awk -F "\t" 'NF>=2&&$8<=0.01{print $1,$2,$3,$8,$9}' OFS="\t" A2_vs_At_completeConserve.GO

在At与Dt中对应的保守基因
在At与Dt中分别对应的IR增加
在At与Dt中分别对应的IR减少

Previous完全保守的基因对 Next20200214

Last updated 5 years ago

Was this helpful?