GO functional enrichment analysis of the classified gene pairs
Class 1: conserved pattern
1. The same IR is present in every genome
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=863404143
awk -F "_" '{print $1"_"$2}' ../converse/1_1 |sort|uniq|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold |awk '{print $1"\n"$3}' |xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >1_1.go
2. Gene pairs with no IR in any genome
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=913752380
awk '{print $1"\n"$3}' ../converse/1_2|sort|uniq|xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >1_2.go
3. Conserved IR in the A genome
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=596719406
awk -F "_" '{print $1}' ../converse/1_3 |sort|uniq|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold |awk '{print $1"\n"$3}' |xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >1_3.go
4. Conserved IR in the D genome
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=642301472
awk -F "_" '{print $1"_"$2}' ../converse/1_4 |sort|uniq|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >1_4.go
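The lookups above pass each ID to `grep` as a regular expression, so an ID can also match longer IDs that contain it, and a `.` in an ID matches any character. A minimal sketch of a stricter lookup, assuming the IDs appear in both tables as whole words; `match_ids` is a hypothetical helper name, not part of the original pipeline:

```shell
# Hypothetical helper (not in the original notes): read IDs from stdin,
# one per line, and print the lines of file "$1" that contain any of them.
# -F treats each ID as a fixed string, -w requires a whole-word match,
# and "-f -" reads the pattern list from standard input.
match_ids() {
    grep -wFf - "$1"
}

# Possible use with the class 1_1 files from these notes (paths as above):
# awk -F "_" '{print $1"_"$2}' ../converse/1_1 | sort -u \
#   | match_ids ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold \
#   | awk '{print $1"\n"$3}' | sort -u \
#   | match_ids ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot > 1_1.go
```

Because each table is scanned once for all IDs, every matching line is printed once, so the output may differ from the per-ID `grep` runs wherever those produced duplicates or partial matches.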
Class 2: IR lost during polyploidization
1. IR lost only in Dt
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=819544063
awk -F "_" '{print $1"_"$2}' ../converse/2_1 | sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot > 2_1.go
2. IR lost only in At
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=472631356
awk -F "_" '{print $1"_"$2}' ../converse/2_2 | sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot > 2_2.go
3. IR lost in both subgenomes
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=415512684
## Cases from class 4 are also counted here
awk -F "_" '{print $1}' ../converse/2_3|sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >2_3.go
## IR events that occur only in A2
cut -f5 ../converseBed/A2.bed | awk -F "_" '{print $1}'|sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >>2_3.go
## IR events that occur only in D5
cut -f5 ../converseBed/D5.bed |awk -F "_" '{print $1}'|sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >>2_3.go
sort -u 2_3.go -o 2_3.go
Class 3: new IR events gained during polyploidization
1. IR gained only in At
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=893002877
cut -f5 ../converseBed/At.bed |awk -F "_" '{print $1"_"$2}' |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >3_1.go
2. IR gained only in Dt
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=566100874
cut -f5 ../converseBed/Dt.bed |awk -F "_" '{print $1"_"$2}' |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >3_3.go
3. IR gained in both subgenomes
http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=205614710
This includes the first two subclasses of class 4
cat ../converse/3_2 ../converse/4_1 ../converse/4_2 |awk -F "_" '{print $1"_"$2}'|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}'|sort |uniq |xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >3_2.go
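Find GO IDs shared between classes
First filter the files
## Filter GO results by p-value (column 8) and FDR (column 9); the note originally said FDR<=0.05, but the command uses 0.01 for both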
awk -F "\t" '$8<=0.01&&$9<=0.01{print $1,$2,$3,$8,$9}' OFS="\t"
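Once each class has a filtered result file, the GO IDs shared by two classes can be listed by intersecting their first columns. A minimal sketch, assuming the GO accession is in column 1 of each tab-separated file (as in the `awk` output above); `shared_go` and the file names in the usage comment are hypothetical:

```shell
# Hypothetical helper: print the GO IDs found in column 1 of both files.
# comm needs sorted input, so each ID list is sorted and deduplicated first.
shared_go() {
    _a=$(mktemp)
    _b=$(mktemp)
    cut -f1 "$1" | grep '^GO:' | sort -u > "$_a"
    cut -f1 "$2" | grep '^GO:' | sort -u > "$_b"
    comm -12 "$_a" "$_b"    # -12 keeps only lines common to both lists
    rm -f "$_a" "$_b"
}

# Possible use (file names are placeholders, not from the notes):
# shared_go filtered/2_1.txt filtered/2_2.txt
```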
partition splicing pattern
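The diploid orthologs each have multiple splicing patterns; after polyploidization, each pattern is inherited by only one subgenome, which indicates that alternative splicing has been partitioned between the subgenomes
## Extract the corresponding genes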
grep GO:0010027 raw/2_1.txt |awk -F "\t" '{print $10}'|sed -e 's/\/\//\n/g' -e 's/ //g'
2020-01-07
Discussion of AS changes during polyploidization
Both subgenomes show the same AS
AS present in both
awk -F "_" '{print $1"_"$2}' ../converse/1_1 |sort|uniq|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/1_1.txt
cat ../converse/3_2 ../converse/4_1 ../converse/4_2 |awk -F "_" '{print $1"_"$2}'|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold |sort|uniq >geneid/3_2.txt
awk -F "_" '{print $1"_"$2}' ../converse/4_1 |sort|uniq|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/4_1.txt
awk -F "_" '{print $1"_"$2}' ../converse/4_2 |sort|uniq|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/4_2.txt
AS absent in both
Gene pairs with no AS in any of the four genomes are not analyzed for now
awk -F "_" '{print $1}' ../converse/2_3|sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/2_3.txt
awk -F "_" '{print $1}' ../converse/4_3 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/4_3.txt
awk -F "_" '{print $1}' ../converse/4_4 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/4_4.txt
The two subgenomes have different AS
Present only in At
awk -F "_" '{print $1}' ../converse/1_3 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/1_3.txt
awk -F "_" '{print $1"_"$2}' ../converse/2_1 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/2_1.txt
awk -F "_" '{print $1"_"$2}' ../converse/3_1 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/3_1.txt
awk -F "_" '{print $1"_"$2}' ../converse/5_2 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/5_2.txt
Present only in Dt
awk -F "_" '{print $1"_"$2}' ../converse/1_4 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/1_4.txt
awk -F "_" '{print $1"_"$2}' ../converse/2_2 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/2_2.txt
awk -F "_" '{print $1"_"$2}' ../converse/3_3 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/3_3.txt
awk -F "_" '{print $1"_"$2}' ../converse/5_1 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/5_1.txt
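The commands above differ only in the class file and in how many `_`-separated fields of the event ID are kept (`$1` or `$1"_"$2`). A sketch of one way to factor that out, assuming nothing about the ID format beyond the `_` separator; `id_prefix` is a hypothetical helper and the IDs in its comments are made up:

```shell
# Hypothetical helper: keep the first N "_"-separated fields of each input
# line, e.g. N=2 turns a made-up ID "abc_def_3" into "abc_def".
id_prefix() {
    awk -F "_" -v n="$1" '{
        s = $1
        for (i = 2; i <= n && i <= NF; i++) s = s "_" $i
        print s
    }'
}

# Possible use for the classes above that keep two fields:
# for c in 2_1 3_1 5_2 1_4 2_2 3_3 5_1; do
#     id_prefix 2 < ../converse/$c | sort -u \
#       | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold > geneid/$c.txt
# done
```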
Run GO enrichment analysis on each set separately
Alternative splicing present in At but absent in Dt;