20200102GO富集分析

对分好类的基因对进行GO功能富集分析

第一类conserve pattern

1.在各个基因组都存在相同的IR

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=863404143

awk -F "_" '{print $1"_"$2}' ../converse/1_1 |sort|uniq|xargs  -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold |awk '{print $1"\n"$3}' |xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >1_1.go

2.在各个基因组中都不存在IR的基因对

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=913752380

awk '{print $1"\n"$3}' ../converse/1_2|sort|uniq|xargs  -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot  >1_2.go

3.在A基因组中存在保守的IR

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=596719406

awk -F "_" '{print $1}' ../converse/1_3 |sort|uniq|xargs  -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold |awk '{print $1"\n"$3}' |xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot  >1_3.go

4.在D基因组中存在保守的IR

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=642301472

awk -F "_" '{print $1"_"$2}' ../converse/1_4 |sort|uniq|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >1_4.go

第二类IR在多倍化中发生丢失

1.只在Dt中发生IR丢失

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=819544063

awk -F "_" '{print $1"_"$2}' ../converse/2_1 | sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot > 2_1.go

2.只在At中发生IR丢失

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=472631356

awk -F "_" '{print $1"_"$2}' ../converse/2_2 | sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot > 2_2.go

3.在两个基因组中都发生了丢失

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=415512684

## 将第四类的情况也统计到这里
 awk -F "_" '{print $1}' ../converse/2_3|sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >2_3.go

## 只在A2中发生IR的事件
cut -f5 ../converseBed/A2.bed | awk -F "_" '{print $1}'|sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >>2_3.go
## 只在D5中发生的IR事件
cut -f5 ../converseBed/D5.bed |awk -F "_" '{print $1}'|sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >>2_3.go

sort 2_3.go |uniq >1 
mv 1 2_3.go

第三类IR在多倍化中获得新的IR事件

1.只在At中发生IR获得

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=893002877

cut -f5 ../converseBed/At.bed |awk -F "_" '{print $1"_"$2}' |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >3_1.go

2.只在Dt中发生IR获得

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=566100874

cut -f5 ../converseBed/Dt.bed |awk -F "_" '{print $1"_"$2}' |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}' | xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >3_3.go

3.在两个基因组中都发生了获得

http://bioinfo.cau.edu.cn/agriGO/analysis_precheck.php?session=205614710

这里包括第四类中的前两类

cat ../converse/3_2  ../converse/4_1  ../converse/4_2 |awk -F "_" '{print $1"_"$2}'|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold | awk '{print $1"\n"$3}'|sort |uniq |xargs -I {} grep {} ~/genome_data/Ghirsutum_genome_HAU_v1.1/Gh_Noscagenes_GO_V3.annot >3_2.go

查看不同类别之间共有的GO号

先对文件进行筛选

##根据FDR<=0.05 筛选GO结果
awk -F "\t" '$8<=0.01&&$9<=0.01{print $1,$2,$3,$8,$9}' OFS="\t"

partition splicing pattern

二倍体直系同源基因都存在多种剪切模式,在多倍化之后,分别只继承在单个亚基因组中,说明可变剪切在亚基因组上发生了partition

## 提取对应的基因
grep GO:0010027 raw/2_1.txt |awk -F "\t" '{print $10}'|sed -e 's/\/\//\n/g' -e 's/ //g'

2020-01-07

讨论多倍化过程中AS的变化

两个亚基因组出现同样AS

同时存在AS

  • 1_1

  • 3_2

  • 4_1

  • 4_2

awk -F "_" '{print $1"_"$2}' ../converse/1_1 |sort|uniq|xargs  -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/1_1.txt

cat ../converse/3_2  ../converse/4_1  ../converse/4_2 |awk -F "_" '{print $1"_"$2}'|xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold |sort|uniq >geneid/3_2.txt 

 awk -F "_" '{print $1"_"$2}' ../converse/4_1 |sort|uniq|xargs  -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/4_1.txt

 awk -F "_" '{print $1"_"$2}' ../converse/4_2 |sort|uniq|xargs  -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold >geneid/4_2.txt

同时不存在AS

四个基因组都不存在AS的基因对先不分析

  • 2_3

  • 4_3

  • 4_4

 awk -F "_" '{print $1}' ../converse/2_3|sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/2_3.txt

 awk -F "_" '{print $1}' ../converse/4_3 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/4_3.txt

  awk -F "_" '{print $1}' ../converse/4_4 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/4_4.txt

两个亚基因组有不同的AS

只在At中存在

  • 1_3

  • 2_1

  • 3_1

  • 5_2

  awk -F "_" '{print $1}' ../converse/1_3 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/1_3.txt

  awk -F "_" '{print $1"_"$2}' ../converse/2_1 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/2_1.txt

   awk -F "_" '{print $1"_"$2}'  ../converse/3_1 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/3_1.txt

     awk -F "_" '{print $1"_"$2}'  ../converse/5_2 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/5_2.txt

只在Dt中存在

  • 1_4

  • 2_2

  • 3_3

  • 5_1

awk -F "_" '{print $1"_"$2}' ../converse/1_4 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/1_4.txt

awk -F "_" '{print $1"_"$2}' ../converse/2_2 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/2_2.txt

awk -F "_" '{print $1"_"$2}' ../converse/3_3 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/3_3.txt

awk -F "_" '{print $1"_"$2}' ../converse/5_1 |sort | uniq | xargs -I {} grep {} ../../../../GhDt_Gr_GhAt_Ga_end_noScaffold  >geneid/5_1.txt

分别进行GO富集分析

在At中有的可变剪切,而Dt中没有可变剪切;

Last updated