对同源基因的剪切事件进行分类.md

将各个剪切事件在所有棉种中进行汇总

内含子保留事件

awk '$3~/IntronR/{print $2}' ../TM-1/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >intronR_count.txt 
awk '$3~/IntronR/{print $2}' ../D5/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >>intronR_count.txt 
awk '$3~/IntronR/{print $2}' ../A2/end_third  |sort |uniq -c |awk '{print $2"\t"$1}' >>intronR_count.txt

外显子跳跃

awk '$3~/ExonS/{print $2}' ../TM-1/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >>exonSkip_count.txt 
awk '$3~/ExonS/{print $2}' ../D5/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >>exonSkip_count.txt 
awk '$3~/ExonS/{print $2}' ../A2/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >>exonSkip_count.txt

AltA可变的5'

awk '$3~/AltA/{print $2}' ../TM-1/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >>AltA_count.txt 
awk '$3~/AltA/{print $2}' ../D5/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >>AltA_count.txt 
awk '$3~/AltA/{print $2}' ../A2/end_third |sort |uniq -c |awk '{print $2"\t"$1}'  >>AltA_count.txt

AltD可变的3’

awk '$3~/AltD/{print $2}' ../TM-1/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >>AltD_count.txt
awk '$3~/AltD/{print $2}' ../D5/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >>AltD_count.txt
awk '$3~/AltD/{print $2}' ../A2/end_third |sort |uniq -c |awk '{print $2"\t"$1}' >>AltD_count.txt

根据被剪切下来的片段长度来找保守的剪切事件

先统计同源基因发生IR事件时的长度与ExonS的长度

得到结果如下:

## 每个基因的IR长度与ExonS长度
Ghir_D01G000050    84,101,352,1233,1507,1622|0    Gorai.002G000700.v2.1    87,100,101,316,1510|0    Ghir_A01G000050    0|0    evm.TU.Ga01G0005    0|0
Ghir_D01G000080    0|0    Gorai.002G001000.v2.1    0|0    Ghir_A01G000060    0|0    evm.TU.Ga01G0007    0|0
Ghir_D01G000090    0|0    Gorai.002G001100.v2.1    0|0    Ghir_A01G000080    0|0    evm.TU.Ga01G0008    0|0

统计剪切事件为0|0的基因转录本的数目

## 先将所有基因的转录本的信息统计到一个文件

 awk -F "\t"  '$3~/trans/{print $9}' ~/work/Alternative/result/Gr_result/CO41_42_result/07_annotation/D5_merge_C.gtf|awk -F ";" '{print $1}'|sed -e 's/gene_id //g' -e 's/"//g' -e 's/$/\.v2\.1/g' |sort |uniq -c |awk '{print $2"\t"$1}' >>isform_count.txt

 awk -F "\t"  '$3~/trans/{print $9}' ~/work/Alternative/result/Ga_result/CO11_12_result/07_annotation/A2_merge_C.gtf |awk -F ";" '{print $1}'|sed -e 's/gene_id //g' -e 's/"//g' |sort |uniq -c |awk '{print $2"\t"$1}' >>isform_count.txt

 awk -F "\t"  '$3~/trans/{print $9}' ~/work/Alternative/result/Gh_result/CO31_32_result/07_annotation/TM-1_merge_C.gtf |awk -F ";" '{print $1}'|sed -e 's/gene_id //g' -e 's/"//g' |sort |uniq -c |awk '{print $2"\t"$1}' >>isform_count.txt

将没有发生剪切事件,并且isform数目都是1基因做了GO富集分析

## 统计基因都为0|0的数目
awk '$2=="0|0"&&$4="0|0"&&$6=="0|0"&&$8=="0|0"{print $0}'
## 提取isform数目都为1的基因
awk '$2==1&&$4==1&&$6==1&&$8==1{print $0}' 1 >2
## GO的结果显示
很多基因都具有DNA binding的作用,表明这些基因的剪切方式比较保守,这与它们结合DNA的能够有关;因为它们必须要保持保守的结构域去识别对应的motif,发挥作用。

使用脚本去筛选出在各个基因组都保守的事件

## 在四个基因组都出现相同的IR

## 只在祖先和后代中出现的IR事件

Last updated