# Classifying splicing events in homologous genes
## Summarizing each splicing event type across all cotton species
Intron retention (IntronR):

```bash
# column 2 = gene ID, column 3 = event type; output: gene<TAB>number of IntronR events
awk '$3~/IntronR/{print $2}' ../TM-1/end_third | sort | uniq -c | awk '{print $2"\t"$1}' > intronR_count.txt
awk '$3~/IntronR/{print $2}' ../D5/end_third | sort | uniq -c | awk '{print $2"\t"$1}' >> intronR_count.txt
awk '$3~/IntronR/{print $2}' ../A2/end_third | sort | uniq -c | awk '{print $2"\t"$1}' >> intronR_count.txt
```
Exon skipping (ExonS):

```bash
# the first command writes with > so that a rerun starts from an empty file
awk '$3~/ExonS/{print $2}' ../TM-1/end_third | sort | uniq -c | awk '{print $2"\t"$1}' > exonSkip_count.txt
awk '$3~/ExonS/{print $2}' ../D5/end_third | sort | uniq -c | awk '{print $2"\t"$1}' >> exonSkip_count.txt
awk '$3~/ExonS/{print $2}' ../A2/end_third | sort | uniq -c | awk '{print $2"\t"$1}' >> exonSkip_count.txt
```
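AltA (alternative acceptor, i.e. an alternative 3' splice site, which changes the 5' end of the downstream exon):

```bash
# the first command writes with > so that a rerun starts from an empty file
awk '$3~/AltA/{print $2}' ../TM-1/end_third | sort | uniq -c | awk '{print $2"\t"$1}' > AltA_count.txt
awk '$3~/AltA/{print $2}' ../D5/end_third | sort | uniq -c | awk '{print $2"\t"$1}' >> AltA_count.txt
awk '$3~/AltA/{print $2}' ../A2/end_third | sort | uniq -c | awk '{print $2"\t"$1}' >> AltA_count.txt
```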
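AltD (alternative donor, i.e. an alternative 5' splice site, which changes the 3' end of the upstream exon):

```bash
# the first command writes with > so that a rerun starts from an empty file
awk '$3~/AltD/{print $2}' ../TM-1/end_third | sort | uniq -c | awk '{print $2"\t"$1}' > AltD_count.txt
awk '$3~/AltD/{print $2}' ../D5/end_third | sort | uniq -c | awk '{print $2"\t"$1}' >> AltD_count.txt
awk '$3~/AltD/{print $2}' ../A2/end_third | sort | uniq -c | awk '{print $2"\t"$1}' >> AltD_count.txt
```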
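The four blocks above repeat one pipeline per event type and species. A minimal sketch of the same counting as a reusable bash function, assuming the layout the commands already use (`../<species>/end_third`, gene ID in column 2, event type in column 3); the function name `count_events` is illustrative and not part of the original scripts:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count one event type per gene for TM-1, D5 and A2.
# Assumes ../<species>/end_third exists with gene ID in column 2 and
# event type (IntronR, ExonS, AltA, AltD) in column 3, as in the commands above.
count_events() {
  local event=$1 out=$2
  : > "$out"   # truncate once, so rerunning does not append duplicate rows
  for sp in TM-1 D5 A2; do
    awk -v ev="$event" '$3 ~ ev {print $2}' "../$sp/end_third" \
      | sort | uniq -c | awk '{print $2"\t"$1}' >> "$out"
  done
}
```

Run it from the same directory as the original commands, e.g. `count_events IntronR intronR_count.txt` or `count_events ExonS exonSkip_count.txt`. The output format (`gene<TAB>count`, species appended in the order TM-1, D5, A2) is the same as that of the explicit commands.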
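## Finding conserved splicing events by the length of the spliced fragment

First, for each group of homologous genes, record the lengths of the IR events (retained introns) and of the ExonS events (skipped exons). The result looks like this: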
## IR and ExonS lengths for each gene
```
Ghir_D01G000050 84,101,352,1233,1507,1622|0 Gorai.002G000700.v2.1 87,100,101,316,1510|0 Ghir_A01G000050 0|0 evm.TU.Ga01G0005 0|0
Ghir_D01G000080 0|0 Gorai.002G001000.v2.1 0|0 Ghir_A01G000060 0|0 evm.TU.Ga01G0007 0|0
Ghir_D01G000090 0|0 Gorai.002G001100.v2.1 0|0 Ghir_A01G000080 0|0 evm.TU.Ga01G0008 0|0
```
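Each row appears to hold one homolog group as four `gene IR|ExonS` pairs. Judging from the ID prefixes, these are the TM-1 D subgenome (`Ghir_D`), D5 (`Gorai`), the TM-1 A subgenome (`Ghir_A`) and A2 (`evm.TU.Ga`). The comma-separated IR lengths come before `|`, the ExonS lengths after it, and `0` means no event. This reading is inferred from the heading rather than documented. Under that assumption, a small sketch (the function name `to_long` is hypothetical) that reshapes the table into one line per gene, which is easier to filter or join:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reshape the homolog table (four "gene IR|ExonS" pairs
# per row) into one line per gene: gene <TAB> IR lengths <TAB> ExonS lengths.
# Column meaning is inferred from the table heading; 0 means no event.
to_long() {
  awk '{
    for (i = 1; i < NF; i += 2) {        # gene IDs in columns 1,3,5,7
      split($(i + 1), part, "|")         # part[1] = IR lengths, part[2] = ExonS lengths
      print $i "\t" part[1] "\t" part[2]
    }
  }' "$@"
}
```

It reads a file argument or standard input, e.g. `to_long <length_table>`, where the table file name is whatever the length table was saved as.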
Count the transcripts of the genes whose splicing events are `0|0` (no IR and no ExonS).

## First, collect the transcript count of every gene into one file
```bash
# count "transcript" lines per gene_id in each merged GTF (assumes gene_id is the first attribute);
# D5 IDs get a ".v2.1" suffix so they match the Gorai names in the length table
awk -F "\t" '$3~/trans/{print $9}' ~/work/Alternative/result/Gr_result/CO41_42_result/07_annotation/D5_merge_C.gtf | awk -F ";" '{print $1}' | sed -e 's/gene_id //' -e 's/"//g' -e 's/$/.v2.1/' | sort | uniq -c | awk '{print $2"\t"$1}' > isform_count.txt
awk -F "\t" '$3~/trans/{print $9}' ~/work/Alternative/result/Ga_result/CO11_12_result/07_annotation/A2_merge_C.gtf | awk -F ";" '{print $1}' | sed -e 's/gene_id //' -e 's/"//g' | sort | uniq -c | awk '{print $2"\t"$1}' >> isform_count.txt
awk -F "\t" '$3~/trans/{print $9}' ~/work/Alternative/result/Gh_result/CO31_32_result/07_annotation/TM-1_merge_C.gtf | awk -F ";" '{print $1}' | sed -e 's/gene_id //' -e 's/"//g' | sort | uniq -c | awk '{print $2"\t"$1}' >> isform_count.txt
```
GO enrichment analysis was performed on the genes that have no splicing event (`0|0`) and exactly one isoform in all four genomes.
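## Count the genes that are 0|0 in all four genomes

```bash
# keep rows whose four IR|ExonS fields are all 0|0 and print how many there are; both file names are placeholders
awk '$2=="0|0" && $4=="0|0" && $6=="0|0" && $8=="0|0"' IR_ExonS_length.txt | tee no_AS_genes.txt | wc -l
```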
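The next step reads a file `1` in which each gene is followed by its isoform count, but the original does not show how that file was built. A minimal sketch of one way to build it, assuming the `0|0` rows were saved to a file (called `no_AS_genes.txt` here, a hypothetical name) and that `isform_count.txt` holds the `gene<TAB>count` lines produced from the GTFs; `join_isoforms` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: attach isoform counts to every gene of each homolog row.
# $1 = gene<TAB>count file (e.g. isform_count.txt)
# $2 = rows of "gene IR|ExonS" pairs (e.g. the hypothetical no_AS_genes.txt)
# Output: gene count gene count gene count gene count (tab-separated)
join_isoforms() {
  awk 'NR == FNR { n[$1] = $2; next }          # first file: gene -> isoform count
       {
         out = ""
         for (i = 1; i < NF; i += 2) {         # gene IDs sit in columns 1,3,5,7
           c = ($i in n) ? n[$i] : "NA"        # NA if the gene is absent from the GTFs
           out = out (i > 1 ? "\t" : "") $i "\t" c
         }
         print out
       }' "$1" "$2"
}
```

Usage: `join_isoforms isform_count.txt no_AS_genes.txt > 1`. Genes missing from the GTF counts are written as `NA`, so the `==1` filter in the next step drops them instead of treating them as single-isoform genes.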
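## Keep the genes that have exactly one isoform in every genome

```bash
# file 1: gene count gene count gene count gene count (isoform counts of the 0|0 genes); file 2: output
awk '$2==1 && $4==1 && $6==1 && $8==1' 1 > 2
```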
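GO enrichment tools usually take a plain list of gene IDs; the original does not say which tool was used. A sketch that splits file `2` into one list per genome, assuming the column order of the length table (Ghir_D, Gorai, Ghir_A, evm.TU.Ga); the function name and the output names `go_*.txt` are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the gene IDs of each genome (columns 1,3,5,7 of
# the "gene count ..." rows) to a separate one-ID-per-line file.
split_gene_lists() {
  awk '{
    print $1 > "go_Ghir_D.txt"
    print $3 > "go_Gorai.txt"
    print $5 > "go_Ghir_A.txt"
    print $7 > "go_Ga.txt"
  }' "$1"
}
```

Usage: `split_gene_lists 2`. Each output file is overwritten on every call, because awk truncates a redirection target the first time it opens it.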
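## GO results

Many of these genes are annotated with DNA binding. This suggests that their splicing pattern is conserved, possibly because of their DNA-binding role: they have to keep conserved domains intact to recognize their target motifs and function.

Next, a script is used to select the events that are conserved across the genomes.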
## The same IR event present in all four genomes
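The original script is not shown. A minimal sketch of the idea, assuming the length-table layout above (IR lengths before `|`, `0` meaning none) and that "the same IR" means an identical retained-intron length in all four genes of a group; `shared_ir` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each homolog row, print the four gene IDs and the
# IR lengths found in all four genomes (exact length match only).
shared_ir() {
  awk '{
    delete seen
    for (g = 2; g <= 8; g += 2) {             # IR|ExonS fields: columns 2,4,6,8
      split($g, part, "|")
      n = split(part[1], len, ",")            # IR lengths of this gene
      delete once
      for (k = 1; k <= n; k++)
        if (len[k] != 0 && !(len[k] in once)) { once[len[k]] = 1; seen[len[k]]++ }
    }
    out = ""
    for (l in seen) if (seen[l] == 4) out = out (out == "" ? "" : ",") l
    if (out != "") print $1 "\t" $3 "\t" $5 "\t" $7 "\t" out
  }' "$@"
}
```

Exact matching is the simplest choice. In the first example row, 84 (Ghir_D) and 87 (Gorai) would not count as the same event, so a length tolerance may be worth considering.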
## IR events found only in an ancestor and its descendant
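This is another sketch built on assumptions. "Ancestor and descendant" is read as a diploid parent and the matching TM-1 subgenome (D5 with `Ghir_D`, A2 with `Ghir_A`). An event counts when the same IR length occurs in both members of one pair and in neither gene of the other pair. Column order follows the length table above, and `lineage_ir` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report IR lengths shared by one parent/subgenome pair
# (D: Ghir_D + Gorai, A: Ghir_A + evm.TU.Ga) and absent from the other pair.
lineage_ir() {
  awk '
    function ir(f,   part, len, n, k, s) {   # IR lengths of one field, as a string like ,84,101,
      split(f, part, "|"); n = split(part[1], len, ","); s = ","
      for (k = 1; k <= n; k++) if (len[k] != 0) s = s len[k] ","
      return s
    }
    function only(a, b, x, y,   arr, n, k, out) {   # lengths in a and b but in neither x nor y
      n = split(a, arr, ","); out = ""
      for (k = 1; k <= n; k++)
        if (arr[k] != "" && index(b, "," arr[k] ",") && !index(x, "," arr[k] ",") && !index(y, "," arr[k] ","))
          out = out (out == "" ? "" : ",") arr[k]
      return out
    }
    {
      dD = only(ir($2), ir($4), ir($6), ir($8))   # Ghir_D + Gorai only
      aA = only(ir($6), ir($8), ir($2), ir($4))   # Ghir_A + Ga only
      if (dD != "") print $1 "\t" $3 "\tD\t" dD
      if (aA != "") print $5 "\t" $7 "\tA\t" aA
    }' "$@"
}
```

With the first example row, this reports the IR of length 101 shared by `Ghir_D01G000050` and `Gorai.002G000700.v2.1`, since neither A-genome gene has an IR event.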