Result 3
splice junction
TODO: compute MaxEntScan scores for the splice sites later
U1 snRNP regulates chromatin retention of noncoding RNAs
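Extract the dinucleotide sequences at the splice sites
Use the extract_splice_sites.py script from hisat2 to extract the splice sites
In the extracted sites, the intron and the preceding exon overlap by one base (regardless of strand);
Use the fastaFromBed tool from bedtools to extract the corresponding base sequences
-tab: write tab-delimited output
-name+: keep the strand information
Use awk to pull out the corresponding dinucleotides
Minus-strand sequences need to be complemented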
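The awk step above can be sketched in Python as follows. This is only an illustration, not the command used in these notes: the function names are hypothetical, the intron is assumed to be given as a 0-based, half-open interval on the forward strand, and the minus strand is handled by reverse-complementing and swapping the two ends. If the interval still contains the one overlapping exon base mentioned above, shift the start by one before calling it.

```python
# Sketch only: coordinates are assumed to be 0-based, half-open, forward strand.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def splice_dinucleotides(genome_seq, intron_start, intron_end, strand):
    """Return (donor, acceptor) dinucleotides for one intron.

    genome_seq   -- forward-strand sequence of the chromosome (hypothetical input)
    intron_start -- 0-based index of the first intron base
    intron_end   -- 0-based index one past the last intron base
    strand       -- "+" or "-"
    """
    left = genome_seq[intron_start:intron_start + 2].upper()
    right = genome_seq[intron_end - 2:intron_end].upper()
    if strand == "+":
        return left, right
    # On the minus strand the donor sits at the right end of the interval.
    return reverse_complement(right), reverse_complement(left)


# Toy sequences: a plus-strand GT...AG intron and its minus-strand equivalent.
plus_example = splice_dinucleotides("AAAGTCCCCAGTTT", 3, 11, "+")
minus_example = splice_dinucleotides("AAACTCCCCACTTT", 3, 11, "-")
```

Both toy calls return the pair `("GT", "AG")`, so plus- and minus-strand junctions can be tallied in the same table.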
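SJ statistics compared with the reference genome

| Genome | Iso-Seq | Reference genome | Novel SJs | Gene models |
|---|---|---|---|---|
| TM-1 | 226147 | 332822 | 42138 | 32311 |
| A2 | 144558 | 165885 | 37884 | 21187 |
| D5 | 127760 | 175343 | 19336 | 19080 |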
Splice site signal (dinucleotide signature), GT-AG SJs / all Iso-Seq SJs
D5 117860/127760
A2 131520/144558
TM-1 216906/226147
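The ratios above convert to percentages as in the short sketch below. It assumes that the first number of each pair is the count of GT-AG junctions and the second is the total number of Iso-Seq SJs; the variable names are illustrative.

```python
# Counts copied from the notes: (GT-AG junctions, all Iso-Seq junctions).
gt_ag_counts = {
    "A2": (131520, 144558),
    "D5": (117860, 127760),
    "TM-1": (216906, 226147),
}


def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


gt_ag_percent = {name: percent(hit, total) for name, (hit, total) in gt_ag_counts.items()}
print(gt_ag_percent)
```

This gives 91.0 for A2, 92.3 for D5 and 95.9 for TM-1.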
Isoform level
Isoforms predicted by PacBio
Corresponding gene models
Isoform level in the reference genome

| Genome | Iso-Seq predicted isoforms | Mean isoforms per gene | Genes |
|---|---|---|---|
| TM-1 | 83881 | 3.39907 vs 2.01395 | 32182 |
| A2 | 69701 | | 21038 |
| D5 | 53393 | 4.26125 vs 2.7397 | 19001 |

| Type | Genes with 1 isoform | Genes with >5 isoforms |
|---|---|---|
| TM-1 Iso-Seq | 9050 | 7557 |
| TM-1 ref | 18780 | 2570 |
| D5 Iso-Seq | 3487 | 6898 |
| D5 ref | 7107 | 3164 |
| A2 Iso-Seq | 6352 | 4846 |
| A2 ref | 100% | 0% |
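The two classes in the table (genes with a single isoform, genes with more than five) can be counted from a transcript-to-gene mapping. The sketch below is an assumption about how such a tally might look, not the script used for these numbers; the mapping and the gene and transcript names are toy placeholders.

```python
from collections import Counter


def isoforms_per_gene(transcript_to_gene):
    """Count isoforms per gene from a {transcript_id: gene_id} mapping."""
    return Counter(transcript_to_gene.values())


def summarize(counts):
    """Return gene count, single-isoform genes, genes with >5 isoforms, and the mean."""
    n_genes = len(counts)
    return {
        "genes": n_genes,
        "single": sum(1 for n in counts.values() if n == 1),
        "more_than_5": sum(1 for n in counts.values() if n > 5),
        "mean": sum(counts.values()) / n_genes,
    }


# Toy input: geneA has 2 isoforms, geneB has 1, geneC has 6.
toy = {"t1": "geneA", "t2": "geneA", "t3": "geneB"}
toy.update({f"c{i}": "geneC" for i in range(6)})
summary = summarize(isoforms_per_gene(toy))
print(summary)
```

Dividing `single` by `genes` gives the single-isoform share; with the table values for the TM-1 reference, 18780 / 32182 is about 58.4%.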
Splice junctions (SJs) were characterized separately in A2, D5, and TM-1. In A2, we identified 144558 SJs from 21187 gene models defined by Iso-Seq data, compared with 165885 SJs in the reference annotation; 37884 of them were novel. In D5, we identified 127760 SJs from 19080 gene models defined by Iso-Seq data, compared with 175343 SJs in the reference annotation; 19336 of them were novel. In TM-1, we identified 226147 SJs from 32311 gene models defined by Iso-Seq data, compared with 332822 SJs in the reference annotation; 42138 of them were novel. The terminal dinucleotide signature was investigated for all SJs defined by Iso-Seq data. The GT-AG type accounted for the dominant proportion of intron borders (91.0% in A2, 92.3% in D5, and 95.9% in TM-1), which is consistent with previous studies in other species.
We merged the Iso-Seq annotation with the reference annotation and used a custom Python script to identify alternative splicing (AS) events.
At the isoform level, we identified 83881 new full-length transcripts from 32182 reference gene models in TM-1, 69701 new full-length transcripts from 21038 reference gene models in A2, and 53393 new full-length transcripts from 19001 reference gene models in D5. In the original annotation, 58.4% and 37.4% of genes were defined by a single transcript isoform in TM-1 and D5, respectively. After analysis of the Iso-Seq data, only 28.1% and 18.4% of the genes were defined by a single transcript.
Differences in polyadenylation sites
For the first exon, only the donor site is described, because its first position is defined as the transcription start site. Likewise, the last exon does not contain a donor splice site, because its last position is defined as the polyadenylation site.
Motif search
Minimum width 5, maximum width 20
Find 15 motifs
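polyA statistics

| Cotton species | Genes | Transcripts | Genes with >5 transcripts | Per gene |
|---|---|---|---|---|
| TM1 | 32182 | 78631 | 2394 (At 1172 / Dt 1222) | 2.44 |
| A2 | 21038 | 64620 | 2786 | 3.07 |
| D5 | 19001 | 50312 | 1728 | 2.65 |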
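Divergent structure of splicing isoforms in the Gossypium lineage
Isoform differences in cotton during polyploidization: count the isoforms of every gene that has Iso-Seq-detected isoforms, then intersect with the homologous gene pairs to see which pairs were detected
Count the differences in the number of Iso-Seq-detected isoforms between homologous genes
Script: homolog_isform_count.py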

| Type | Homologous gene pairs | Detected homologous gene pairs |
|---|---|---|
| A2 vs At | 26099 | 18068 |
| D5 vs Dt | 27373 | 18445 |
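The intersection of homologous pairs with Iso-Seq-detected genes could look like the sketch below. It is not the content of homolog_isform_count.py, which is not shown in these notes. It assumes that a pair counts as detected only when both members have at least one Iso-Seq isoform, and all gene IDs are made-up placeholders.

```python
def detected_pairs(homolog_pairs, isoform_counts):
    """Keep homologous pairs in which both genes have Iso-Seq isoforms.

    homolog_pairs  -- iterable of (gene_a, gene_b) tuples
    isoform_counts -- {gene_id: number of Iso-Seq isoforms}, detected genes only
    Returns a list of (gene_a, gene_b, n_a, n_b, n_a - n_b).
    """
    rows = []
    for gene_a, gene_b in homolog_pairs:
        if gene_a in isoform_counts and gene_b in isoform_counts:
            n_a = isoform_counts[gene_a]
            n_b = isoform_counts[gene_b]
            rows.append((gene_a, gene_b, n_a, n_b, n_a - n_b))
    return rows


# Toy data: the second pair is dropped because At_g2 has no detected isoform.
pairs = [("A2_g1", "At_g1"), ("A2_g2", "At_g2")]
counts = {"A2_g1": 3, "At_g1": 1, "A2_g2": 2}
rows = detected_pairs(pairs, counts)
print(rows)
```

The last field of each row is the isoform-count difference between the two homologs, which is the quantity compared across the A2 vs At and D5 vs Dt pairs.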
Because the two genomes reside together in one cell, also consider the case of four homologous genes