表观数据分析

转座子数据

TEs外显子化

Alternative splicing and co-option of transposable elements: the case of TMPO/LAP2α and ZNF451 in mammals

Biased exonization of transposed elements in duplicated genes: A lesson from the TIF-IA gene

Transposable Elements in Cancer and Other Human Diseases

Comprehensive profiling of rhizome‐associated alternative splicing and alternative polyadenylation in moso bamboo (Phyllostachys edulis)

在内含子区域的比例,在AS事件,对应Intron中TE的比例

提取所有intron的bed文件

 ## 提取所有intron坐标
 python ~/software/hisat2-2.1.0/extract_splice_sites.py  ~/work/Alternative/result/Ga_result/CO11_12_result/06_Alignment/all.collapsed.gtf >1
 awk '{print $1,$2+2,$3,$4}' OFS="\t" 1 >all_intron.bed

 ## 提取AS对应事件的Intron坐标
 cp ../../Position/IR/A2_all.bed  >IR.bed

 ## 取交集

Circos图数据

  • TEs loci

  • gene ref isform 位置

  • gene ref loci

  • gene PacBio loci

each chromosome was divided into 1 Mb bins sliding 200 kb.

# 制作window bin
#Ghir_A01    0    117710661    Ghir_A01
#Ghir_A02    0    108049532    Ghir_A02
#Ghir_A03    0    113014280    Ghir_A03
#Ghir_A04    0    85114396    Ghir_A04
~/software/bedtools2-2.29.0/bin/windowMaker  -i srcwinnum -b chromosome.bed -w 1000000 -s 200000 >chromosome_window.bed

## 提取gene loci
awk '$3~/gene/{print $1"\t"$4"\t"$5}' ~/work/Alternative/data/Ghirsutum_genome_HAU_v1.0/Ghirsutum_gene_model.gff3 >gene.bed
## 提取TE loci
cut -f1,4,5  ~/work/Alternative/data/Ghirsutum_genome_HAU_v1.0/Ghirsutum_genome_repeat.gff3 
## 提取 ref isform loci
awk '$3~/mRNA/{print $1"\t"$4"\t"$5}' ~/work/Alternative/data/Ghirsutum_genome_HAU_v1.0/Ghirsutum_gene_model.gff3 >ref_isform.bed
## 提取PacBio loci
awk '$3~/transcript/{print $1"\t"$4"\t"$5}' ~/work/Alternative/result/Gh_result/CO31_32_result/07_annotation/isoseq.info.gtf >PacBio_isform.bed

统计每个区域的覆盖度,而不是数目

## 与windo取交集
~/software/bedtools2-2.29.0/bin/intersectBed -loj -a ../chromosome_window.bed -b gene.bed >../interect/gene.bed

##统计每个bin的覆盖度
awk '$5!="."{a[$4][1]+=$7-$6+1;a[$4][2]=$1;a[$4][3]=$2;a[$4][4]=$3;}$5=="."{a[$4][1]+=0;a[$4][2]=$1;a[$4][3]=$2;a[$4][4]=$3;}END{for(i in a){printf a[i][2]"\t"a[i][3]"\t"a[i][4]"\t"a[i][1]"\n"}}' gene.bed >gene_circos.txt

制作染色体文件

awk '{print "chr\t-\t"$1"\t"substr($1,6)"\t"$2"\t"$3}' chromosome.bed|cut -f1-6  |awk '{print $0"\tmyChr"substr($3,7)}' >chromosome_circos.txt
circos分布

甲基化数据

提交文件计算甲基化程度

#提交任务
bsub -q high -M 100G  -e methlation.err -o methlation.out -R span[hosts=1] -n 1 "bash ./changeTobed.sh /data/cotton/zhenpingliu/DNA_methlation_rawData/TM-1/CpG_context_TM1_qs_rep2_count.txt_binom.txt_fdr.bed ../constitutive_intron.bed constitutive_CpG_rep2.txt"

换另外一种计算方法

  • 甲基化的C/区域内总的CG数*2

Last updated