> For the complete documentation index, see [llms.txt](https://zpliu.gitbook.io/booknote/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://zpliu.gitbook.io/booknote/ke-bian-jian-qie/di-san-ci-dui-as-jin-hang-tong-ji/04.md).

# 04

保守的AS

* 使用wu-Blast进行对FEST进行比对
* 首先提取转录本的注释信息
* 根据AS发生的坐标提取FEST进行比对
* 筛选两端都比对上的AS

```bash
python3 ~/scripte/Alternative/module/FEST3/conserveAS/conserveAS.py -p1 ~/work/Alternative/result/Ga_result/CO11_12_result/07_annotation/merge.gtf -p2 ~/work/Alternative/result/Gh_result/CO31_32_result/07_annotation/merge.gtf  -ho ~/work/Alternative/result/homologo/homologGene/A2_vs_At_collinearity.txt  -r1 ~/work/Alternative/data/Ga_genome/G.arboreum.Chr.v1.0.gtf -r2 ~/work/Alternative/data/Ghirsutum_genome_HAU_v1.0/Ghirsutum_gene_model.gtf -AS1 ~/work/Alternative/result/Ga_result/CO11_12_result/11_AS/end_splice.txt3  -AS2 ~/work/Alternative/result/Gh_result/CO31_32_result/11_AS/end_splice.txt3  -g1 ~/work/Alternative/data/Ga_genome/G.arboreum.Chr.v1.0.fasta -g2 ~/work/Alternative/data/Ghirsutum_genome_HAU_v1.0/Ghirsutum_genome_HAU_v1.0.fasta -IR IR.txt -AltA AltA.txt -AltD AltD.txt -ES ES.txt
## 根据比对的长度，选择比对更长的
for i in `ls .`; do cat ${i} |awk '$1~/^Ghir_A/&&$2~/^Ghir_D/{print $0}$1~/^Ghir_D/&&$2~/^Ghir_A/{print $2,$1,$3}' OFS="\t"|sort |uniq |awk '{print $2,$3,$1}' OFS="\t"|sort  -k3 -k2,2nr|uniq -f2|awk '{print $3"\t"$2"\t"$1}'|sort -k3 -k2,2nr|uniq -f2  >${i}_end; done

## 统计同源基因间发生AS的数目
 for i in ExonS IntronR AltA AltD; do python ~/scripte/Alternative/module/homologASeventCount.py -homolog ~/work/Alternative/result/homologo/homologGene/D5_vs_Dt_collinearity.txt -AS1 ~/work/Alternative/result/Gr_result/CO41_42_result/11_AS/end_splice.txt3   -AS2 ~/work/Alternative/result/Gh_result/CO31_32_result/11_AS/end_splice.txt3 -T ${i} -o ${i}; tmp=`mktemp`done; sort ${i}|uniq >${tmp}; mv ${tmp} ${i}; done
```

~~这个问题之后再看看~~

```bash
###这种情况得调整一下，两个IR事件，坐标实际上只相差了一个碱基；感觉可以合并后整成一个
-    -    Gorai.001G041400-Chr01-3801948-3802062    Ghir_D07G004050-Ghir_D07-4143773-4143886
-    -    Gorai.001G041400-Chr01-3801948-3802062    Ghir_D07G004050-Ghir_D07-4143773-4143887
```

### 保守的IR

| 比较    | 基因数  | 保守事件数 | IR数1  | IR数2  |
| ----- | ---- | ----- | ----- | ----- |
| A2 At | 2098 | 4786  | 12553 | 10749 |
| D5 Dt | 1611 | 4179  | 9642  | 8865  |
| A2 D5 | 1936 | 4213  | 11552 | 10929 |
| At Dt | 1347 | 3171  | 7687  | 7491  |

### 保守ES

| 比较    | 基因数 | 保守事件数 | ES数1 | ES数2 |
| ----- | --- | ----- | ---- | ---- |
| A2 At | 129 | 137   | 217  | 214  |
| D5 Dt | 111 | 122   | 211  | 224  |
| A2 D5 | 98  | 107   | 164  | 161  |
| At Dt | 87  | 93    | 146  | 145  |

### 保守AltA

| 比较    | 基因数 | 保守事件数 | AltA数1 | AltA数2 |
| ----- | --- | ----- | ------ | ------ |
| A2 At | 285 | 324   | 679    | 678    |
| D5 Dt | 222 | 253   | 564    | 524    |
| A2 D5 | 211 | 236   | 488    | 513    |
| At Dt | 235 | 260   | 547    | 531    |

### 保守AltD

| 比较    | 基因数 | 保守事件数 | AltD数1 | AltD数2 |
| ----- | --- | ----- | ------ | ------ |
| A2 At | 229 | 251   | 494    | 487    |
| D5 Dt | 162 | 183   | 403    | 386    |
| A2 D5 | 140 | 155   | 264    | 305    |
| At Dt | 187 | 206   | 432    | 444    |

### 计算IR和ES事件的PIR值

* 比较全基因组中所有intron的PIR值
* 发生IR的IR的PIR值

![NT5WSU.png](https://s1.ax1x.com/2020/07/01/NT5WSU.png)

### 保守事件对应的长度

> TE insertion in retained introns may be a significant phenomenon in higher plants.

```bash
## 提取保守的IR事件对应的坐标
cat ../../conserveAS/A2_At/IR.txt_end ../../conserveAS/A2_D5/IR.txt_end|cut -f1 |sort |uniq |awk -F "-" '{print $2,$1,$3,$4}' OFS="\t" >conserve_IR
## 提取不保守的IR坐标
awk '$3~/IntronR/{print $1,$2,$4,$5}' OFS="\t" ../../../PSI/IR/A2_PSI.txt |cat - conserve_IR |sort -k1,2 -k2,3n|sort |uniq -u >noconserve_IR.txt
##提取Constitutive intron
+ 长度小于8000
+ PIR 小于0.1
awk '$3-$2+1<8000&&$6<0.1{print $0}' ../../../PSI/allIntron/A2_PSI.txt |awk '{a+=$3-$2+1}END{print a/NR}'
```

![NT5owR.png](https://s1.ax1x.com/2020/07/01/NT5owR.png)

## 提取保守AS的PRI值

过滤掉所以那些基因中所有内含子区域都是为\[0,0]的，这类可能是基因没有表达导致的

```bash
##没有表达的基因，从文件内过滤掉
awk '{a[$4][1]+=$5;a[$4][2]+=$6}END{for(i in a){if(a[i][1]!=0||a[i][2]!=0){print i}}}' TM1_PIR.txt |xargs  -I {} grep {} TM1_PIR.txt  >TM1_PIR2.txt
```

### 分析IR长度与PIR值的关系

* 越短的IR，它的PIR值越高

### 模拟临界PIR值

从所有的intron中抽取1000个intron，用PIR阈值进行判断，得到200个为IR，而其中200个判断的IR有多少比例的是真正的IR，每个PIR处进行1000次迭代

> 随着PIR阈值的不断增加，真实的IR比例占潜在的比例反而越来越小；

* 三代中测到了IR但是

PRI值增加，表明这个conserve AS多倍化后存在功能,`conserve_isform_conserve_IR`

```bash
cut -f1  ~/work/Alternative/result/homologo/FEST3/AS/conserveAS/A2_At/IR.txt_end|awk -F "-" '{print $2,$1,"IntronR",$3,$4}' OFS="\t"|xargs  -I {} grep {} ../IR/A2_PSI.txt >A2_conserve_IR.txt
```

### AS保守性分析

* A2与D5不保守，在四倍体中仍旧维持At与Dt不保守的状态，cDAS （conserve diversity AS）
* A2与D5保守，在四倍体中At与Dt也保守ccAS（complete conserve AS）
* A2、D5 At中存在AS，但At、Dt中只有一个存在AS BCAS（bias lost AS）
* At与Dt同时丢失，At与Dt同时获得 contemporary gain AS | contemporary lost AS
* Other

| 分类           | A基因组         | D基因组 |
| ------------ | ------------ | ---- |
| CCAS都保守      | 1013         | -    |
| BLAS 偏向性的丢失  | 1358         | 1321 |
| CDAS 保持AS的差异 | 571          | 908  |
| CGAS 同时获得AS  | 1123+445+310 | -    |
| CLAS 同时丢失AS  | 556          | -    |
| Other        | 1357+946     | -    |

### 找到所有同源基因在AS处对应的坐标

```bash
## 这种结果需要过滤一下，理论上是保守的IR，应该是取FEST的时候用300bp的差异导致的
Ghir_A01G004300-Ghir_A01-5344342-5345370    Ghir_D01G004400-Ghir_D01-4984060-4985096    y    n
Ghir_A01G004300-Ghir_A01-5344342-5345370    Ghir_D01G004400-Ghir_D01-4984060-4985095    n    y
## uniq掉
awk '{print $0"\t"$2}' A2_D5/A2_vs_D5_ASIR.txt |sort -k5,5 -k3,4r|uniq -f4|awk '{print $1,$2,$3,$4,$1}' OFS="\t" |sort -k5,5 -k3,4r|uniq -f4 >A2_D5/11111
##提取四组AS的情况
python ~/scripte/Alternative/module/homolog/identifyFourAS.py  -A2D5 ./A2_D5/11111  -AtDt ./At_Dt/11111 -A2At ./A2_At/11111  -D5Dt ./D5_Dt/11111  -o 2222
## 对应是否发生IR这个还得再用AS的数据看看
python judgeIR.py  -i A2_D5_At_Dt_AS.txt  -all all_IR.txt  -o 11
```

## 保守的AS和保守的isform

```bash
## 在鉴定的保守isoform中，同时存在多少保守的IR事件
python ~/scripte/Alternative/module/FEST3/conserveAS/isoform_ASconserve.py -As1 ~/work/Alternative/result/Ga_result/CO11_12_result/11_AS/end_splice.txt3  -As2 ~/work/Alternative/result/Gh_result/CO31_32_result/11_AS/end_splice.txt3  -isform ~/work/Alternative/result/homologo/FEST3/isforms/A2_vs_At/conserve_isform.txt -AS ../conserveAS/A2_At/IR.txt_end  -t IntronR -o 111
```

| 比较    | 保守isoform | 保守的IR |
| ----- | --------- | ----- |
| A2 At | 12868     | 481   |
| D5 Dt | 12335     | 327   |
| A2 D5 | 11823     | 263   |
| At Dt | 6331      | 191   |

### GO功能富集分析

将这些存在保守IR事件的四倍体基因提出来，进行GO富集

```bash
## At和Dt
## A2和At
## D5和Dt
```

## 特异性的AS在另外一个基因组的坐标

* 提取一个基因组的所有intron坐标

```bash
##提取一个基因组的所有intron坐标
python3 ~/scripte/extractIntronbed.py  -fasta ~/work/Alternative/data/Gr_genome/Graimondii_221_v2.0.fa -p ~/work/Alternative/result/Gr_result/CO41_42_result/07_annotation/merge.gtf  -r ~/work/Alternative/data/Gr_genome/Graimondii_221_v2.1.gene.gtf -o 111 -f 2222
##去重
sort -k1,1 -k2,3n 111 |uniq|sed '/scaffold/d' >all_intron.bed
```

* 提取特异性的AS坐标
  * 这里FEST序列就提取上下各300bp

```bash
mkdir A2_uniq
mkdir At_uniq
## 特异性AS坐标
awk '$3~/IntronR/{print $2"-"$1"-"$4"-"$5}' ~/work/Alternative/result/Ga_result/CO11_12_result/11_AS/end_splice.txt3|cat - ./IR.txt_end |cut -f1|sed 's/-[+-]$//g'|sort |uniq -u

##进行wu blast
python3 ~/scripte/Alternative/module/FEST3/conserveAS/noconserveAS.py -ho ~/work/Alternative/result/homologo/homologGene/A2_vs_At_collinearity.txt  -s IR.txt  -all ../../../../PSI/allIntron/TM1_intron.txt -g1 ~/work/Alternative/data/Ga_genome/G.arboreum.Chr.v1.0.fasta -g2 ~/work/Alternative/data/Ghirsutum_genome_HAU_v1.0/Ghirsutum_genome_HAU_v1.0.fasta -o 111
##提取最保守的
cat 111|awk '$1~/^evm/&&$2~/^Ghir_A/{print $0}$1~/^Ghir_A/&&$2~/^evm/{print $2,$1,$3}' OFS="\t"|sort |uniq |awk '{print $2,$3,$1}' OFS="\t"|sort  -k3 -k2,2nr|uniq -f2|awk '{print $3"\t"$2"\t"$1}'|sort -k3 -k2,2nr|uniq -f2 >A2_IR_uniq_At.txt
##还得再过滤一下，因为all intron里可能不包含AS的那个intron
cut -f1 A2_uniq_D5_IR.txt |xargs  -I {} grep {} ../IR.txt_end |sed 's/-[+-]//g' |cat - A2_uniq_D5_IR.txt |sort -k1,1 |awk '{print $2"\t"$3"\t"$1}' |uniq  -f2 -u |awk '{print $3"\t"$1"\t"$2}' >111
cut -f3 D5_uniq_A2_IR.txt |xargs  -I {} grep {} ../IR.txt_end |cat - D5_uniq_A2_IR.txt |sort -k3,3 |uniq -f2 -u >111
mv 111 D5_uniq_A2_IR.txt
mv 111 A2_uniq_D5_IR.txt
##还有At uniq Dtuniq中的结果取重复就是保守的IR
cat D5_uniq_A2_IR.txt  ../A2_uniq/A2_uniq_D5_IR.txt |sort |uniq -d >>../IR.txt_end
```

不保守的AS的坐标统计

```bash
```

### 不保守的AS的原因

> 1. 序列水平上的差异
> 2. 表观上的差异
>
>    sequence variation on DNA methylation analysis, we used the conserved sequences to examine DNA methylation changes between diploid and allotetraploid cottons.

### 甲基化差异与PIR差异

* 看一下怎么鉴定甲基化差异基因的方法

> 519 differentially methylated genes identified between wild and cultivated cottons
>
> 100,246 CG, 109,424 CHG, and 252,042 CHH DMRs between allotetraploid and diploid species
>
> 在A2与D5中存在的DMRs ，在四倍体中仍旧维持的有20% cDMRs
>
> A、D亚基因组在多倍化后发生反方向的变化，A到At减少，D到Dt增加；hDMRs；这有可能是染色体交叉互换的结果


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://zpliu.gitbook.io/booknote/ke-bian-jian-qie/di-san-ci-dui-as-jin-hang-tong-ji/04.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
