# Annotating SNPs with ANNOVAR

## 1. Software installation

### 1.1 Installing `gtfToGenePred` with conda

Add the download channels to conda:

```bash
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```

Install (and update) `gtfToGenePred`:

```bash
conda install ucsc-gtftogenepred
conda update ucsc-gtftogenepred
```
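To confirm that the tools used in this note are actually on `PATH`, a small check like the one below can help. It is a minimal sketch that relies only on the bash builtin `command -v`; the function name `check_tools` is hypothetical. Since ANNOVAR is loaded with `module load annovar` later in this note, run the check for the ANNOVAR scripts after that.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_tools gtfToGenePred retrieve_seq_from_fasta.pl convert2annovar.pl table_annovar.pl`.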

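Usage: convert the GTF gene model into the extended genePred format, which ANNOVAR then uses as its `refGene` table: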

```bash
gtfToGenePred -genePredExt Gbarbadense_gene_model.gtf  Gbarbadense_gene_model.refGene.txt
```
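`-genePredExt` writes the extended genePred format, with 15 tab-separated columns per transcript (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, cdsStartStat, cdsEndStat, exonFrames). A minimal sketch for checking that every line of the converted file has the same column count; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each distinct column count in a tab-separated file,
# print "<column count>\t<number of lines>", sorted by column count.
genepred_column_summary() {
    awk -F'\t' '{ n[NF]++ } END { for (k in n) print k "\t" n[k] }' "$1" | sort -n
}
```

Running `genepred_column_summary Gbarbadense_gene_model.refGene.txt` should print a single line starting with `15`; more than one line suggests a malformed conversion.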

## 2. ANNOVAR annotation

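1. Extract the transcript (mRNA) sequences from the reference genome with `retrieve_seq_from_fasta.pl`:
   * `--format` gives the format of the gene annotation file (`refGene` here)
   * `--seqfile` is followed by the reference genome sequence file
   * the positional argument is the gene annotation file
   * `--outfile` sets the output file name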

`Gbarbadense_gene_model.refGene.txt` is the file generated by `gtfToGenePred` in step 1.1.

```bash
module load annovar
# Extract transcript (mRNA) sequences for the genes in the genePred file from the genome FASTA
retrieve_seq_from_fasta.pl --format refGene --seqfile Gbarbadense_genome_HAU_v2.0.fasta \
    Gbarbadense_gene_model.refGene.txt --outfile Gbarbadense_refGeneMrna.fa
```
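As a quick sanity check, the number of sequences in the extracted mRNA FASTA can be compared with the number of transcripts in the genePred file. This is a sketch with a hypothetical function name; the two counts should roughly match, and a large gap may point to transcripts that could not be extracted (for example, chromosome names in the GTF that do not match the FASTA headers; this is an assumption, check the `retrieve_seq_from_fasta.pl` messages to confirm).

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the number of transcripts in a genePred file
# (one non-empty line per transcript) with the number of FASTA records.
compare_transcript_counts() {
    local genepred="$1" fasta="$2" n_tx n_seq
    n_tx=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$genepred")
    n_seq=$(awk '/^>/ { n++ } END { print n + 0 }' "$fasta")
    echo "transcripts=$n_tx fasta_records=$n_seq"
}
```

For example: `compare_transcript_counts Gbarbadense_gene_model.refGene.txt Gbarbadense_refGeneMrna.fa`.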

2. Convert the VCF file to ANNOVAR input format

   > A 6 GB VCF file took roughly …

```bash
# Convert the filtered SNP VCF into ANNOVAR input (avinput) format
convert2annovar.pl --includeinfo --allsample --withfreq --format vcf4 \
    ../Gbarbadense_genome.snp.filter.recode.vcf > Gbarbadense.avinput
```
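Because the input VCF is a filtered SNP set, it is worth checking that the avinput file really contains single-nucleotide variants. In avinput, the first five columns are chromosome, start, end, reference allele and alternative allele, and insertions/deletions use `-` as the missing allele. A minimal sketch (hypothetical function name) that counts SNVs versus everything else:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count single-nucleotide variants vs. other variant types
# in an ANNOVAR input file (columns 1-5: chr, start, end, ref, alt).
summarize_avinput() {
    awk '
        length($4) == 1 && length($5) == 1 && $4 != "-" && $5 != "-" { snv++; next }
        { other++ }
        END { print "SNV\t" snv + 0; print "other\t" other + 0 }
    ' "$1"
}
```

For example: `summarize_avinput Gbarbadense.avinput`. A non-zero `other` count would mean indels or multi-base variants slipped through the SNP filter.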

3. Annotate with `table_annovar.pl`

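   Both the genePred file converted from the GTF and the extracted mRNA FASTA must be placed in the `Gbarbadense/` directory, which is passed to `table_annovar.pl` as the database directory.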

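   * `--protocol` specifies the database(s) to annotate against, as a comma-separated list
   * `--operation` specifies the annotation type for each `--protocol` entry, in the same order: `g`, `r` and `f` mean gene-based, region-based and filter-based annotation, respectively
   * `--thread` sets the number of threads
   * `--maxgenethread` caps the number of threads used for gene-based annotation; it must be raised when `--thread` is above 6, otherwise at most 6 threads run
   * `--outfile` sets the output file prefix
   * `-buildver` sets the genome build name; ANNOVAR looks for database files named `<buildver>_<protocol>`, so `Gbarbadense/` must contain `Gbarbadense_refGene.txt` (the `Gbarbadense_gene_model.refGene.txt` file from step 1.1 under that name) and `Gbarbadense_refGeneMrna.fa`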
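ANNOVAR builds the database file names from `-buildver` and the protocol name (`<buildver>_refGene.txt` and `<buildver>_refGeneMrna.fa`). A minimal sketch for staging the two files under those names; the function `setup_annovar_db` is hypothetical and simply copies the files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the gene model and mRNA FASTA into an ANNOVAR
# database directory under the names ANNOVAR derives from -buildver and the
# protocol name: <buildver>_refGene.txt and <buildver>_refGeneMrna.fa.
setup_annovar_db() {
    local dbdir="$1" buildver="$2" genepred="$3" mrna="$4"
    mkdir -p "$dbdir" || return 1
    cp "$genepred" "$dbdir/${buildver}_refGene.txt" || return 1
    cp "$mrna" "$dbdir/${buildver}_refGeneMrna.fa" || return 1
}
```

Called as `setup_annovar_db Gbarbadense Gbarbadense Gbarbadense_gene_model.refGene.txt Gbarbadense_refGeneMrna.fa`, this produces `Gbarbadense/Gbarbadense_refGene.txt` and `Gbarbadense/Gbarbadense_refGeneMrna.fa`.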

```bash
# Gene-, region- and filter-based annotation, all against the refGene table
table_annovar.pl --maxgenethread 10 --thread 10 Gbarbadense.avinput Gbarbadense/ \
    -buildver Gbarbadense --outfile Gbarbadense_annovar \
    --protocol refGene,refGene,refGene --operation g,r,f
```
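`--protocol` and `--operation` are paired by position, so both lists need the same number of comma-separated entries. A minimal sketch (hypothetical function name) that checks this before launching a long run; the set of accepted operation codes (`g`, `gx`, `r`, `f`) is an assumption based on common ANNOVAR versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that --protocol and --operation have the same
# number of comma-separated entries and that each operation code is one of
# g, gx, r, f (assumed set of valid codes).
check_protocol_operation() {
    local -a protocols operations
    local op
    IFS=',' read -r -a protocols <<< "$1"
    IFS=',' read -r -a operations <<< "$2"
    if [ "${#protocols[@]}" -ne "${#operations[@]}" ]; then
        echo "count mismatch: ${#protocols[@]} protocols vs ${#operations[@]} operations" >&2
        return 1
    fi
    for op in "${operations[@]}"; do
        case "$op" in
            g|gx|r|f) ;;
            *) echo "unknown operation: $op" >&2; return 1 ;;
        esac
    done
    return 0
}
```

For example, `check_protocol_operation refGene,refGene,refGene g,r,f` succeeds, while `check_protocol_operation refGene g,r` fails with a count mismatch.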

To run gene-based annotation of the SNPs only:

```bash
table_annovar.pl --maxgenethread 10 --thread 10 Gbarbadense.avinput Gbarbadense/ \
    -buildver Gbarbadense --outfile Gbarbadense_annovar \
    --protocol refGene --operation g
```

4. Output files

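   Because I used the `refGene` database for every `--protocol` entry, the `region` and `filter` annotations are most likely invalid, so their output files are not listed here.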

   ```text
   ├── Gbarbadense_annovar.refGene.exonic_variant_function
   ├── Gbarbadense_annovar.refGene.invalid_input
   ├── Gbarbadense_annovar.refGene.log
   └── Gbarbadense_annovar.refGene.variant_function
   ```
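A common first look at these files is to tally one column: in `*.variant_function` the first column is the genomic region of each variant (exonic, intronic, intergenic, and so on), and in `*.exonic_variant_function` the second column is the exonic effect (for example, synonymous or nonsynonymous SNV). A minimal sketch with a hypothetical function name, assuming the files are tab-separated:

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally one tab-separated column of an ANNOVAR output file,
# printing "<value>\t<count>" sorted by value.
#   column 1 of *.variant_function        -> genomic region
#   column 2 of *.exonic_variant_function -> exonic effect
tally_column() {
    local file="$1" col="$2"
    awk -F'\t' -v c="$col" '{ n[$c]++ } END { for (k in n) print k "\t" n[k] }' "$file" |
        LC_ALL=C sort
}
```

For example: `tally_column Gbarbadense_annovar.refGene.variant_function 1` and `tally_column Gbarbadense_annovar.refGene.exonic_variant_function 2`.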

### References

1. Non-human species: <https://blog.csdn.net/u013816205/article/details/51262289>
2. Non-human species: <https://blog.csdn.net/g863402758/article/details/75304391>
3. Mutation annotation with ANNOVAR: <https://anjingwd.github.io/AnJingwd.github.io/2018/01/20/ANNOVAR%E8%BF%9B%E8%A1%8C%E7%AA%81%E5%8F%98%E6%B3%A8%E9%87%8A/>
4. `gtfToGenePred` installation: <https://bioconda.github.io/recipes/ucsc-gtftogenepred/README.html>

