# 01编辑效率检测

## 环境搭建

删除环境

`conda remove -n py36 --all`

安装环境

`conda create -n py36 python=3.6`

## 报错

* `ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject`

  原因是因为numpy版本太低了，卸载了安装旧的包;更新numpy包

  conda update numpy=1.16.1\`
* `pkg_resources.DistributionNotFound: The 'kiwisolver>=1.0.1' distribution was not found and is required by matplotlib`

  `conda install kiwisolver=1.1.0`

## 起步

* 1.数据准备
  * 双端测序数据文件 R1.fastq R2.fastq
  * 引物文件，均为5到3方向，名称必须按照此格式**TAB**键分割

    ```bash
    A1-F  gcttGCGTtggagtgagtacggtgtgcAAAGTATGCCCCTTATGGACCCT
    A1-R  ctgtGCGTtgagttggatgctggatggTCCTCATCCCATTGTTCATTTCT
    # windows传进服务器会多一个\r字符,需要删除
    sed -i 's/\r//g' 文件名
    ```
  * 靶标文件和对应的sgRNA,**TAB**键分割

    ```bash
    A1    AAAGTATGCCCCTTATGGACCCTCTTGTGACATTATTTGGTAGTGTTCATGAAAAGCTTCCCGAGACAGGGAGCACGCGTAGTATGCTTTTTCCGAATTTTGGAAGCATGTTTAGTACAGCAGAGCCTCATGCTAGAAATGAACAATGGGATGAGGA AGACAGGGAGCACGCGTAGT  NA  NA
    # windows传瑾服务器会多一个\r字符,需要删除
    sed -i 's/\r//g' 文件名
    ```

    * 懒得打NA的情况

      ```bash
      # 输入文件格式
      A1    参考序列    sgRNA
      # 运行命令
      sed -i  's/$/\tNA\tNA/g'  靶标序列文件 
      # 命令直接对原文件进行了修改
      A1    参考序列    sgRNA    NA    NA
      ```

```
列与列之间Tab键分隔；第二和第三个文件中对应样品名称必须相同；
```

* 2.合并双端测序数据

  ```bash
  # 进入conda环境
  conda activate CRISPResso2
  # 合并的命令
  flash -t 1 R1.fastq.gz R2.fastq.gz 2>&1 | tee flash.log
  # 输出文件会在R1和R2文件所在的目录，如下所示
  .
  ├── flash.log
  ├── hi-tom-1-sml_BKDL190837910-1a_1.fq.gz
  ├── hi-tom-1-sml_BKDL190837910-1a_2.fq.gz
  ├── MD5_hi-tom-1-sml_BKDL190837910-1a.txt
  ├── out.extendedFrags.fastq  # 下一步分析用到的文件
  ├── out.hist
  ├── out.histogram
  ├── out.notCombined_1.fastq
  └── out.notCombined_2.fastq
  ```
* 3.反向互补R引物

  ```bash
  # prefix变量为引物文件和靶标序列文件所在目录的名称
  prefix="T1"
  # 进入引物文件和靶标序列文件所在目录
  cd /public/home/zpxu/djw/${prefix}
  # 分别修改引物文件和靶标序列文件的文件名
  mv 你的引物文件  ${prefix}.primer.txt
  mv 你的靶标文件  ${prefix}.txt
  ##############
  cat ${prefix}.primer.txt | sed 'N;s/\n/ /' | sed "s/-F//g" | awk '{print $1"\t"$2 > $1"-F.txt"}'
  ###################
  cat ${prefix}.primer.txt | sed 'N;s/\n/ /' | sed "s/-R//g" | awk '{print ">"$3"\n"$4 > $3"-R.txt"}'
  ###############
  for i in `ls | \grep "R" | sed "s/-R\.txt//g" | sort -V | uniq | xargs`
  do
  perl -nle'BEGIN {
      @map{ A, a, C, c, G, g, T, t } = ( T, t, G, g, C, c, A, a )
      }
      print /^>/ ?
          $_ :
                join //, map $map{ $_ }, split //, scalar reverse
      ' ${i}-R.txt | awk '/^>/&&NR>1{print "";}{ printf "%s",/^>/ ? $0"\t":$0 }' | sed "s/^>//g" > ${i}-RC.txt
  cat ${i}-F.txt ${i}-RC.txt | sed 'N;s/\n/ /' | awk '{print $1,$2,$4}' | awk '{print $2".*"$3 > $1".txt"}'
  \rm ${i}-F.txt ${i}-R.txt ${i}-RC.txt
  done
  ```
* 4.按barcode拆分reads

  ```bash
  for i in `ls | \grep -v "${prefix}" | sort -V | uniq | xargs`
  do
  for j in `cat ${i}`
  do
  cat ../out.extendedFrags.fastq | \grep -B1 -A2 -i "${j}" | grep -v "^--$" >> ${i%.*}.fastq
  gzip ${i%.*}.fastq
  done
  done
  ```
* 5.按样品拆分扩增区域和靶标文件

  ```bash
  cat ${prefix}.txt | awk -F"\t" 'BEGIN{OFS="\t"} {print "-a "$2" -g "$3 > $1"_description.txt"}'
  ```
* 6.检测编辑情况

  `vim barcode.lsf`

  编写lsf作业，使用vim命令,将下面的代码复制到barcode.lsf文件中

  **文件路径不能包含 ‘-’**&#x20;

  ```bash
  #BSUB -q smp
  #BSUB -W 168:00
  #BSUB -n 1
  #BSUB -R span[hosts=1]
  #BSUB -o T1_cas9.out
  #BSUB -e T1_cas9.err
  #BSUB -J T1_cas9
  conda init bash
  conda activate CRISPResso2
  # 进入引物文件和靶标序列文件所在目录
  cd /public/home/zpxu/djw/T1
  ############## Analysis #####################
  for i in `ls | \grep "fastq.gz" | sed "s/\.fastq\.gz//g" | sort -V | xargs`
  do
  echo CRISPResso -r1 ${i}.fastq.gz | cat - ${i}_description.txt | sed 'N;s/\n/ /' > ${i}.sh
  bash ${i}.sh
  mv ${i}.sh ./CRISPResso_on_${i}
  mv CRISPResso_on_${i}.html ./CRISPResso_on_${i}
  tar -Ipigz -cf CRISPResso_on_${i}.tar.gz CRISPResso_on_${i}
  done
  echo "Finally! For more see http://crispresso.pinellolab.partners.org/static/demo/CRISPResso_on_hdr/CRISPResso2_report.html"
  ```
* 提交任务

  bsub \<barcode.lsf\`

## 输出结果文件目录

```bash
.
├── 1a.Read_barplot.pdf
├── 1a.Read_barplot.png
├── 1b.Alignment_pie_chart.pdf
├── 1b.Alignment_pie_chart.png
├── 1c.Alignment_barplot.pdf
├── 1c.Alignment_barplot.png
├── 2a.Nucleotide_percentage_quilt.pdf
├── 2a.Nucleotide_percentage_quilt.png
├── 2b.Nucleotide_percentage_quilt_around_sgRNA_AGTTGGATCTACAGCCAACG.pdf
├── 2b.Nucleotide_percentage_quilt_around_sgRNA_AGTTGGATCTACAGCCAACG.png
├── 3a.Indel_size_distribution.pdf
├── 3a.Indel_size_distribution.png
├── 3b.Insertion_deletion_substitutions_size_hist.pdf
├── 3b.Insertion_deletion_substitutions_size_hist.png
├── 4a.Combined_insertion_deletion_substitution_locations.pdf
├── 4a.Combined_insertion_deletion_substitution_locations.png
├── 4b.Insertion_deletion_substitution_locations.pdf
├── 4b.Insertion_deletion_substitution_locations.png
├── 4c.Quantification_window_insertion_deletion_substitution_locations.pdf
├── 4c.Quantification_window_insertion_deletion_substitution_locations.png
├── 4d.Position_dependent_average_indel_size.pdf
├── 4d.Position_dependent_average_indel_size.png
├── 9.Alleles_frequency_table_around_sgRNA_AGTTGGATCTACAGCCAACG.pdf
├── 9.Alleles_frequency_table_around_sgRNA_AGTTGGATCTACAGCCAACG.png
├── A2.sh
├── Alleles_frequency_table_around_sgRNA_AGTTGGATCTACAGCCAACG.txt
├── Alleles_frequency_table.zip
├── CRISPResso2_info.pickle
├── CRISPResso_mapping_statistics.txt
├── CRISPResso_on_A2.html
├── CRISPResso_quantification_of_editing_frequency.txt
├── CRISPResso_RUNNING_LOG.txt
├── Deletion_histogram.txt
├── Effect_vector_combined.txt
├── Effect_vector_deletion.txt
├── Effect_vector_insertion.txt
├── Effect_vector_substitution.txt
├── Indel_histogram.txt
├── Insertion_histogram.txt
├── Modification_count_vectors.txt
├── Nucleotide_frequency_table.txt
├── Nucleotide_percentage_table.txt
├── Quantification_window_modification_count_vectors.txt
├── Quantification_window_nucleotide_frequency_table.txt
├── Quantification_window_nucleotide_percentage_table.txt
├── Quantification_window_substitution_frequency_table.txt
├── Substitution_frequency_table.txt
└── Substitution_histogram.txt
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://zpliu.gitbook.io/booknote/crisp-case9/01-bian-ji-xiao-shuai-jian-ce.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
