
ERDS-exome

A hybrid approach for copy number variant detection from whole-exome sequencing data

Quick Start Guide

This is a step-by-step guide to using ERDS-exome. You can try ERDS-exome with the toy data, which can be downloaded here. Generating the RPKM matrix from BAM files can take a long time, so for convenience you can start from the provided RPKM_matrix.raw. This guide uses sample NA12878 as the example.
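For readers unfamiliar with the values stored in the RPKM matrix, the sketch below shows the standard RPKM definition (reads per kilobase of target per million mapped reads). The function name and the example numbers are illustrative assumptions; this is not the code ERDS-exome uses to build RPKM_matrix.raw.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of target per Million mapped reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)

# Hypothetical target: 100 reads on a 200 bp exon, 1,000,000 mapped reads in the sample.
example = rpkm(100, 200, 1_000_000)
print(example)  # 500.0
```

Because RPKM already divides by exon length and sequencing depth, values are comparable across targets and samples before any further normalization.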

0. Filter RPKM matrix

Filter the RPKM matrix by GC content, mappability and exon length. For convenience, the GC content, mappability and exon length of the targets have been precomputed and are stored in the files "GC_percentage", "mapping_ability" and "exon_length", respectively. You can save time by using these files directly, or calculate GC content and mappability yourself by specifying the "ref_file" and "map_file" parameters.
  python erds_exome.py filter \
  --rpkm_matrix RPKM_matrix.raw \
  --filter_params filter_params.txt
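The idea behind this step can be sketched in a few lines of Python: a target is kept only if its GC content, mappability and length all fall within acceptable limits. The thresholds, the function name and the toy targets below are assumptions made for illustration; the actual limits come from filter_params.txt, whose format is not described in this guide.

```python
def passes_filter(gc, mappability, length_bp,
                  gc_range=(0.2, 0.8),       # hypothetical GC fraction limits
                  min_mappability=0.9,       # hypothetical mappability cutoff
                  length_range=(20, 2000)):  # hypothetical exon length limits (bp)
    """Return True if a target lies within all three limits."""
    gc_ok = gc_range[0] <= gc <= gc_range[1]
    map_ok = mappability >= min_mappability
    len_ok = length_range[0] <= length_bp <= length_range[1]
    return gc_ok and map_ok and len_ok

# Toy targets: name -> (GC fraction, mappability, length in bp)
targets = {
    "exon_A": (0.45, 1.00, 150),
    "exon_B": (0.85, 1.00, 150),  # GC content too high
    "exon_C": (0.50, 0.40, 150),  # poorly mappable
    "exon_D": (0.50, 1.00, 5),    # too short
}
kept = [name for name, (gc, mp, ln) in targets.items()
        if passes_filter(gc, mp, ln)]
print(kept)  # ['exon_A']
```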

1. Normalization

Method 1: Normalize the RPKM matrix in three steps, correcting for mappability, GC content and exon length.
  python erds_exome.py norm_rpkm \
  --rpkm_matrix RPKM_matrix.raw.filtered
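This guide does not spell out how each correction is computed. One common way to correct read depth for a covariate such as GC content is to group targets into bins by that covariate and rescale each bin to the overall median. The sketch below shows that generic technique for a single sample; the function name, the bin count and the toy values are assumptions, and norm_rpkm may work differently.

```python
from statistics import median

def normalize_by_covariate(values, covariate, n_bins=2):
    """Rescale values so every covariate bin shares the overall median."""
    overall = median(values)
    lo, hi = min(covariate), max(covariate)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all covariates are equal
    # Assign each target to an equal-width bin; the maximum goes into the last bin.
    bins = [min(int((c - lo) / width), n_bins - 1) for c in covariate]
    bin_median = {b: median(v for v, vb in zip(values, bins) if vb == b)
                  for b in set(bins)}
    return [v * overall / bin_median[b] if bin_median[b] > 0 else v
            for v, b in zip(values, bins)]

# Toy sample: high-GC targets have systematically higher RPKM.
rpkm_values = [10, 12, 20, 24]
gc_content = [0.30, 0.35, 0.60, 0.65]
corrected = normalize_by_covariate(rpkm_values, gc_content)
```

Under this assumption, a three-step normalization would apply the same function once per covariate (mappability, GC content, exon length), feeding the output of one step into the next.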
Method 2: Normalize the RPKM matrix by singular value decomposition (SVD).
  python erds_exome.py svd \
  --rpkm_matrix RPKM_matrix.raw.filtered
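SVD-based normalization of read-depth matrices generally works by removing the strongest singular components, which tend to capture batch effects shared across samples rather than sample-specific copy number changes. The sketch below illustrates that general idea with NumPy. The function name and the number of components removed are assumptions; the guide does not state how the svd subcommand chooses or removes components.

```python
import numpy as np

def remove_top_components(matrix, n_remove=1):
    """Zero the n_remove largest singular values and rebuild the matrix."""
    X = np.asarray(matrix, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = s.copy()
    s[:n_remove] = 0.0    # singular values are returned in descending order
    return (U * s) @ Vt   # equivalent to U @ diag(s) @ Vt

# Toy matrix (samples x targets) made of one shared pattern only,
# so removing one component leaves approximately zero signal.
batch_only = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
cleaned = remove_top_components(batch_only, n_remove=1)
```

In practice the input would usually be centered or otherwise standardized first, and the number of removed components is a tuning choice: removing too many can erase real CNV signal.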

2. CNV calling

Mode 1 (pooled): CNV calling by SVD, using the SVD-normalized matrix.
  python erds_exome.py discover \
  --params params.txt \
  --rpkm_matrix RPKM_matrix.raw.filtered.SVD \
  --mode pooled \
  --sample NA12878 \
  --vcf NA12878.vcf.gz \
  --hetsnp True \
  --tagsnp True \
  --tagsnp_file tagSNP_hg19.txt \
  --output NA12878.pooled.Het.Tag.cnv
Mode 2 (single): CNV calling by baseline, using the RPKM_matrix.raw.filtered.normalized matrix.
  python erds_exome.py discover \
  --params params.txt \
  --rpkm_matrix RPKM_matrix.raw.filtered.normalized \
  --mode single \
  --sample NA12878 \
  --vcf NA12878.vcf.gz \
  --hetsnp True \
  --tagsnp True \
  --tagsnp_file tagSNP_hg19.txt \
  --output NA12878.single.Het.Tag.cnv
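The internals of the discover subcommand are not described here. As a rough intuition for read-depth calling against a baseline, the sketch below compares one sample's normalized RPKM with the per-target median of reference samples and labels each target by its log2 ratio. The cutoffs, labels and function name are assumptions chosen for illustration, and the sketch ignores the heterozygous-SNP and tag-SNP evidence enabled by --hetsnp and --tagsnp.

```python
from math import log2
from statistics import median

def call_targets(sample, baseline_samples, del_cutoff=-0.6, dup_cutoff=0.4):
    """Label each target DEL, DUP, NEUTRAL or NA from its log2 ratio to baseline."""
    calls = []
    for i, value in enumerate(sample):
        base = median(s[i] for s in baseline_samples)
        if base <= 0:
            calls.append("NA")   # no usable baseline depth at this target
        elif value <= 0:
            calls.append("DEL")  # no reads where the baseline has coverage
        else:
            ratio = log2(value / base)
            if ratio <= del_cutoff:
                calls.append("DEL")
            elif ratio >= dup_cutoff:
                calls.append("DUP")
            else:
                calls.append("NEUTRAL")
    return calls

# Toy data: three targets, three baseline samples.
baseline = [[50, 50, 50], [52, 48, 50], [49, 51, 51]]
sample = [50, 25, 75]
calls = call_targets(sample, baseline)
print(calls)  # ['NEUTRAL', 'DEL', 'DUP']
```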

3. Merge CNV calling results

Merge the CNV calls produced by the two calling modes.
  python erds_exome.py merge \
  --datafile_svd NA12878.pooled.Het.Tag.cnv \
  --datafile_dis NA12878.single.Het.Tag.cnv \
  --output NA12878.merged.maq20.Het.Tag.cnv
The expected result file should look like this.
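To illustrate what merging two call sets involves, here is a minimal sketch that unions overlapping calls of the same type on the same chromosome. The tuple layout, the function name and the toy coordinates are assumptions; they do not reflect the .cnv file format or the rules the merge subcommand applies.

```python
def merge_calls(calls_a, calls_b):
    """Union overlapping calls that share chromosome and CNV type.

    Each call is a tuple: (chrom, start, end, cnv_type).
    """
    merged = []
    # Sort so that calls with the same chromosome and type are adjacent, by start.
    ordered = sorted(calls_a + calls_b, key=lambda c: (c[0], c[3], c[1], c[2]))
    for chrom, start, end, kind in ordered:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and last[3] == kind and start <= last[2]:
            # Overlaps the previous call: extend its end coordinate.
            merged[-1] = (chrom, last[1], max(last[2], end), kind)
        else:
            merged.append((chrom, start, end, kind))
    return merged

# Toy call sets standing in for the pooled and single results.
pooled = [("chr1", 100, 200, "DEL"), ("chr2", 500, 600, "DUP")]
single = [("chr1", 150, 300, "DEL"), ("chr2", 700, 800, "DUP")]
merged = merge_calls(pooled, single)
```

Here the two overlapping chr1 deletions collapse into one call spanning 100 to 300, while the two non-overlapping chr2 duplications are kept separate.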