Parameters

SNVSniffer consists of five commands: snp (a SNP caller), somatic (a somatic SNV caller for paired tumor-normal samples), gsim (an Illumina-like read simulator for SNP calling), ssim (an Illumina-like tumor-normal sample pair simulator for somatic SNV calling), and eval (a VCF-based evaluation algorithm for germline and somatic SNVs).
Command Options
snp

Usage: SNVSniffer snp [options] infile

  • infile: an mpileup/pileup/BAM file

Input:
  • -f <int> (input file format, default = 0)
    • 0: mpileup format generated by SAMtools
    • 1: pileup format generated by MAQ
    • 2: BAM file
  • -g <string> (reference genome file, required for BAM format input)
Output:
  • -o <string> (output file name, default = STDOUT)
  • -hc_only <int> (only output high-confidence SNPs, default = 0)
Base call and coverage:
  • -min_cov <int> (minimum coverage, default = 3)
  • -max_cov <int> (maximum coverage, default = 250)
  • -min_bqual <int> (minimum Phred base quality score, default = 20)
  • -min_mapq <int> (minimum mapping quality score, default = 0)
  • -seq_err_rate <float> (sequencing error rate of the data, default = 0.02)
SNV calling:
  • -prior <int> (model used for genotype prior probabilities, default = 1)
    • 0: equal probability
    • 1: prior probabilities with no consideration of Ti/Tv ratio
    • 2: prior probabilities that take the Ti/Tv ratio into account
  • -snp_rate <float> (SNP mutation rate for the species, default = 0.001)
  • -min_allele_freq <float> (minimum allele frequency, default = 0.2 [Important])
  • -stringency <int> (stringency level [0, 9] for low-confidence mutations, default = 6)
  • -homo_freq <float> (allelic frequency threshold for homozygous genotype, default = 0.75)
  • -min_locus_dist <int> (minimum distance interval between two neighboring SNP loci, default = 1 [0 disables it])
  • -pvalue <float> (P value threshold for variant calling, default = 0.05)
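
The snp options above can also be assembled programmatically. The following Python sketch builds an argument list for "SNVSniffer snp". The helper name build_snp_command and its keyword arguments are hypothetical (they are not part of SNVSniffer); only flags listed above are used. It places options before the input file, as in the usage line, and checks that -g accompanies BAM input.

```python
import shlex


def build_snp_command(infile, fmt=0, reference=None, output=None, extra=None):
    """Assemble an argument list for `SNVSniffer snp` (hypothetical helper).

    fmt follows the -f option: 0 = mpileup, 1 = pileup, 2 = BAM.
    extra maps additional documented flags (e.g. "-min_cov") to values.
    """
    if fmt not in (0, 1, 2):
        raise ValueError("-f must be 0 (mpileup), 1 (pileup) or 2 (BAM)")
    if fmt == 2 and reference is None:
        raise ValueError("-g <reference> is required for BAM input")

    cmd = ["SNVSniffer", "snp", "-f", str(fmt)]
    if reference is not None:
        cmd += ["-g", reference]
    if output is not None:
        cmd += ["-o", output]
    for flag, value in (extra or {}).items():
        cmd += [flag, str(value)]
    cmd.append(infile)  # the input file comes after the options
    return cmd


cmd = build_snp_command(
    "infile.sorted.bam",
    fmt=2,
    reference="reference.fa",
    output="snps.vcf",
    extra={"-min_cov": 5, "-min_bqual": 25},
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once SNVSniffer is on the PATH.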
somatic

Usage: SNVSniffer somatic [options] normal_sam_header tumor_sam_header normal tumor

  • normal_sam_header: SAM header file for normal
  • tumor_sam_header: SAM header file for tumor
  • normal: an mpileup/pileup/BAM file for normal
  • tumor: an mpileup/pileup/BAM file for tumor

Input:
  • -f <int> (input file format, default = 0)
    • 0: mpileup format generated by SAMtools
    • 1: pileup format generated by MAQ
    • 2: BAM format
  • -g <string> (reference genome file, required for BAM format input)
Output:
  • -o <string> (output file name, default = STDOUT)
  • -o_somatic <int> (output somatic mutations, default = 1)
  • -o_loh <int> (output loss of heterozygosity mutations, default = 1)
  • -o_germline <int> (output germline mutations, default = 0)
  • -o_unknown <int> (output unknown type mutations, default = 1)
Base call and coverage:
  • -min_cov_normal <int> (minimum normal coverage, default = 3)
  • -min_cov_tumor <int> (minimum tumor coverage, default = 3)
  • -min_bqual <int> (minimum Phred base quality score, default = 20)
  • -min_mapq <int> (minimum mapping quality score, default = 0)
  • -seq_err_rate <float> (sequencing error rate of the data, default = 0.01)
SNV calling:
  • -prior <int> (model used for genotype prior probabilities, default = 1)
    • 0: equal probability
    • 1: prior probabilities with no consideration of Ti/Tv ratio
    • 2: prior probabilities that take the Ti/Tv ratio into account
  • -somatic_rate <float> (somatic mutation rate, default = 0.01)
  • -tumor_purity <float> (estimated purity of the tumor sample, default = 0 [0 means AUTO])
  • -min_allel_freq <float> (minimum allele frequency for the normal, default = 0.2 [OBSOLETE])
  • -stringency <int> (stringency level [0, 9] for low-confidence mutations, default = 6)
  • -homo_freq <float> (allelic frequency threshold for homozygous genotype, default = 0.75 [OBSOLETE])
  • -p_value <float> (P value threshold for variant calling per sample, default = 0.05)
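
Because somatic takes four positional arguments in a fixed order, it is easy to swap normal and tumor by accident. The following Python sketch builds the argument list with named parameters so the order is fixed in one place. The helper name build_somatic_command and its parameter names are hypothetical; the flags are those listed above. Leaving tumor_purity unset keeps the documented default of 0 (AUTO).

```python
def build_somatic_command(normal_header, tumor_header, normal, tumor,
                          fmt=0, reference=None, output=None,
                          tumor_purity=None):
    """Assemble an argument list for `SNVSniffer somatic` (hypothetical helper)."""
    if fmt not in (0, 1, 2):
        raise ValueError("-f must be 0 (mpileup), 1 (pileup) or 2 (BAM)")
    if fmt == 2 and reference is None:
        raise ValueError("-g <reference> is required for BAM input")

    cmd = ["SNVSniffer", "somatic", "-f", str(fmt)]
    if reference is not None:
        cmd += ["-g", reference]
    if output is not None:
        cmd += ["-o", output]
    if tumor_purity is not None:
        # Omitting the flag leaves the default (0, i.e. automatic estimation).
        cmd += ["-tumor_purity", str(tumor_purity)]
    # Positional order from the usage line: headers first, normal before tumor.
    cmd += [normal_header, tumor_header, normal, tumor]
    return cmd


cmd = build_somatic_command(
    normal_header="normal_header.sam",
    tumor_header="tumor_header.sam",
    normal="normal.mpileup",
    tumor="tumor.mpileup",
    output="out.vcf",
)
print(" ".join(cmd))
```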
gsim

Usage: SNVSniffer gsim [options] reference.fa reads.base

  • reference.fa: reference file stored in FASTA/FASTQ format
  • reads.base: the base file name for reads and VCF-formatted mutations

Output:
  • -z (compress all of the output using ZLIB)
Reads:
  • -c <int> (coverage of the genome, default = 0 [>0 disables option -n])
  • -n <int> (number of read pairs, default = 1000000)
  • -1 <int> (length of the first read, default = 100)
  • -2 <int> (length of the second read, default = 100)
  • -a <float> (discard if the fraction of ambiguous bases is higher than <float>, default = 0.05)
Germline:
  • -e <float> (sequencing base error rate, default = 0.02)
  • -i <int> (average insert size, i.e. outer distance, between the two ends, default = 500)
  • -d <int> (standard deviation of insert size, default = 50)
  • -r <float> (rate of mutations, including substitutions and indels, default = 0.001)
  • -R <float> (fraction of indels, default = 0.15)
  • -X <float> (probability an indel is extended, default = 0.3)
  • -H <float> (homozygous variant ratio, default = 0.3333)
  • -h (haplotype mode: all reads are sequenced from a single sequence)
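
Options -c and -n express the same quantity in two ways: mean coverage or number of read pairs. The Python sketch below shows the usual relation between them (coverage = pairs × (len1 + len2) / genome length). It is an illustration only: the function names are hypothetical, and how SNVSniffer rounds or handles ambiguous regions internally is not stated here.

```python
import math


def read_pairs_for_coverage(genome_length, coverage, len1=100, len2=100):
    """Read pairs needed to reach a mean coverage (defaults match -1 and -2)."""
    if genome_length <= 0 or coverage <= 0:
        raise ValueError("genome_length and coverage must be positive")
    bases_per_pair = len1 + len2
    return math.ceil(coverage * genome_length / bases_per_pair)


def coverage_from_read_pairs(genome_length, pairs, len1=100, len2=100):
    """Mean coverage obtained from a given number of read pairs."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return pairs * (len1 + len2) / genome_length


# A 5 Mbp genome at 30x with 2 x 100 bp reads needs 750,000 pairs.
print(read_pairs_for_coverage(5_000_000, 30))
# The default of 1,000,000 pairs on the same genome corresponds to 40x.
print(coverage_from_read_pairs(5_000_000, 1_000_000))
```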
ssim

Usage: SNVSniffer ssim [options] reference.fa reads.base

  • reference.fa: reference file stored in FASTA/FASTQ format
  • reads.base: the base file name for reads and VCF-formatted mutations

Output:
  • -z (compress all of the output using ZLIB)
Reads:
  • -c <int> (coverage of the genome, default = 0 [>0 disables option -n])
  • -n <int> (number of read pairs, default = 1000000)
  • -1 <int> (length of the first read, default = 100)
  • -2 <int> (length of the second read, default = 100)
  • -a <float> (discard if the fraction of ambiguous bases is higher than <float>, default = 0.05)
Somatic:
  • -s <float> (somatic mutation rate, default = 0.01)
  • -S <float> (proportion of homozygous somatic variants, default = 0.166)
  • -p <float> (probability of observing tumor reads at any somatic site [binomial distribution], default = 0.9)
Germline:
  • -e <float> (sequencing base error rate, default = 0.02)
  • -i <int> (average insert size, i.e. outer distance, between the two ends, default = 500)
  • -d <int> (standard deviation of insert size, default = 50)
  • -r <float> (rate of mutations, including substitutions and indels, default = 0.001)
  • -R <float> (fraction of indels, default = 0.15)
  • -X <float> (probability an indel is extended, default = 0.3)
  • -H <float> (homozygous variant ratio, default = 0.3333)
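
The -p option is described in terms of a binomial distribution. One way to read it, offered here as an assumption rather than a description of the simulator's code, is that each read covering a somatic site independently comes from tumor cells with probability p. The Python sketch below draws such a count; the function name tumor_read_count is hypothetical.

```python
import random


def tumor_read_count(depth, p=0.9, rng=None):
    """Draw the number of tumor-derived reads among `depth` reads.

    Each read is treated as an independent Bernoulli trial with success
    probability p, so the total follows Binomial(depth, p).
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    return sum(1 for _ in range(depth) if rng.random() < p)


rng = random.Random(1)  # fixed seed so the example is reproducible
depth = 40
count = tumor_read_count(depth, p=0.9, rng=rng)
print(f"{count} of {depth} reads drawn as tumor reads (expected mean {depth * 0.9:.1f})")
```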
eval

Usage: SNVSniffer eval trueSNPs.vcf predSNPs.vcf

  • trueSNPs.vcf: gold-standard SNPs in VCF format
  • predSNPs.vcf: predicted SNPs in VCF format

Options:
  • -s <int> (treat VCF content as somatic mutations and classify loci as SOMATIC, LOH, GERMLINE, or Unknown, default = 0)
  • -u <int> (only consider SOMATIC, excluding LOH and Unknown, default = 0)
  • -t <int> (tumor sample precedes normal sample in the true variation VCF, default = 0)
  • -T <int> (tumor sample precedes normal sample in the predicted variation VCF, default = 0)
  • -a <int> (normal sample is missing in the true variant VCF, default = 0)
  • -b <int> (normal sample is missing in the predicted variant VCF, default = 0)
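
The metrics that eval reports are not described in this section. As a rough illustration of what comparing a gold-standard VCF with a predicted VCF can involve, the Python sketch below matches records on (chromosome, position, alternate allele) and computes precision, recall, and F1. The matching key, the metric choice, and the function names load_loci and evaluate are all assumptions made for this example, not SNVSniffer's implementation.

```python
def load_loci(vcf_text):
    """Collect (CHROM, POS, ALT) keys from the data lines of VCF text."""
    loci = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        loci.add((fields[0], int(fields[1]), fields[4]))
    return loci


def evaluate(true_loci, pred_loci):
    """Compute precision, recall and F1 from two sets of loci."""
    tp = len(true_loci & pred_loci)
    fp = len(pred_loci - true_loci)
    fn = len(true_loci - pred_loci)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Tiny made-up records, for illustration only.
truth = ("#CHROM\tPOS\tID\tREF\tALT\n"
         "chr1\t100\t.\tA\tG\n"
         "chr1\t200\t.\tC\tT\n"
         "chr2\t50\t.\tG\tA\n")
pred = ("#CHROM\tPOS\tID\tREF\tALT\n"
        "chr1\t100\t.\tA\tG\n"
        "chr1\t200\t.\tC\tA\n"
        "chr2\t50\t.\tG\tA\n"
        "chr2\t70\t.\tT\tC\n")

result = evaluate(load_loci(truth), load_loci(pred))
print(result)
```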

Installation and Usage

Prerequisites

  1. Linux or Unix-like operating system.
  2. SAMtools (version >= 0.1.17) must be installed and available on the PATH environment variable. The software is open-source and can be downloaded from http://samtools.sourceforge.net.
  3. When the input is in BAM format, the BAM file must be sorted by leftmost coordinates (e.g. with the command "samtools sort in.bam out.sorted").
  4. When the input is in mpileup format, we recommend generating the mpileup files with the command "samtools mpileup -s -C 50 -f reference.fa in.bam > out.mpileup".
  5. For somatic mutation calling, the SAM header files for both normal and tumor samples can be generated with the command "samtools view -H in.bam > header.sam" or "samtools view -S -H in.sam > header.sam" (depending on whether the input is in BAM or SAM format).
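
The preparation steps above can be scripted. The following Python sketch wraps the three SAMtools commands quoted in the prerequisites as argument lists and shows how the shell redirection ("> file") can be reproduced with subprocess. The function names are hypothetical. The sort command uses the prefix-style syntax quoted above, which belongs to the 0.1.x series; newer SAMtools releases expect an explicit -o output file instead.

```python
import shlex
import subprocess


def sort_bam(in_bam, out_prefix):
    # Prefix-style syntax as quoted in the prerequisites (SAMtools 0.1.x).
    return ["samtools", "sort", in_bam, out_prefix]


def make_mpileup(reference, in_bam):
    # Recommended mpileup options from the prerequisites.
    return ["samtools", "mpileup", "-s", "-C", "50", "-f", reference, in_bam]


def extract_header(infile, is_sam=False):
    # -S is added only when the input is SAM rather than BAM.
    cmd = ["samtools", "view"]
    if is_sam:
        cmd.append("-S")
    return cmd + ["-H", infile]


def run_to_file(cmd, out_path):
    """Run cmd and write its standard output to out_path (like `cmd > out_path`)."""
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Print the commands instead of running them, so the sketch works without SAMtools.
for cmd in (sort_bam("in.bam", "out.sorted"),
            make_mpileup("reference.fa", "in.bam"),
            extract_header("in.bam"),
            extract_header("in.sam", is_sam=True)):
    print(shlex.join(cmd))
```

With SAMtools installed, run_to_file(make_mpileup("reference.fa", "in.bam"), "out.mpileup") would produce the mpileup file.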

Typical Usage

Command Usage
snp
  1. SNVSniffer snp infile.mpileup
  2. SNVSniffer snp -f 0 infile.mpileup
  3. SNVSniffer snp -f 2 -g reference.fa infile.sorted.bam
somatic
  1. SNVSniffer somatic -f 0 normal_header.sam tumor_header.sam normal.mpileup tumor.mpileup -o out.vcf

    If -tumor_purity is not set, SNVSniffer automatically estimates the tumor purity from the input data.

  2. SNVSniffer somatic -f 2 -g reference.fa normal_header.sam tumor_header.sam normal.bam tumor.bam -o out.vcf
gsim
  1. SNVSniffer gsim reference.fa reads_base
  2. SNVSniffer gsim reference.fa reads_base -e 0.01 -R 0 -c 30
ssim
  1. SNVSniffer ssim reference.fa reads_base
  2. SNVSniffer ssim reference.fa reads_base -e 0.01 -R 0 -c 30
eval
  1. SNVSniffer eval gold-standard.vcf predicted_snp.vcf
  2. SNVSniffer eval gold-standard.vcf predicted_somatic.vcf -s 1
  3. SNVSniffer eval gold-standard.vcf predicted_somatic.vcf -s 1 -u 1

Important Notes