Parameters

SNVSniffer consists of five commands: snp (a SNP/indel caller), somatic (a somatic SNV/indel caller for paired tumor-normal samples), gsim (an Illumina-like read simulator for SNP/indel calling), ssim (an Illumina-like tumor-normal sample pair simulator for somatic SNV/indel calling), and eval (a VCF-based evaluation algorithm for germline and somatic SNVs/indels).
Command Options
snp Identify single-sample SNVs/indels
Usage: SNVSniffer snp [options] sam_header infile
  • sam_header: a SAM header file associated with the BAM file
  • infile: an mpileup/pileup/BAM file (can have no header)
Input:
  • -g <string> (reference genome file, required for BAM format input)
  • -f <int> (input file format, default = 2)
    • 0: mpileup format generated by SAMtools
    • 1: pileup format generated by MAQ
    • 2: BAM file
Output:
  • -o <string> (output file name, default = STDOUT)
Base call and coverage:
  • -min_cov <int> (minimum coverage, default = 3)
  • -max_cov <int> (maximum coverage, default = 250)
  • -min_bqual <int> (minimum Phred base quality score, default = 20)
  • -min_mapq <int> (minimum mapping quality score, default = 0)
  • -seq_err_rate <float> (sequencing error rate of the data, default = 0.02)
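The -min_bqual threshold is expressed on the Phred scale, where a quality score Q corresponds to a base-call error probability of 10^(-Q/10). The short Python sketch below shows the conversion in both directions; the helper names are illustrative and are not part of SNVSniffer. Under this standard definition, the default of 20 corresponds to a 1% per-base error probability.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability for a few common quality thresholds.
    for q in (10, 20, 30):
        print(f"Q{q}: error probability = {phred_to_error_prob(q):.4f}")
```

The same conversion can be used to relate a chosen -seq_err_rate value to an approximate Phred score.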
SNV calling:
  • -exe_mode <int> (execution mode for SNV calling, default = 0)
    • 0: very fast and least accurate
    • 1: fast and accurate
    • 2: slow and most accurate
  • -prior <int> (model used for genotype prior probabilities, default = 1)
    • 0: equal probability
    • 1: prior probabilities with no consideration of Ti/Tv ratio
    • 2: prior probabilities by considering Ti/Tv ratio
  • -snp_rate <float> (SNP mutation rate for the species, default = 0.001)
  • -min_allele_freq <float> (minimum allele frequency, default = 0.2 [Important])
  • -stringency <int> (stringency level [0, 9] for low-confidence mutations, default = 6)
  • -homo_freq <float> (allelic frequency threshold for homozygous genotype, default = 0.75)
  • -min_locus_dist <int> (minimum distance interval between two neighboring SNP loci, default = 1 [0 disables it])
  • -use_strand_dist <bool> (use strand distribution information, default = 0)
  • -local_range <int> (set the local search range for strand distribution, default = 1000)
  • -call_snps <bool> (enable the calling of SNPs, default = 1)
  • -call_indel <bool> (enable the calling of insertions or deletions, default = 1)
  • -pvalue <float> (P value threshold for variant calling, default = 0.05)
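The -prior option refers to the Ti/Tv ratio, i.e. the ratio of transitions (purine to purine or pyrimidine to pyrimidine substitutions) to transversions (all other substitutions). The Python sketch below only illustrates this definition; it does not reproduce how SNVSniffer builds its genotype priors, and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # A transition keeps the chemical class (purine or pyrimidine) of the base.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def titv_ratio(substitutions) -> float:
    """Compute the Ti/Tv ratio of an iterable of (ref, alt) pairs."""
    substitutions = list(substitutions)
    ti = sum(1 for ref, alt in substitutions
             if substitution_type(ref, alt) == "transition")
    tv = len(substitutions) - ti
    return ti / tv if tv else float("inf")


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("A", "T")]
    print(titv_ratio(calls))
```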
somatic Identify germline/somatic SNVs/indels from tumor-normal pairs
Usage: SNVSniffer somatic [options] normal_sam_header tumor_sam_header normal tumor
  • normal_sam_header: SAM header file for normal
  • tumor_sam_header: SAM header file for tumor
  • normal: an mpileup/pileup/BAM file for normal
  • tumor: an mpileup/pileup/BAM file for tumor
Input:
  • -g <string> (reference genome file, required for BAM format input)
  • -f <int> (input file format, default = 2)
    • 0: mpileup format generated by SAMtools
    • 1: pileup format generated by MAQ
    • 2: BAM format
Output:
  • -o <string> (output file name, default = STDOUT)
  • -o_somatic <int> (output somatic mutations, default = 1)
  • -o_loh <int> (output loss of heterozygosity mutations, default = 1)
  • -o_germline <int> (output germline mutations, default = 0)
  • -o_unknown <int> (output unknown type mutations, default = 1)
  • -o_indel <int> (output indels, default = 1)
Base call and coverage:
  • -min_cov_normal <int> (minimum normal coverage, default = 3)
  • -min_cov_tumor <int> (minimum tumor coverage, default = 3)
  • -min_bqual <int> (minimum Phred base quality score, default = 20)
  • -min_mapq <int> (minimum mapping quality score, default = 0)
  • -seq_err_rate <float> (sequencing error rate of the data, default = 0.01)
SNV calling:
  • -exe_mode <int> (execution mode for SNV calling, default = 0)
    • 0: very fast
    • 1: fast and more accurate
    • 2: slow and most accurate
  • -prior <int> (model used for genotype prior probabilities, default = 1)
    • 0: equal probability
    • 1: prior probabilities with no consideration of Ti/Tv ratio
    • 2: prior probabilities by considering Ti/Tv ratio
  • -somatic_rate <float> (somatic mutation rate, default = 0.01)
  • -tumor_purity <float> (estimated purity of the tumor sample, default = 0 [0 means AUTO])
  • -min_allel_freq <float> (minimum allele frequency for the normal, default = 0.2 [OBSOLETE])
  • -stringency <int> (stringency level [0, 9] for low-confidence mutations, default = 6)
  • -homo_freq <float> (allelic frequency threshold for homozygous genotype, default = 0.75 [OBSOLETE])
  • -p_value <float> (P value threshold for variant calling per sample, default = 0.05)
  • -use_strand_dist <bool> (use strand distribution information, default = 0)
  • -local_range <int> (set the local search range for strand distribution, default = 1000)
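The -o_somatic, -o_loh, -o_germline and -o_unknown switches select which mutation classes are written to the output. This section does not describe how SNVSniffer assigns a site to a class, so the Python sketch below is only an assumption based on the usual meaning of these terms: it compares the normal and tumor genotypes, encoded here with the hypothetical labels "ref" (homozygous reference), "het" (heterozygous) and "hom" (homozygous variant).

```python
VALID_GENOTYPES = {"ref", "het", "hom"}


def classify_site(normal_gt: str, tumor_gt: str) -> str:
    """Assign a tumor-normal genotype pair to a mutation class (assumed rules)."""
    if normal_gt not in VALID_GENOTYPES or tumor_gt not in VALID_GENOTYPES:
        raise ValueError("genotypes must be one of 'ref', 'het', 'hom'")
    if normal_gt == tumor_gt:
        # Same genotype in both samples: either no variant or an inherited one.
        return "REFERENCE" if normal_gt == "ref" else "GERMLINE"
    if normal_gt == "ref":
        # The variant is present in the tumor only.
        return "SOMATIC"
    if normal_gt == "het":
        # A heterozygous normal site that became homozygous in the tumor.
        return "LOH"
    # Remaining combinations (e.g. hom in normal, het in tumor) are left open.
    return "Unknown"


if __name__ == "__main__":
    for pair in (("ref", "het"), ("het", "hom"), ("het", "het"), ("hom", "het")):
        print(pair, classify_site(*pair))
```

A real caller works with genotype likelihoods and tumor purity rather than hard genotype labels; the sketch only makes the vocabulary of the output switches concrete.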
gsim Simulate sample reads with germline SNVs/indels
Usage: SNVSniffer gsim [options] reference.fa reads.base
  • reference.fa: reference file stored in FASTA/FASTQ format
  • reads.base: the base file name for reads and VCF-formatted mutations
Output:
  • -z (compress all of the output using ZLIB)
Random numbers:
  • -p <int> (random number seed value for generating the simulated genome, default = 11)
  • -q <int> (random number seed value for generating simulated reads from the simulated genome, default = 47)
Reads:
  • -c <int> (coverage of the genome, default = 0 [>0 disables option -n])
  • -n <int> (number of read pairs, default = 1000000)
  • -1 <int> (length of the first read, default = 100)
  • -2 <int> (length of the second read, default = 100)
  • -a <float> (discard a read if its fraction of ambiguous bases is higher than this value, default = 0.05)
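Options -c and -n are two ways of fixing the amount of simulated data. The Python sketch below shows the usual relationship between coverage, genome length and read lengths. It assumes that coverage is the total number of sequenced bases divided by the genome length and that fractional pairs are rounded up; how SNVSniffer itself performs this conversion is not stated here, and the function names are hypothetical.

```python
import math


def read_pairs_for_coverage(genome_length: int, coverage: float,
                            len1: int = 100, len2: int = 100) -> int:
    """Number of read pairs needed to reach the requested coverage."""
    return math.ceil(coverage * genome_length / (len1 + len2))


def coverage_for_read_pairs(genome_length: int, n_pairs: int = 1_000_000,
                            len1: int = 100, len2: int = 100) -> float:
    """Average coverage obtained from a given number of read pairs."""
    return n_pairs * (len1 + len2) / genome_length


if __name__ == "__main__":
    # A 3 Mbp genome at 30x with 2 x 100 bp reads.
    print(read_pairs_for_coverage(3_000_000, 30))
    # The default 1,000,000 pairs on a 4 Mbp genome.
    print(coverage_for_read_pairs(4_000_000))
```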
Germline:
  • -e <float> (sequencing base error rate, default = 0.02)
  • -i <int> (average insert size, i.e. outer distance, between the two ends, default = 500)
  • -d <int> (standard deviation of insert size, default = 50)
  • -r <float> (rate of mutations, including substitutions and indels, default = 0.001)
  • -R <float> (fraction of indels, default = 0.15)
  • -X <float> (probability an indel is extended, default = 0.3)
  • -H <float> (homozygous variant ratio, default = 0.3333)
  • -h (haplotype mode: all reads are sequenced from a single sequence)
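Option -X gives the probability that an indel is extended. One plausible reading, which is an assumption and not confirmed by this document, is that every indel starts with length 1 and grows by one base with probability X at each step, so that lengths follow a geometric distribution with mean 1 / (1 - X). The Python sketch below samples lengths under that assumption; the function names and the seed value are illustrative.

```python
import random


def sample_indel_length(extend_prob: float, rng: random.Random) -> int:
    """Draw one indel length: start at 1 and extend while a random draw succeeds."""
    if not 0 <= extend_prob < 1:
        raise ValueError("extend_prob must be in [0, 1)")
    length = 1
    while rng.random() < extend_prob:
        length += 1
    return length


def expected_indel_length(extend_prob: float) -> float:
    """Mean of the geometric length distribution assumed above."""
    return 1 / (1 - extend_prob)


if __name__ == "__main__":
    rng = random.Random(11)  # fixed seed so that the run is repeatable
    lengths = [sample_indel_length(0.3, rng) for _ in range(100_000)]
    print("observed mean:", sum(lengths) / len(lengths))
    print("expected mean:", expected_indel_length(0.3))
```

With the default of 0.3, most simulated indels would be a single base long under this reading.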
ssim Simulate normal and tumor sample reads with somatic SNVs/indels
Usage: SNVSniffer ssim [options] reference.fa reads.base
  • reference.fa: reference file stored in FASTA/FASTQ format
  • reads.base: the base file name for reads and VCF-formatted mutations
Output:
  • -z (compress all of the output using ZLIB)
Random numbers:
  • -u <int> (random number seed value for the simulated genome, default = 11)
  • -v <int> (random number seed value for generating somatic mutations on the simulated genome, default = 149)
  • -w <int> (random number seed value for generating reads from the simulated tumor genomes, default = 97)
Reads:
  • -c <int> (coverage of the genome, default = 0 [>0 disables option -n])
  • -n <int> (number of read pairs, default = 1000000)
  • -1 <int> (length of the first read, default = 100)
  • -2 <int> (length of the second read, default = 100)
  • -a <float> (discard a read if its fraction of ambiguous bases is higher than this value, default = 0.05)
Somatic:
  • -E <float> (sequencing base error rate from simulated tumor genome, default = 0.02)
  • -s <float> (somatic mutation rate, default = 0.01)
  • -t <float> (ratio of homozygous variants (the rest are heterozygous), default = 0.3333)
  • -p <float> (probability of observing tumor reads at any somatic site [binomial distribution], default = 0.9)
  • -q <float> (fraction of somatic indels, default = 0.15)
  • -x <float> (probability a somatic indel is extended, default = 0.3)
  • -h <float> (homozygous rate for somatic variants, default = 0.3333)
Germline:
  • -e <float> (sequencing base error rate, default = 0.02)
  • -i <int> (average insert size, i.e. outer distance, between the two ends, default = 500)
  • -d <int> (standard deviation of insert size, default = 50)
  • -r <float> (rate of mutations, including substitutions and indels, default = 0.001)
  • -R <float> (fraction of indels, default = 0.15)
  • -X <float> (probability an indel is extended, default = 0.3)
  • -H <float> (homozygous variant ratio, default = 0.3333)
eval Evaluate predicted germline/somatic SNVs/indels against a gold standard (for evaluation purposes only)
Usage: SNVSniffer eval trueSNPs.vcf predSNPs.vcf
  • trueSNPs.vcf: gold-standard SNPs in VCF format
  • predSNPs.vcf: predicted SNPs in VCF format
Options:
  • -s <int> (consider VCF content as somatic mutations and classify loci to be SOMATIC, LOH, GERMLINE and Unknown, default = 0)
  • -u <int> (only consider SOMATIC, excluding LOH and Unknown, default = 0)
  • -t <int> (tumor sample precedes normal sample in the true variation VCF, default = 0)
  • -T <int> (tumor sample precedes normal sample in the predicted variation VCF, default = 0)
  • -a <int> (normal sample is missing in the true variant VCF, default = 0)
  • -b <int> (normal sample is missing in the predicted variant VCF, default = 0)
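The metrics reported by the eval command are not listed in this section. As a rough illustration of what a VCF-based comparison involves, the Python sketch below computes precision, recall and F1 score by exact matching on (chromosome, position, reference allele, alternate allele). The matching rule and the function names are assumptions, and genotype and somatic-status handling (options -s, -u, -t, -T, -a, -b) is deliberately left out.

```python
def load_variants(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the data lines of a VCF."""
    variants = set()
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the column header
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # one key per alternate allele
            variants.add((chrom, pos, ref, alt))
    return variants


def evaluate(truth, predicted):
    """Return precision, recall and F1 score for two sets of variant keys."""
    tp = len(truth & predicted)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = load_variants([
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t.\t.\t.",
        "chr1\t200\t.\tC\tT\t.\t.\t.",
    ])
    predicted = load_variants([
        "chr1\t100\t.\tA\tG\t.\t.\t.",
        "chr1\t300\t.\tG\tA\t.\t.\t.",
    ])
    print(evaluate(truth, predicted))
```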

Installation and Usage

Prerequisites

  1. Linux or Unix-like operating system.
  2. SNVSniffer 2.0 provides native support for the BAM format. When the input is a BAM file, make sure that it has been sorted by leftmost coordinate relative to the reference (command "samtools sort in.bam -o sorted.bam") and indexed (command "samtools index sorted.bam"). In addition, do not forget to specify the correct reference genome with option -g.
  3. For SNP/indel calling, a SAM header file must be provided so that it can be checked for consistency with the input BAM/mpileup/pileup file. The SAM header file also helps when the input BAM file has no header, which is often the case because users frequently forget to keep the header when converting from SAM to BAM with SAMtools. Given a BAM file, its SAM header can be obtained by running the command "samtools view -H in.bam > header.sam". Given a SAM file, the header can be obtained by running the command "samtools view -S -H in.sam > header.sam".
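The preparation steps above can be scripted. The Python sketch below assembles the SAMtools commands quoted in the prerequisites (sort, index, header extraction) and runs them in order; the function names and default file names are illustrative, and samtools is assumed to be available on the PATH.

```python
import subprocess


def prepare_commands(bam: str, sorted_bam: str = "sorted.bam"):
    """Commands that sort a BAM file by coordinate and then index it."""
    return [
        ["samtools", "sort", bam, "-o", sorted_bam],
        ["samtools", "index", sorted_bam],
    ]


def header_command(path: str):
    """Command that prints the SAM header of a BAM or SAM file to stdout."""
    cmd = ["samtools", "view"]
    if path.endswith(".sam"):
        cmd.append("-S")  # flag used for SAM input in the prerequisites
    return cmd + ["-H", path]


def prepare(bam: str, sorted_bam: str = "sorted.bam",
            header: str = "header.sam") -> None:
    """Run the preparation steps; raises CalledProcessError on failure."""
    for cmd in prepare_commands(bam, sorted_bam):
        subprocess.run(cmd, check=True)
    with open(header, "w") as handle:
        # Equivalent to: samtools view -H sorted.bam > header.sam
        subprocess.run(header_command(sorted_bam), stdout=handle, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands that prepare("in.bam") would execute.
    for cmd in prepare_commands("in.bam") + [header_command("sorted.bam")]:
        print(" ".join(cmd))
```

Calling prepare("in.bam") would then leave sorted.bam, its index and header.sam ready to be passed to the snp or somatic command.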

Typical Usage

snp
  1. SNVSniffer snp -f 2 header.sam -g reference.fa infile.bam
  2. SNVSniffer snp -f 0 header.sam infile.mpileup
somatic
  1. SNVSniffer somatic -f 2 -g reference.fa normal_header.sam tumor_header.sam normal.bam tumor.bam -o out.vcf

    If "-tumor_purity" is not set, SNVSniffer automatically estimates the tumor purity from the input data.

  2. SNVSniffer somatic -f 0 normal_header.sam tumor_header.sam normal.mpileup tumor.mpileup -o out.vcf
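When the somatic command is called from a pipeline, it can help to assemble the argument list programmatically. The Python sketch below is a hypothetical wrapper, not part of SNVSniffer: it uses only the options documented above (-f, -g, -tumor_purity, -o), places the subcommand first as in the usage line, and enforces the rule that -g is required for BAM input.

```python
import shlex


def somatic_command(normal_header, tumor_header, normal, tumor, out_vcf,
                    reference=None, input_format=2, tumor_purity=None):
    """Build the argument list for an 'SNVSniffer somatic' run."""
    if input_format == 2 and reference is None:
        raise ValueError("a reference genome (-g) is required for BAM input")
    cmd = ["SNVSniffer", "somatic", "-f", str(input_format)]
    if reference is not None:
        cmd += ["-g", reference]
    if tumor_purity is not None:
        # Leaving this unset keeps the default of 0, i.e. automatic estimation.
        cmd += ["-tumor_purity", str(tumor_purity)]
    cmd += [normal_header, tumor_header, normal, tumor, "-o", out_vcf]
    return cmd


if __name__ == "__main__":
    cmd = somatic_command("normal_header.sam", "tumor_header.sam",
                          "normal.bam", "tumor.bam", "out.vcf",
                          reference="reference.fa")
    # Print the command; pass the list to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```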
gsim
  1. SNVSniffer gsim reference.fa reads_base
  2. SNVSniffer gsim reference.fa reads_base -e 0.01 -c 30
ssim
  1. SNVSniffer ssim reference.fa reads_base
  2. SNVSniffer ssim reference.fa reads_base -e 0.01 -c 30
eval
  1. SNVSniffer eval gold-standard.vcf predicted_snp.vcf
  2. SNVSniffer eval gold-standard.vcf predicted_somatic.vcf -s 1
  3. SNVSniffer eval gold-standard.vcf predicted_somatic.vcf -s 1 -u 1

Important Notes