Various approaches to calling single-nucleotide variants (SNVs) or insertion-or-deletion (indel) mutations have been developed based on next-generation sequencing (NGS). However, most of them are dedicated to a particular type of mutation, e.g. germline SNVs in normal cells, somatic SNVs in cancer/tumor cells, or indels only. In the literature, efficient and integrated callers for both germline and somatic SNVs/indels have not yet been extensively investigated.
SNVSniffer is an efficient and integrated caller that identifies both germline and somatic SNVs/indels from NGS data. In this algorithm, we use Bayesian probabilistic models to identify SNVs and a multiple ungapped alignment approach to call indels. For germline variant calling, we model the allele counts at each site as following a multinomial conditional distribution. For somatic variant calling, we rely on tumor-normal sample pairs from the same individual and introduce a hybrid subtraction and joint sample analysis approach, modeling the tumor-normal allele counts at each site as following a joint multinomial conditional distribution. A comprehensive performance evaluation has been conducted using a diverse set of variant calling benchmarks. For germline variant calling, SNVSniffer demonstrates highly competitive accuracy with superior speed in comparison with the state-of-the-art FaSD, GATK and SAMtools. For somatic variant calling, our algorithm achieves comparable or even better accuracy than the leading VarScan2, SomaticSniper, JointSNVMix2 and MuTect, while running fast.
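To make the germline model more concrete, the following is a minimal sketch of multinomial genotype calling at a single site. It is not SNVSniffer's actual implementation: the function name `genotype_posteriors`, the fixed per-base `error_rate`, and the uniform prior over the ten diploid genotypes are all assumptions chosen for illustration.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_posteriors(counts, error_rate=0.01):
    """Posterior over the 10 diploid genotypes given allele counts at one site.

    Hypothetical illustration: assumes a uniform genotype prior and a single
    sequencing error rate shared by all reads (both are assumptions).
    """
    log_lik = {}
    for pair in combinations_with_replacement(BASES, 2):
        # Probability of observing each base when a read samples one of the
        # two alleles with equal chance and may be miscalled.
        base_prob = {}
        for base in BASES:
            p = 0.0
            for allele in pair:
                p += 0.5 * ((1 - error_rate) if base == allele else error_rate / 3)
            base_prob[base] = p
        # Multinomial log-likelihood; the multinomial coefficient is the same
        # for every genotype, so it cancels in the normalization below.
        log_lik["".join(pair)] = sum(
            counts.get(base, 0) * math.log(base_prob[base]) for base in BASES
        )
    # Normalize with the log-sum-exp trick for numerical stability.
    peak = max(log_lik.values())
    total = sum(math.exp(v - peak) for v in log_lik.values())
    return {g: math.exp(v - peak) / total for g, v in log_lik.items()}


def call_genotype(counts, error_rate=0.01):
    """Return the genotype with the highest posterior probability."""
    post = genotype_posteriors(counts, error_rate)
    return max(post, key=post.get)
```

For example, `call_genotype({"A": 10, "G": 9})` favors the heterozygous genotype `AG`, while a site covered only by `A` reads is called `AA`. A real caller would typically replace the uniform prior and the fixed error rate with values derived from the reference base and per-base quality scores.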
SNVSniffer consists of five components: a SNP and indel caller, a somatic variant and indel caller for normal-tumour sample pairs, an Illumina-like next-generation sequencing (NGS) read simulator for SNP calling, an Illumina-like NGS normal-tumour sample simulator for somatic mutation calling, and a VCF-based evaluation algorithm for germline and somatic SNVs.
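As an illustration of what a VCF-based evaluation can look like, here is a minimal sketch that compares a set of called variants against a truth set and reports precision, recall and F1. It is not the evaluation component shipped with SNVSniffer: the function names `variant_keys` and `evaluate`, and the choice to match variants on (chromosome, position, REF, ALT), are assumptions.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the data lines of a VCF.

    Hypothetical helper: header lines (starting with '#') and blank lines
    are skipped, and multi-allelic records are split into one key per ALT.
    """
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


def evaluate(called, truth):
    """Compare two sets of variant keys and return basic accuracy metrics."""
    tp = len(called & truth)   # called and true
    fp = len(called - truth)   # called but not true
    fn = len(truth - called)   # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

In practice the two inputs would be read from files, e.g. `variant_keys(open(path))`, where `path` is a placeholder for a called or truth VCF. Exact-key matching is the simplest choice; indels in particular may need normalization before comparison.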
- SNVSniffer 1.0 (latest v1.0.22)
- SNVSniffer 2.0 (latest v2.0.4) NEW
- This version calls both SNVs and indels for germline and somatic mutations.
- SNP calling performance of SNVSniffer 2.0 (fast mode, mode M1) on GCAT [link]
- SNP calling performance of SNVSniffer 2.0 (modest mode, mode M2) on GCAT [link]
- SNP calling performance of SNVSniffer 2.0 (accurate mode, mode M3) on GCAT [link]
- SNP calling performance of GATK v3.5 on GCAT [link]
- SNP calling performance of SAMtools v1.3 on GCAT [link]
- SNP calling performance of FaSD on GCAT [link]
- Datasets for download
- Virtual tumors
Ten virtual tumor-normal sample pairs that are generated from real human sequencing reads. Their tumor purity ranges from 0.1 to 1.0 uniformly. Meanwhile, the true somatic variants are also available in the same directory.
- Synthetic tumors from simulated data for SNVSniffer 2.0 assessment
Three tumor-normal sample pairs that are simulated from the human chromosome 20 (UCSC hg38), and their corresponding alignment files (realigned using GATK IndelRealigner).
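To give a feel for what the purity range of the virtual tumors means for a caller, the sketch below lists ten purity levels and the variant allele fraction (VAF) one would expect for a heterozygous somatic variant at each level. Two assumptions are made here that the dataset description does not state: that "uniformly" means steps of 0.1, and that both tumor and normal cells are diploid with no copy-number change at the site. The function name `expected_het_vaf` is hypothetical.

```python
def expected_het_vaf(purity):
    """Expected VAF of a heterozygous somatic variant in a mixed sample.

    Assumes diploid tumor and normal cells without copy-number changes:
    only the tumor fraction carries the variant, on one of two alleles.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    return purity / 2.0


# Ten purity levels from 0.1 to 1.0 (the step of 0.1 is an assumption).
purities = [i / 10 for i in range(1, 11)]

for purity in purities:
    print(f"purity={purity:.1f}  expected VAF={expected_het_vaf(purity):.2f}")
```

Under these assumptions, the lowest-purity pair places heterozygous somatic variants at a VAF of about 0.05, which is close to typical sequencing error rates and is where somatic callers are most likely to disagree.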
- Yongchao Liu, Martin Loewer, Srinivas Aluru, Bertil Schmidt: "SNVSniffer: an integrated caller for germline and somatic single-nucleotide and indel mutations". BMC Systems Biology, 2016, 10(suppl 2): 47
- Yongchao Liu, Martin Loewer, Srinivas Aluru, Bertil Schmidt: "SNVSniffer: an integrated caller for germline and somatic SNVs based on Bayesian models". 2015 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2015), 2015, pp. 83-90
Other related papers
- Yongchao Liu, Bertil Schmidt, and Douglas L. Maskell: "CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform". Bioinformatics, 2012, 28(14): 1830-1837
- Yongchao Liu and Bertil Schmidt: "Long read alignment based on maximal exact match seeds". Bioinformatics, 2012, 28(18): i318-324
- Yongchao Liu and Bertil Schmidt: "CUSHAW2-GPU: empowering faster gapped short-read alignment using GPU computing". IEEE Design & Test, 2014, 31(1): 31-39
- Yongchao Liu, Bernt Popp, and Bertil Schmidt: "CUSHAW3: sensitive and accurate base-space and color-space short-read alignment with hybrid seeding". PLOS ONE, 2014, 9(1): e86869
- Jorge González-Domínguez, Yongchao Liu, Bertil Schmidt: "Parallel and scalable short-read alignment on multi-core clusters using UPC++". PLOS ONE, 2016, 11(1): e0145490
- March 18, 2016 (v2.0.4)
- SNVSniffer 2.0 is officially released!
- Added support for calling germline and somatic indels
- Provide native support for BAM format. Users can efficiently use BAM-formatted inputs now.
- Provide three execution modes
- November 21, 2014 (v1.0.22)
- Further improved the calling accuracy
- November 05, 2014 (v1.0.19)
- Conceived a new algorithm for the run-time estimation of the purity of the tumor sample from the input datasets.
- Further improved the speed for both germline and somatic SNV calling.
- Removed the parameter "normal_purity" from this version.
- September 23, 2014 (v1.0.14)
- The first version of SNVSniffer is released online (statically linked binary only for the time being).
If you have any questions or suggestions for improvement, please feel free to contact Liu, Yongchao.