Tools and methods for mapping genomic structural variation

Contributed by: Administrator
Originally posted: 11th September 2009: 7:01 am
Last updated: 6th October 2010: 3:03 pm
Short URL: https://gen2phen.org/node/5918

This page concentrates on tools and methods for mapping genomic structural variation. New sequencing techniques are characterised by unprecedented speed and short read lengths. The tools listed here are designed to handle the output of these techniques and to map the location of genomic structural variants from it. They are listed as disease prediction tools because any structural variant is a potential risk factor for pathogenicity.

 

Contents
  • Programs available
  • Description of programs
  • Useful databases
  • Data formats and standards
  • Review articles
  • Other articles
  • Useful links

 

Programs available

  • BreakDancer
  • indelign 
  • MAQ
  • VariationHunter
  • MoDIL
  • PEMer
  • cnvHMM
  • GASV
  • SVA
  • SWT
  • VarScan
  • Pindel
  • Method in article Lee et al
  • CNV-seq

 


 

Description of programs

 

BreakDancer

BreakDancer predicts a wide variety of structural variants, including insertions and deletions (indels), inversions and translocations. The BreakDancer software package consists of two complementary algorithms: BreakDancerMax and BreakDancerMini. BreakDancerMini uses a Kolmogorov-Smirnov test in its detection step. As input, the programs require map files produced by MAQ. As output, BreakDancerMax reports deletions, insertions, inversions, and intra- and interchromosomal translocations, and BreakDancerMini reports small indels.

website:
Download: from Nature Methods web site or from Sourceforge site
Reference: Chen et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat.Methods, 2009, 6, 9, 677-681. doi:10.1038/NMETH.1363
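
The Kolmogorov-Smirnov idea mentioned in the description can be illustrated with a minimal sketch. This is not BreakDancer's code: the function name ks_statistic and the toy insert sizes are assumptions made for illustration only. The sketch computes the two-sample K-S statistic between the insert sizes of read pairs in one window and a genome-wide sample. A large value suggests that the insert sizes in the window are shifted, as could be expected around a small indel.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distributions of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical insert sizes in bp (illustrative values, not real data).
genome_wide = [190, 195, 198, 200, 200, 202, 205, 210]
normal_window = [192, 199, 201, 208]
shifted_window = [228, 231, 235, 240]  # pairs spanning a ~30 bp deletion

print(ks_statistic(normal_window, genome_wide))   # 0.125
print(ks_statistic(shifted_window, genome_wide))  # 1.0
```

A real analysis would also convert the statistic into a significance level and slide the window along the genome; both steps are left out here.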

 

 

Indelign

Indelign is a probabilistic framework for annotation of insertions and deletions in a multiple alignment. 

Download: here
Reference: Kim et al.  Indelign: a probabilistic framework for annotation of insertions and deletions in a multiple alignment. Bioinformatics. 2006 Nov 15. doi: 10.1093/bioinformatics/btl578

 

MAQ

MAQ is a tool both for mapping short DNA sequencing reads and for identifying small indels (<10 base pairs). MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. As input, MAQ takes sequence reads with mate-pair information. As output, it generates the mapping of the reads and, in addition, the short indels detected.

website:
Download: from Sourceforge site
Reference: Li et al. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res., 2008, 18, 11, 1851-1858. doi:10.1101/gr.078212.108
See also: MAQGene ( Web-based user interface for MAQ)
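
The error probability of an alignment is usually reported on the Phred scale, Q = -10 log10(P), so that a quality of 30 corresponds to a 1 in 1000 chance that the read is mapped to the wrong place. The short sketch below shows only this conversion. The function names are assumptions for illustration and are not part of MAQ.

```python
import math


def phred_quality(error_probability):
    """Phred-scaled quality: Q = -10 * log10(P), rounded to an integer."""
    if not 0 < error_probability <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10 * math.log10(error_probability))


def quality_to_probability(quality):
    """Inverse conversion: P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


print(phred_quality(0.001))        # 30
print(quality_to_probability(20))  # about 0.01
```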



 

VariationHunter

VariationHunter is a package of programs for finding structural variations when the mappings of paired-end reads are known. VariationHunter uses mrFAST as its mapping algorithm. As input, it needs the mappings of paired-end reads plus some additional information related to them. The output is written to three files, one each for deletions, insertions and inversions. The method is used to identify indels larger than 50 bp (Lee et al).

website:
Download: Source code
Reference: Hormozdiari et al. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res., 2009, 19, 7, 1270-1278. doi:10.1101/gr.088633.108
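
Paired-end methods such as this one start from read pairs whose mapping disagrees with the expected insert size or orientation. The sketch below shows only these basic signatures. It is not VariationHunter's combinatorial algorithm, and the function name, the default insert size statistics and the cutoff are assumptions chosen for illustration.

```python
def classify_pair(span, same_strand, mean_insert=200, sd_insert=10, cutoff=3):
    """Label one mapped read pair by the usual paired-end signatures.

    span        -- distance between the two reads on the reference (bp)
    same_strand -- True if both reads map to the same strand
    """
    if same_strand:
        # Both ends on the same strand suggests an inverted segment.
        return "inversion"
    if span > mean_insert + cutoff * sd_insert:
        # Reads map further apart than expected: sequence missing in the sample.
        return "deletion"
    if span < mean_insert - cutoff * sd_insert:
        # Reads map closer than expected: extra sequence in the sample.
        return "insertion"
    return "concordant"


print(classify_pair(500, same_strand=False))  # deletion
print(classify_pair(120, same_strand=False))  # insertion
print(classify_pair(205, same_strand=False))  # concordant
print(classify_pair(200, same_strand=True))   # inversion
```

A single discordant pair is weak evidence on its own. Tools of this kind cluster many pairs that support the same event before reporting it.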

 

MoDIL

MoDIL is described by its authors as the first method to identify medium-sized (20-50 bp) indels from high-throughput sequencing data, whereas several methods already exist for identifying small and large indels. As input, MoDIL takes sequence reads. It uses an EM algorithm and a Kolmogorov-Smirnov test in the analysis. As output, the program gives the identified indels.

website: MoDIL
Download:
Reference: Lee et al. MoDIL: detecting small indels from clone-end sequencing with mixtures of distributions. Nat.Methods, 2009, 6, 7, 473-474. doi:10.1038/NMETH.F.256

 


 

PEMer (Paired-End Mapper)

PEMer consists of an analysis pipeline, simulation-based error models and a back-end database. The tool is used to identify indels larger than 50 bp (Lee et al). The method should be relatively insensitive to base-calling errors. PEMer can process data from several next-generation DNA sequencing platforms, including 454 (Roche), Illumina and ABI. The back-end database, BreakDB, is a web-accessible database developed to store, annotate and display SV breakpoint events identified by PEMer and from other sources.

website: PEMer
Download: here
Reference: Korbel et al. PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol., 2009, 10, 2, R23.   doi: 10.1186/gb-2009-10-2-r23.

 

cnvHMM

cnvHMM is a Washington University algorithm for Illumina/Solexa data. It performs copy number analysis using a hidden Markov model.

website: cnvHMM
Download: sources
Reference:
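
To show what a hidden Markov approach to copy number can look like, here is a minimal Viterbi sketch over read counts per window. It is not cnvHMM's implementation. The three states, their relative depths, the Poisson emission model and the transition probability are all assumptions made for illustration.

```python
import math

# Assumed states and their read depth relative to the normal state.
STATES = {"loss": 0.5, "normal": 1.0, "gain": 1.5}


def poisson_log_pmf(k, mean):
    """Log probability of observing k reads when `mean` are expected."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def viterbi_copy_number(counts, normal_depth, stay=0.9):
    """Most likely state path for a list of per-window read counts."""
    names = list(STATES)
    switch = (1 - stay) / (len(names) - 1)

    def log_trans(a, b):
        return math.log(stay if a == b else switch)

    def log_emit(state, k):
        return poisson_log_pmf(k, STATES[state] * normal_depth)

    # Uniform start probabilities.
    score = {s: -math.log(len(names)) + log_emit(s, counts[0]) for s in names}
    back = []
    for k in counts[1:]:
        new_score, pointers = {}, {}
        for s in names:
            best_prev = max(names, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[best_prev] + log_trans(best_prev, s) + log_emit(s, k)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(names, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical read counts per window, with a dip in the middle.
counts = [100, 98, 103, 52, 49, 55, 101, 99]
print(viterbi_copy_number(counts, normal_depth=100))
# ['normal', 'normal', 'normal', 'loss', 'loss', 'loss', 'normal', 'normal']
```

The hidden Markov model smooths over noisy windows: a state change is only accepted when the read counts support it strongly enough to outweigh the cost of switching.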

 

 

GASV (Geometric Analysis of Structural Variants)

GASV is software for the classification and comparison of structural variants measured via paired-end sequencing and/or array-CGH. GASV currently supports three features: clustering a set of ESPs and producing breakpoint regions, filtering paired-end sequences (ESPs) by a reference set, and taking a set of ESPs and producing unclustered breakpoint regions.

website: GASV
Download: here
Reference: Sindi et al. A geometric approach for classification and comparison of structural variants. Bioinformatics, 2009, 25, 12, i222-30. doi:10.1093/bioinformatics/btp208

 


 

SVA

Sequence Variant Analyzer (SVA) is a tool developed to analyze genetic variants from whole-genome sequencing studies. As input, the tool takes single-site variants, small indels and large copy number changes. SVA uses a number of biological databases to perform functional annotation, and then runs several internal and external programs to perform the statistical and bioinformatic analyses for identifying potential causal variants and genes responsible for the biological traits or medical outcomes of interest.

website: SVA
Download: here
Reference: If you use the tool, please use the following citation: 
Software: SequenceVariantAnalyzer
Author: Dongliang Ge & David B. Goldstein
URL: http://www.svaproject.org/

 

 

SWT

SWT is a WashU Sliding Window Tool for detecting copy number variants from Illumina/Solexa data.

website: SWT
Download: source
Reference:
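
A sliding-window read depth analysis can be sketched as follows. This is an illustration of the general idea only, not SWT's code. The function names, window size, step and the thresholds for calling a gain or a loss are assumptions.

```python
from statistics import median


def window_counts(read_starts, region_length, window=100, step=50):
    """Count read start positions in windows that slide along the region."""
    counts = []
    for start in range(0, region_length - window + 1, step):
        n = sum(start <= pos < start + window for pos in read_starts)
        counts.append((start, n))
    return counts


def flag_windows(counts, low=0.5, high=1.5):
    """Flag windows whose depth deviates from the median window depth."""
    typical = median(n for _, n in counts)
    if typical == 0:
        return []
    flagged = []
    for start, n in counts:
        ratio = n / typical
        if ratio <= low:
            flagged.append((start, "loss"))
        elif ratio >= high:
            flagged.append((start, "gain"))
    return flagged


# Hypothetical data: one read every 10 bp, tripled between 200 and 300 bp.
reads = list(range(0, 400, 10)) + 2 * list(range(200, 300, 10))
counts = window_counts(reads, region_length=400, window=100, step=100)
print(counts)                # [(0, 10), (100, 10), (200, 30), (300, 10)]
print(flag_windows(counts))  # [(200, 'gain')]
```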

 

VarScan

Many tools can handle the output of just one technology, whereas VarScan is able to detect SNPs and indels from both Solexa and Roche platforms. Unlike other currently available variant detection tools, VarScan is compatible with several read aligners (BLAT, Newbler, cross_match, Bowtie and Novoalign) and calls variants in both individual and pooled samples. As input, VarScan requires an alignment file. As output, the user gets a report of SNPs, insertions and deletions with their chromosomal coordinates, alleles, flanking sequence and read counts. VarScan does not predict the effect of these variants, only their existence.

website:
Download: (Download VarScan from here)
Reference: Koboldt et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics, 2009, 25, 17, 2283-2285. doi:10.1093/bioinformatics/btp373
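
Calling a variant from read counts typically comes down to a few thresholds on coverage and on the fraction of reads that support the variant. The sketch below illustrates that idea under stated assumptions. The function name and the default thresholds are hypothetical and are not taken from VarScan.

```python
def call_variant(ref_reads, var_reads, min_coverage=8, min_var_reads=2,
                 min_var_freq=0.2):
    """Return True if the read counts at one position support a variant."""
    coverage = ref_reads + var_reads
    if coverage < min_coverage or var_reads < min_var_reads:
        return False
    return var_reads / coverage >= min_var_freq


print(call_variant(10, 10))  # True: half of 20 reads carry the variant
print(call_variant(3, 2))    # False: coverage too low
print(call_variant(95, 5))   # False at the default frequency threshold
# In a pooled sample a rare variant may only pass with a lower threshold.
print(call_variant(95, 5, min_var_freq=0.01))  # True
```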

 


 

Pindel

Pindel uses a pattern growth approach to detect breakpoints of large deletions and medium-sized insertions from paired-end short reads. As input, Pindel requires a genomic reference in FASTA format and a read file that stores one-end-mapped paired-end reads. As a result, the user gets mapped indels and an alignment of the supporting reads with the reference sequence.

website: pindel
Download:
Reference: Ye et al. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics, 2009. doi:10.1093/bioinformatics/btp394
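
The idea of locating a deletion from a read that does not map as a whole can be shown with a naive split-read search. This sketch is not Pindel's pattern growth algorithm, and it ignores the mapped mate that Pindel uses to restrict the search region. The function name, the toy sequences and the minimum anchor length are assumptions for illustration.

```python
def find_deletion(reference, read, min_anchor=5):
    """Split the read in two and look for both parts in the reference.

    Returns (start, end) so that reference[start:end] is the deleted
    segment, or None if no split explains the read.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        left = reference.find(prefix)
        if left == -1:
            continue
        left_end = left + split
        # The suffix must map downstream, leaving a gap of at least 1 bp.
        right = reference.find(suffix, left_end + 1)
        if right != -1:
            return left_end, right
    return None


# Hypothetical reference: left flank + 8 bp segment + right flank.
reference = "GATTACAGGCTT" + "CCCCCCCC" + "AATGCGTACGT"
# A read from a sample in which the 8 bp segment is deleted.
read = "AGGCTT" + "AATGCG"

breakpoints = find_deletion(reference, read)
print(breakpoints)                                # (12, 20)
print(reference[breakpoints[0]:breakpoints[1]])   # CCCCCCCC
```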

 

Method in article Lee et al

The method developed by Lee et al uses a probabilistic framework for the identification of structural variants using clone-end sequencing.

website:
Download: source
Reference: Lee et al. A robust framework for detecting structural variations in a genome. Bioinformatics, 2008, 24, 13, i59-67. doi:10.1093/bioinformatics/btn176

 

CNV-seq

CNV-seq is a method for detecting DNA copy number variation (CNV) using high-throughput sequencing. As input, the program requires the output of a read aligner (for example BLAT). As output, the user gets CNV predictions.

website: CNV-Seq
Download: here
Reference: Xie et al. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics, 2009, 10, 80. doi:10.1186/1471-2105-10-80

 


 


Useful links

  • High throughput sequencing tools - BioAssist wiki page
  • Useful comparison of alignment tools
  • NGS Alignment programs

 


 

Useful databases

Databases covering structural variation, including clinical findings associated with submicroscopic chromosomal imbalance (deletions, duplications, insertions, translocations and inversions)

  • DECIPHER (DatabasE of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources)
  • dbVar
  • CHOP (The Copy Number Variation project at the Children's Hospital of Philadelphia)
  • DGV (Database of Genomic Variants)
  • ECARUCA



 

Data formats and standards

  • Samtools - SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments
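
Each alignment line in a SAM file has eleven mandatory tab-separated fields, followed by optional tags. The sketch below reads those fields from one line. The field names follow the current SAM specification, while the function name, the record type and the example line are assumptions for illustration. For real work a dedicated library is the safer choice.

```python
from collections import namedtuple

# The eleven mandatory SAM fields, in order.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual",
)


def parse_sam_line(line):
    """Parse the mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    core = fields[:11]
    # FLAG, POS, MAPQ, PNEXT and TLEN are integers.
    for i in (1, 3, 4, 7, 8):
        core[i] = int(core[i])
    return SamRecord(*core)


# Hypothetical alignment line.
line = "read1\t99\tchr1\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII"
record = parse_sam_line(line)
print(record.rname, record.pos, record.mapq)  # chr1 100 60
print(bool(record.flag & 0x1))                # True: the read is paired
```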

 


 

 

Review articles

Article reviewing published articles found with the terms "copy number variation" and "structural variation" between Jan 1, 2004 and Nov 3, 2008.

  • Wain et al. Genomic copy number variation, human health, and disease. Lancet, 2009, 374, 9686, 340-350. doi:10.1016/S0140-6736(09)60249-X



 

Other articles

  • Tabone. Mutations, structural variations, and genome-wide resequencing: where to from here in our understanding of disease and evolution? Hum.Mutat., 2008, 29, 6, 886-890. doi:10.1002/humu.20781
  • Harismendy et al. Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biology, 2009, 10, R32
  • Bioinformatics issue on Next Generation Sequencing, 2009

 


