Tools considering epistatic effects

Contributed by: Administrator
Originally posted: 11th September 2009: 9:14 am
Last updated: 16th September 2010: 10:01 am
Short URL: https://gen2phen.org/node/5923

 

Epistatic interactions between SNPs are believed to be very important in determining individual susceptibility to complex diseases. Multiple genetic variants may each show little effect individually but a strong effect jointly; this is known as epistasis or multilocus interaction (Cordell 2002). Detecting epistatic interactions may therefore help to reveal the mechanisms underlying complex disease. This page describes methods that predict these interactions.

 

 

Contents
  • Programs available
  • Description of programs
  • Useful databases
  • Review articles
  • Other articles
  • Useful links

 

Programs available

  • epiMODE
  • epiForest
  • PBEAM
  • MECPM-SNP
  • MegaSNPHunter
  • RandomPat - available on request
  • SNPHarvester
  • SNPRuler
  • GPNN

 

Combinatorial methods - tested only on small datasets

The problem with these methods is their reliance on exhaustive search, which is not feasible for datasets the size of typical GWA studies.

  • MDR   (exhaustive search method, nonparametric)
  • Monte Carlo logic regression
  • penalized regression
  • CSP
  • CPM
  • RPM
  • MARS
  • BGTA
  • FITF
  • HapForest
  • others (Chatterjee et al)
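
To illustrate why exhaustive search does not scale, the following minimal sketch counts how many SNP combinations such a search would have to score. The figure of 500,000 SNPs is an assumption chosen to represent a GWA-sized dataset; it is not taken from any of the tools listed here.

```python
from math import comb


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of size `order` an exhaustive search must score."""
    return comb(n_snps, order)


# Assumed dataset sizes: a small candidate-gene study and a GWA-sized study.
for n_snps in (100, 500_000):
    for order in (2, 3):
        total = n_combinations(n_snps, order)
        print(f"{n_snps:>7} SNPs, {order}-way: {total:,} combinations")
```

With 100 SNPs there are 4,950 pairs, which is easily searched. With 500,000 SNPs there are roughly 1.25 × 10^11 pairs and roughly 2.1 × 10^16 triples, which is why a filtering step is usually applied first.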

 

Methods testing statistical epistasis

  • BOOST
  • INTERSNP
  • PLINK

 

Recommended review article

For an extensive list of methods published in 2007 and earlier, see the review article by Motsinger et al.


 


 

Description of programs


epiMODE

epiMODE is based on the definition of an "epistatic module" as the smallest genetic unit that independently influences disease risk. Based on this definition, epiMODE uses a Bayesian marker partition model to explain the observed case-control data, and a Gibbs sampling strategy with a reversible jump Markov chain Monte Carlo (RJ-MCMC) procedure to facilitate the detection of epistatic modules.

website: epiMODE
Download: for windows or for linux
Reference: Tang et al (2009). Epistatic module detection for case-control studies: a Bayesian model with a Gibbs sampling strategy. PLoS Genet. 2009 May;5(5):e1000464. doi:10.1371/journal.pgen.1000464 or pubmedid=19412524

 

epiForest

epiForest is a random forest approach for detecting epistatic interactions in case-control studies. First, a random forest analysis is run on all SNPs to obtain the Gini importance of each SNP. A sliding window sequential forward feature selection (SWSFS) algorithm then selects a subset of SNPs that minimizes the classification error between positive (case) and negative (control) samples when SNPs are used as categorical features. All possible interactions are enumerated for the subset selected by SWSFS. As input, the program needs the SNP genotypes and the case/control status of each sample. As output, the user gets interactions among candidate SNPs, each with a statistic describing the significance of its association.
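
As a rough illustration of the quantity behind Gini importance, the sketch below computes the Gini impurity decrease obtained by splitting the samples on a single categorical SNP. It is not the epiForest implementation: a random forest accumulates such decreases over many trees and nodes, whereas this sketch scores one split per SNP. The function names and the toy genotypes are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels (1 = case, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(genotypes, labels):
    """Impurity decrease from splitting the samples by one SNP's genotype."""
    groups = defaultdict(list)
    for genotype, label in zip(genotypes, labels):
        groups[genotype].append(label)
    n = len(labels)
    weighted_children = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted_children


def rank_snps(snp_table, labels):
    """SNP names ordered from largest to smallest impurity decrease."""
    scores = {name: gini_decrease(g, labels) for name, g in snp_table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): four cases followed by four controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
snp_table = {
    "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],  # genotype unrelated to status
    "snp_a": [2, 2, 2, 2, 0, 0, 0, 0],  # genotype separates cases from controls
}
print(rank_snps(snp_table, labels))
```

A ranking like this only reflects marginal effects of single SNPs; in epiForest the interactions are examined afterwards, on the subset chosen by SWSFS.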

Reference: Jiang et al (2009). A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinformatics, 2009, 10 Suppl 1, S65. doi:10.1186/1471-2105-10-S1-S65


PBEAM

PBEAM is a parallel version of BEAM (Bayesian Epistasis Association Mapping). BEAM uses Markov chain Monte Carlo (MCMC) to search for both single-marker and interaction effects in case-control SNP data. The BEAM algorithm has two essential components: a Bayesian epistasis inference tool implemented via MCMC, and a novel test statistic for evaluating statistical significance. Combining methods from opposite schools of statistics has two benefits. On the one hand, the Bayesian component makes it possible to include prior knowledge about each marker (for example, whether or not it lies in a coding region). On the other hand, evaluating statistical significance with P values makes the analysis more robust. The BEAM algorithm takes case-control genotype marker data as input. The genotyped markers should be in their natural genomic order when there is linkage disequilibrium (LD) among some of them (Zhang et al). The output is the posterior probability that each marker, or each epistatic set of interacting markers, is associated with the disease. BEAM classifies SNPs into three types: SNPs not associated with the disease, SNPs contributing to disease susceptibility independently, and SNPs influencing disease risk jointly with each other (Tang et al).

website: PBEAM
Download: http://bioinfo.au.tsinghua.edu.cn/pbeam/src-1.2.1.tar.gz
Reference: Zhang et al. Bayesian inference of epistatic interactions in case-control studies. Nat.Genet., 2007, 39, 9, 1167-1173. doi:10.1038/ng2110

 


 

MECPM-SNP

website: http://www.cbil.ece.vt.edu/ResearchOngoingSNP.htm
Download:
Reference: Miller et al. An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions. Bioinformatics, 2009, 25, 19, 2478-2485. doi:10.1093/bioinformatics/btp435

 

MegaSNPHunter

MegaSNPHunter takes case-control genotype data as input and produces a ranked list of multi-SNP interactions. The method works as follows. The whole genome is partitioned into multiple short subgenomes, each covering a genomic region of possible haplotype effects. For each subgenome, MegaSNPHunter builds a boosting tree classifier based on multi-SNP interactions and measures the importance of each SNP by its contribution to the classifier. The method keeps the relatively more important SNPs and lets them compete with each other in the same way at the next level. The competition terminates when the number of selected SNPs is less than the size of a subgenome. Finally, MegaSNPHunter extracts and reports the valuable multi-SNP interactions.
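
The level-by-level competition described above can be sketched as follows. This is only an outline of the control flow under stated assumptions: the `importance` callable is a hypothetical stand-in for the importance that MegaSNPHunter derives from a boosting tree classifier trained on each subgenome, and `block_size` and `keep_fraction` are illustrative parameters, not the tool's actual settings.

```python
import math


def tournament(snps, importance, block_size=4, keep_fraction=0.5):
    """Keep the top SNPs of each block, level by level, until fewer than
    `block_size` SNPs remain. `importance` maps a SNP to a score
    (hypothetical placeholder for a classifier-derived importance)."""
    if block_size < 2 or not 0 < keep_fraction < 1:
        raise ValueError("need block_size >= 2 and 0 < keep_fraction < 1")
    survivors = list(snps)  # assumed to be in genomic order
    while len(survivors) >= block_size:
        next_level = []
        for start in range(0, len(survivors), block_size):
            block = survivors[start:start + block_size]
            n_keep = max(1, math.floor(len(block) * keep_fraction))
            kept = set(sorted(block, key=importance, reverse=True)[:n_keep])
            # Preserve genomic order among the SNPs that survive this block.
            next_level.extend(snp for snp in block if snp in kept)
        survivors = next_level
    return survivors


# Toy example: 16 SNPs identified by position, with made-up importance scores.
scores = {i: (i * 7) % 16 for i in range(16)}
print(tournament(list(range(16)), scores.get))
```

In the real method the importance would be recomputed at every level from a newly trained classifier, so a fixed score table like the one above understates the cost of each level.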

Reference: Wan et al (2009). MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study. BMC Bioinformatics 2009, 10:13. doi:10.1186/1471-2105-10-13

 

SNPHarvester

SNPHarvester is a method for detecting SNP-SNP interactions in GWA studies. It is a stochastic search method that can efficiently select a set of significant SNP groups from hundreds of thousands of SNPs. SNPHarvester is useful because most tools that look for epistatic interactions cannot handle the amount of data produced by GWA studies and therefore need a reduced dataset. By efficiently reducing the number of SNPs, SNPHarvester enables the direct application of existing statistical tools for interaction detection. It is an intermediate tool: it takes genotypes from a GWA study as input and outputs SNP groups, which should then be analyzed by programs such as MDR.

website:
Download: SNPHarvester
Reference: Yang et al. SNPHarvester: a filtering-based approach for detecting epistatic interactions in genome-wide association studies. Bioinformatics, 2009, 25, 4, 504-511. doi:10.1093/bioinformatics/btn652

 

SNPRuler

SNPRuler uses a novel learning approach based on predictive rule learning to detect epistatic interactions. Rule learning is used to infer interactions because each epistatic interaction implicitly contains some predictive rules, and because finding and evaluating rules is much easier and faster than finding and evaluating interactions. The learning algorithm seeks to identify the rules and uses them to infer possible epistatic interactions.

website:
Download: SNPRuler.zip
Reference: Wan et al. Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics 2010 26(1):30-37; doi:10.1093/bioinformatics/btp622

 

GPNN (Genetic programming optimized neural network)

 

website:
Download:
Reference: Motsinger et al. GPNN: power studies and applications of a neural network method for detecting gene-gene interactions in studies of human disease. BMC Bioinformatics, 2006, 7, 39. doi:10.1186/1471-2105-7-39

 


 

 


 

 

MDR (Multifactor-Dimensionality Reduction)

MDR identifies k-way interactions through an exhaustive search and evaluates the association between each interaction and the disease by cross-validation (Zhang et al). This type of exhaustive search works well on small problems, but in GWA studies its direct application is computationally prohibitive. Effective filtering is needed to reduce the number of SNPs substantially so that exhaustive search becomes computationally feasible on the reduced SNP set (Yang et al. 2009). SNPHarvester can be used for this purpose. According to a comparison by Zhang et al, MDR performs better than logic regression for common diseases but has little power when disease allele frequencies are low.
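
The core idea of MDR can be sketched in a few lines. The sketch below is a simplified illustration, not the MDR software: it pools each multilocus genotype cell into "high risk" or "low risk" on the training folds, scores the resulting classifier on the held-out fold, and searches all SNP combinations of a given order. The high-risk threshold (cases outnumbering controls, which suits balanced data), the interleaved fold assignment, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations, product


def high_risk_cells(rows, labels, snp_idx):
    """Genotype cells in which cases (label 1) outnumber controls (label 0)."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for row, label in zip(rows, labels):
        counts[tuple(row[i] for i in snp_idx)][label] += 1
    return {cell for cell, (controls, cases) in counts.items() if cases > controls}


def accuracy(rows, labels, snp_idx, risky):
    """Fraction of samples whose status matches the high/low-risk call."""
    hits = sum((tuple(row[i] for i in snp_idx) in risky) == bool(label)
               for row, label in zip(rows, labels))
    return hits / len(labels)


def mdr(rows, labels, order=2, folds=2):
    """Return (best SNP index tuple, mean held-out accuracy) over all combinations."""
    best = None
    for snp_idx in combinations(range(len(rows[0])), order):
        fold_acc = []
        for f in range(folds):
            train = [i for i in range(len(rows)) if i % folds != f]
            test = [i for i in range(len(rows)) if i % folds == f]
            risky = high_risk_cells([rows[i] for i in train],
                                    [labels[i] for i in train], snp_idx)
            fold_acc.append(accuracy([rows[i] for i in test],
                                     [labels[i] for i in test], snp_idx, risky))
        mean_acc = sum(fold_acc) / folds
        if best is None or mean_acc > best[1]:
            best = (snp_idx, mean_acc)
    return best


# Toy data (hypothetical): status depends on SNP 0 XOR SNP 1; SNP 2 is noise.
rows = [list(genotype) for genotype in product([0, 1], repeat=3)]
labels = [row[0] ^ row[1] for row in rows]
print(mdr(rows, labels))
```

In the toy data neither SNP 0 nor SNP 1 has any marginal effect, yet the pair predicts status perfectly, which is the kind of pure interaction that exhaustive search is designed to find. The outer loop over `combinations` is also what makes the approach infeasible without prior filtering on GWA-sized data.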

website: MDR at epistasis website
Download: MDR at sourceforge.net
Reference: Ritchie et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am.J.Hum.Genet., 2001, 69, 1, 138-147. doi:10.1086/321276

 

Monte Carlo Logic Regression

Monte Carlo logic regression, as used in the article by Kooperberg et al, combines logic regression and MCMC to search for SNP interactions.

website: logic regression
Download: from CRAN (R package LogicReg)
Reference: Kooperberg C, Ruczinski I (2005). Identifying Interacting SNPs using Monte Carlo Logic Regression. Genetic Epidemiology, 28(2): 157-70.

 

Penalized Regression

Penalized regression uses a variant of logistic regression with quadratic penalization to detect epistatic interactions.

Reference: Park et al. Penalized logistic regression for detecting gene interactions. Biostatistics, 2008, 9, 1, 30-50. doi:10.1093/biostatistics/kxm010

 


 

CPM (Combinatorial Partitioning Method)

CPM exhaustively searches for the combinatorial genotype partition with the most significant difference in the mean of the corresponding continuous phenotype (Tang et al). CPM uses brute-force search, which is impractical for large datasets.

Reference: Nelson et al. (2001). A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation. Genome Res., 2001, 11, 3, 458-470. doi:10.1101/gr.172901


RPM  (Restricted Partitioning Method)

RPM modifies the CPM method by ignoring partitions that combine individual genotypes with very different mean trait values (Tang et al).

Reference: Culverhouse et al. (2004). Detecting epistatic interactions contributing to quantitative traits. Genet.Epidemiol., 2004, 27, 2, 141-152. doi: 10.1002/gepi.20006

MARS (Multivariate Adaptive Regression Splines)

 


 

BGTA (Backward genotype-trait association)

BGTA uses a bootstrap-type resampling screening procedure to select markers. Markers whose return frequencies are greater than the third quartile plus 1.8 times the interquartile range are considered disease-associated (Zhang et al).
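
The selection rule above is a simple outlier cutoff on the return frequencies. The sketch below applies it to made-up numbers; it does not implement the resampling procedure itself. The quartile convention (Python's "inclusive" method), the function name, and the example frequencies are assumptions, since the description does not specify how the quartiles are computed.

```python
from statistics import quantiles


def select_markers(return_freq, multiplier=1.8):
    """Markers whose return frequency exceeds Q3 + multiplier * IQR.

    `return_freq` maps marker name -> how often the marker was returned
    by the screening procedure (hypothetical input format).
    """
    q1, _, q3 = quantiles(list(return_freq.values()), n=4, method="inclusive")
    cutoff = q3 + multiplier * (q3 - q1)
    return [marker for marker, freq in return_freq.items() if freq > cutoff]


# Made-up return frequencies: eight background markers and one outlier.
return_freq = {f"m{i}": i for i in range(1, 9)}
return_freq["m9"] = 50
print(select_markers(return_freq))
```

Here Q1 = 3 and Q3 = 7, so the cutoff is 7 + 1.8 × 4 = 14.2 and only the marker returned 50 times is selected.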

Reference: Zheng et al (2006). Backward genotype-trait association (BGTA)-based dissection of complex traits in case-control designs. Hum.Hered., 2006, 62, 4, 196-212. doi:10.1159/000096995

 

FITF (Focused Interaction Testing Framework)

website: FITF
Reference: Millstein et al. A testing framework for identifying susceptibility genes in the presence of epistasis. Am.J.Hum.Genet., 2006, 78, 1, 15-27. doi:10.1086/498850

 

HapForest

HapForest uses a forest-based approach to identifying haplotype-haplotype interactions.

Reference: Chen et al. A forest-based approach to identifying gene and gene-gene interactions. Proc.Natl.Acad.Sci.U.S.A., 2007, 104, 49, 19199-19203. doi:10.1073/pnas.0709868104


 


BOOST

“BOolean Operation-based Screening and Testing” (BOOST) is a method for the discovery of  unknown gene-gene interactions that underlie complex diseases. It belongs to the group of methods testing statistical epistasis. BOOST allows examination of all pairwise interactions in genome-wide case-control studies. 

website: BOOST
Download: BOOST executables or from sourceforge.net 
Reference: Xiang Wan et al (2010). BOOST: A Fast Approach to Detecting Gene-Gene Interactions in Genome-wide Case-Control Studies. The American Journal of Human Genetics, Volume 87, Issue 3, 325-340, 02 September 2010. doi:10.1016/j.ajhg.2010.07.021

 

 

INTERSNP

INTERSNP is software for genome-wide interaction analysis (GWIA) of case-control SNP data and quantitative traits. SNPs are selected for joint analysis using a priori information. Sources of information for defining meaningful selection strategies include statistical evidence (single-marker association at a moderate level, computed from the user's own data) and genetic or biological relevance (genomic location, functional class, or pathway information).

website: INTERSNP 
Download:  here
Reference: Herold et al.  INTERSNP: genome-wide interaction analysis guided by a priori information, Bioinformatics 25 (2009), pp. 3275–3281. doi: 10.1093/bioinformatics/btp596

 

PLINK

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. In addition to its other functions it can be used to test for statistical epistasis. 

website: PLINK
Download:  here
Reference: Purcell et al. PLINK: a tool set for whole-genome association and population-based linkage analyses, Am. J. Hum. Genet. 81 (2007), pp. 559–575. doi:10.1086/519795

 



 

Useful databases



 

Review articles

  • Liang, Y. and Kelemen, A. (2008). Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases. Statist. Surv. Volume 2 (2008), 43-60. doi:10.1214/07-SS026
  • Musani et al (2007). Detection of gene x gene interactions in genome-wide association studies of human population data. Hum.Hered., 2007, 63, 2, 67-84. doi:10.1159/000099179
  • Motsinger et al (2007). Novel methods for detecting epistasis in pharmacogenomics studies. Pharmacogenomics, 2007, 8, 9, 1229-1241. doi:10.2217/14622416.8.9.1229



Other articles

  • Cordell: Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans. Hum.Mol.Genet., 2002, 11, 20, 2463-2468. http://hmg.oxfordjournals.org/cgi/content/abstract/11/20/2463



 

Useful Links

  • http://www.epistasis.org/
  • Q&A of Epistasis




G2P Knowledge Centre is part of GEN2PHEN and funded by the Health Thematic Area of the Cooperation Programme of the European Commission within the VII Framework Programme for Research and Technological Development.

© GEN2PHEN 2011