Tools considering epistatic effects

Contributed by:	Administrator
Originally posted:	11th September 2009: 9:14 am
Last updated:	16th September 2010: 10:01 am
Short URL:	https://gen2phen.org/node/5923

Functional Prediction

Epistatic interactions between SNPs are believed to be very important in determining individual susceptibility to complex diseases. Multiple genetic variations may each show little effect individually but a strong effect jointly, a phenomenon known as epistasis or multilocus interaction (Cordell et al 2002). Detecting epistatic interactions may therefore help to reveal the mechanisms underlying complex disease. This page describes methods that predict these interactions.

Contents

Programs available
Description of programs
Useful databases
Review articles
Other articles
Useful links

Programs available

epiMODE
epiForest
PBEAM
MECPM-SNP
MegaSNPHunter
RandomPat - available on request
SNPHarvester
SNPRuler
GPNN

Combinatorial methods - tested only on small datasets

The problem with these methods is their reliance on exhaustive search, which is not feasible for datasets of the size of typical GWA studies.

MDR (exhaustive search method, nonparametric)
Monte Carlo logic regression
penalized regression
CSP
CPM
RPM
MARS
BGTA
FITF
HapForest
others (Chatterjee et al)
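
To make the scale of the problem concrete, the number of SNP combinations an exhaustive search has to visit can be counted directly. The Python sketch below does this for two illustrative panel sizes. The sizes (100 and 500,000 SNPs) and the function name are assumptions chosen for illustration, not figures taken from any of the cited studies.

```python
from math import comb


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of the given interaction order."""
    return comb(n_snps, order)


# Hypothetical panel sizes: a small candidate-gene study vs. a GWA-scale chip.
for n_snps in (100, 500_000):
    for order in (2, 3):
        total = n_combinations(n_snps, order)
        print(f"{n_snps:>7} SNPs, {order}-way: {total:,} combinations")
```

Even for pairs alone, a 500,000-SNP panel gives roughly 1.25 x 10^11 combinations, and the count grows rapidly with the interaction order.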

Description of programs

epiMODE

The basis of the epiMODE tool is the definition of an "epistatic module" as the smallest genetic unit that independently influences disease risk. Based on this definition, epiMODE uses a Bayesian marker partition model to explain the observed case-control data, and a Gibbs sampling strategy with a reversible jump Markov chain Monte Carlo (RJ-MCMC) procedure to facilitate the detection of epistatic modules.

website: epiMODE

Download: for Windows or for Linux

Reference: Tang et al (2009). Epistatic module detection for case-control studies: a Bayesian model with a Gibbs sampling strategy. PLoS Genet., 2009, 5(5):e1000464. doi:10.1371/journal.pgen.1000464 (PubMed ID 19412524)

epiForest

epiForest is a random forest approach for the detection of epistatic interactions in case-control studies. First, a random forest analysis is run on all SNPs to obtain the Gini importance of each SNP. A sliding window sequential forward feature selection (SWSFS) algorithm is then used to select the subset of SNPs that minimizes the classification error between positive (case) and negative (control) samples when the SNPs are used as categorical features. All possible interactions within this subset are then enumerated. As input, the program needs the SNP data and the case-control status of the samples. As output, the user gets the interactions among candidate SNPs, each with a statistic describing the significance of its association.

Reference: Jiang et al (2009). A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinformatics, 2009, 10 Suppl 1, S65. doi:10.1186/1471-2105-10-S1-S65

PBEAM

PBEAM is a parallel version of BEAM (Bayesian Epistasis Association Mapping). BEAM uses Markov chain Monte Carlo (MCMC) to search for both single-marker and interaction effects in case-control SNP data. The BEAM algorithm has two essential components: a Bayesian epistasis inference tool implemented via MCMC, and a novel test statistic for evaluating statistical significance. Combining methods from opposite schools of statistics gives, on the one hand, the chance to include prior knowledge about each marker (for example, whether or not it lies in a coding region) and, on the other hand, more robustness through the use of P values to evaluate statistical significance. The BEAM algorithm takes case-control genotype marker data as input. The genotyped markers should be in their natural genomic order when there is linkage disequilibrium (LD) among some of them (Zhang et al). The output of the analysis is the posterior probability that each marker or epistasis (interacting set of markers) is associated with the disease. BEAM classifies the SNPs into three types: SNPs not associated with the disease, SNPs contributing to disease susceptibility independently, and SNPs influencing disease risk jointly with each other (Tang et al).

website: PBEAM

Download: http://bioinfo.au.tsinghua.edu.cn/pbeam/src-1.2.1.tar.gz

Reference: Zhang et al. Bayesian inference of epistatic interactions in case-control studies. Nat.Genet., 2007, 39, 9, 1167-1173. doi:10.1038/ng2110

MECPM-SNP

website: http://www.cbil.ece.vt.edu/ResearchOngoingSNP.htm

Download:

Reference: Miller et al. An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions. Bioinformatics, 2009, 25, 19, 2478-2485. doi:10.1093/bioinformatics/btp435

MegaSNPHunter

MegaSNPHunter takes case-control genotype data as input and produces a ranked list of multi-SNP interactions. The method works as follows. The whole genome is partitioned into multiple short subgenomes, each of which covers the genomic area of possible haplotype effects. For each subgenome, MegaSNPHunter builds a boosting tree classifier based on multi-SNP interactions and measures the importance of each SNP by its contribution to the classifier. The method keeps the relatively more important SNPs and lets them compete with each other in the same way at the next level. The competition terminates when the number of selected SNPs is less than the size of the subgenome. Finally, MegaSNPHunter extracts and reports the valuable multi-SNP interactions.

Reference: Wan et al (2009). MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study. BMC Bioinformatics, 2009, 10:13. doi:10.1186/1471-2105-10-13

SNPHarvester

SNPHarvester is a method for detecting SNP-SNP interactions in GWA studies. It is a stochastic search method that can efficiently select a set of significant SNP groups from hundreds of thousands of SNPs. It is useful because most tools that look for epistatic interactions cannot handle the amount of data produced by GWA studies and therefore need a reduced set of this data. By reducing the number of SNPs, SNPHarvester enables the direct application of existing statistical tools for interaction detection. It is an intermediate tool: it takes genotypes from a GWA study as input and outputs SNP groups, which should then be analyzed by programs such as MDR.

website:

Download: SNPHarvester

Reference: Yang et al. SNPHarvester: a filtering-based approach for detecting epistatic interactions in genome-wide association studies. Bioinformatics, 2009, 25, 4, 504-511. doi:10.1093/bioinformatics/btn652

SNPRuler

SNPRuler uses a novel learning approach based on predictive rule learning to detect epistatic interactions. Rule learning is used to infer interactions because each epistatic interaction implicitly contains some predictive rules, and because finding and evaluating rules is much easier and faster than finding and evaluating interactions. The learning algorithm identifies the rules and uses them to infer possible epistatic interactions.

website:

Download: SNPRuler.zip

Reference: Wan et al. Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics, 2010, 26(1):30-37. doi:10.1093/bioinformatics/btp622

GPNN (Genetic programming optimized neural network)

website:

Download:

Reference: Motsinger et al. GPNN: power studies and applications of a neural network method for detecting gene-gene interactions in studies of human disease. BMC Bioinformatics, 2006, 7, 39. doi:10.1186/1471-2105-7-39

MDR (Multifactor-Dimensionality Reduction)

MDR identifies k-way interactions through an exhaustive search and evaluates the association between each interaction and the disease by cross-validation (Zhang et al). This type of exhaustive search works well on small problems, but in GWA studies its direct application is computationally prohibitive. Effective filtering is needed to reduce the number of SNPs enough that an exhaustive search becomes computationally feasible on the reduced SNP set (Yang et al. 2009). SNPHarvester can be used for this purpose. According to the comparison made by Zhang et al, MDR performs better than logic regression for common diseases but has little power when disease allele frequencies are small.
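
The exhaustive search at the core of MDR can be illustrated with a minimal Python sketch. It is not the MDR software: it scores each k-way SNP combination by training accuracy only and leaves out the cross-validation step described above. The function names, the 0/1/2 genotype coding and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mdr_accuracy(genotypes, labels, snp_idx):
    """Training accuracy of a high/low-risk rule for one SNP combination.

    genotypes: list of per-sample tuples of genotype codes
    labels:    list of 1 (case) / 0 (control)
    snp_idx:   tuple of SNP column indices forming the combination
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in snp_idx)
        (cases if y == 1 else controls)[cell] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    if n_controls == 0:
        raise ValueError("at least one control sample is required")
    threshold = n_cases / n_controls  # overall case:control ratio
    # A multilocus genotype cell is "high risk" when its own
    # case:control ratio exceeds the overall ratio.
    high_risk = {c for c in set(cases) | set(controls)
                 if cases[c] > threshold * controls[c]}
    correct = sum((tuple(row[i] for i in snp_idx) in high_risk) == (y == 1)
                  for row, y in zip(genotypes, labels))
    return correct / len(labels)


def best_combination(genotypes, labels, k):
    """Exhaustively score every k-way SNP combination; return the best one."""
    n_snps = len(genotypes[0])
    return max(combinations(range(n_snps), k),
               key=lambda idx: mdr_accuracy(genotypes, labels, idx))


# Toy data (hypothetical): disease status depends only on the joint
# genotype at SNP 0 and SNP 2; SNP 1 is noise.
genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1, 2) for c in (0, 1)] * 2
labels = [(a + c) % 2 for a, b, c in genotypes]

print(best_combination(genotypes, labels, 2))
```

In the toy data neither SNP 0 nor SNP 2 is informative on its own, so only the joint search over pairs recovers the interaction. The loop over all combinations is what makes the approach impractical at GWA scale without prior filtering.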

website: MDR at epistasis website

Download: MDR at sourceforge.net

Reference: Ritchie et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am.J.Hum.Genet., 2001, 69, 1, 138-147. doi:10.1086/321276

Monte Carlo Logic Regression

Monte Carlo logic regression, as used in the article by Kooperberg et al, combines logic regression and MCMC to search for SNP interactions.

website: logic regression

Download: R package LogicReg, from CRAN

Reference: Kooperberg C, Ruczinski I (2005). Identifying Interacting SNPs using Monte Carlo Logic Regression. Genetic Epidemiology, 28(2): 157-70.

Penalized Regression

Penalized regression uses a variant of logistic regression with quadratic penalization to detect epistatic interactions.
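
The idea of a quadratic penalty on a logistic model with an interaction term can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation published by Park et al: the plain gradient-descent fit, the unpenalized intercept, the 0/1 genotype coding, the parameter names (lam, lr, n_iter) and the toy data are all assumptions.

```python
import math


def fit_penalized_logistic(X, y, lam=1.0, lr=0.1, n_iter=2000):
    """Minimise mean negative log-likelihood + (lam / 2) * sum(beta_j ** 2).

    The intercept beta[0] is left unpenalised. Plain gradient descent is
    used here for brevity.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            err = 1.0 / (1.0 + math.exp(-z)) - target  # predicted - observed
            grad[0] += err / n
            for j, x in enumerate(row, start=1):
                grad[j] += err * x / n
        for j in range(1, p + 1):
            grad[j] += lam * beta[j]  # gradient of the quadratic penalty
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


def with_interaction(g1, g2):
    """Two main-effect terms plus their product as the interaction term."""
    return [g1, g2, g1 * g2]


# Toy data (hypothetical): the disease occurs only when both loci carry
# the risk genotype, i.e. a pure interaction effect.
X = [with_interaction(g1, g2) for g1 in (0, 1) for g2 in (0, 1)]
y = [g1 * g2 for g1 in (0, 1) for g2 in (0, 1)]

weak = fit_penalized_logistic(X, y, lam=0.01)
strong = fit_penalized_logistic(X, y, lam=1.0)
print("interaction coefficient, weak penalty:  ", weak[3])
print("interaction coefficient, strong penalty:", strong[3])
```

The penalty keeps the coefficients finite even when the data are perfectly separable, as in this toy example, and a larger lam shrinks the interaction coefficient further towards zero.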

Reference: Park et al. Penalized logistic regression for detecting gene interactions. Biostatistics, 2008, 9, 1, 30-50. doi:10.1093/biostatistics/kxm010

CPM (Combinatorial Partitioning Method)

CPM exhaustively searches for the combinatorial genotype grouping that shows the most significant difference in the mean of the continuous response phenotype (Tang et al). CPM uses a brute-force search, which is impractical for large datasets.

Reference: Nelson et al. (2001). A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation. Genome Res., 2001, 11, 3, 458-470. doi:10.1101/gr.172901

RPM (Restricted Partitioning Method)

RPM modifies the CPM method by ignoring partitions that combine individual genotypes with very different mean trait values (Tang et al).

Reference: Culverhouse et al. (2004). Detecting epistatic interactions contributing to quantitative traits. Genet.Epidemiol., 2004, 27, 2, 141-152. doi: 10.1002/gepi.20006

MARS (Multivariate Adaptive Regression Splines)

BGTA (Backward genotype-trait association)

BGTA uses a bootstrap-type resampling screening procedure to select markers; markers whose return frequencies are greater than the third quartile plus 1.8 times the interquartile range are considered to be disease-associated (Zhang et al).
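
The selection rule in the last step can be sketched in Python as follows. The sketch covers only the cutoff on return frequencies, not the resampling screen that produces them. The marker names and frequencies are made-up illustrative values, and the quartile convention (the default of Python's statistics.quantiles) is an assumption, since the description above does not specify one.

```python
from statistics import quantiles


def select_markers(return_freq, multiplier=1.8):
    """Markers whose return frequency exceeds Q3 + multiplier * IQR."""
    q1, _, q3 = quantiles(list(return_freq.values()), n=4)
    cutoff = q3 + multiplier * (q3 - q1)
    return sorted(m for m, f in return_freq.items() if f > cutoff)


# Hypothetical return frequencies from a resampling screen:
# nine background markers and one marker returned far more often.
return_freq = {f"rs{i}": i for i in range(1, 10)}
return_freq["rs10"] = 40

print(select_markers(return_freq))
```

Only markers that stand well clear of the bulk of the return-frequency distribution pass the cutoff, so in this toy example the nine background markers are discarded.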

Reference: Zheng et al (2006). Backward genotype-trait association (BGTA)-based dissection of complex traits in case-control designs. Hum.Hered., 2006, 62, 4, 196-212. doi:10.1159/000096995

FITF (Focused Interaction Testing Framework)

website: FITF

Reference: Millstein et al. A testing framework for identifying susceptibility genes in the presence of epistasis. Am.J.Hum.Genet., 2006, 78, 1, 15-27. doi:10.1086/498850

HapForest

HapForest uses a forest-based approach to identifying haplotype-haplotype interactions.

Reference: Chen et al. A forest-based approach to identifying gene and gene-gene interactions. Proc.Natl.Acad.Sci.U.S.A., 2007, 104, 49, 19199-19203. doi:10.1073/pnas.0709868104

BOOST

“BOolean Operation-based Screening and Testing” (BOOST) is a method for the discovery of unknown gene-gene interactions that underlie complex diseases. It belongs to the group of methods testing statistical epistasis. BOOST allows examination of all pairwise interactions in genome-wide case-control studies.

website: BOOST

Download: BOOST executables or from sourceforge.net

Reference: Wan et al (2010). BOOST: A Fast Approach to Detecting Gene-Gene Interactions in Genome-wide Case-Control Studies. Am.J.Hum.Genet., 2010, 87, 3, 325-340. doi:10.1016/j.ajhg.2010.07.021

INTERSNP

INTERSNP is software for genome-wide interaction analysis (GWIA) of case-control SNP data and quantitative traits. SNPs are selected for joint analysis using a priori information. Sources of information for defining meaningful selection strategies include statistical evidence (single-marker association at a moderate level, computed from the user's own data) and genetic/biological relevance (genomic location, functional class or pathway information).

website: INTERSNP

Download: here

Reference: Herold et al. INTERSNP: genome-wide interaction analysis guided by a priori information, Bioinformatics 25 (2009), pp. 3275–3281. doi: 10.1093/bioinformatics/btp596

PLINK

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. In addition to its other functions it can be used to test for statistical epistasis.

website: PLINK

Download: here

Reference: Purcell et al. PLINK: a tool set for whole-genome association and population-based linkage analyses, Am. J. Hum. Genet. 81 (2007), pp. 559–575. doi:10.1086/519795

Useful databases

Review articles

Liang, Y. and Kelemen, A. (2008). Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases. Statist. Surv. Volume 2 (2008), 43-60. doi:10.1214/07-SS026
Musani et al (2007). Detection of gene x gene interactions in genome-wide association studies of human population data. Hum.Hered., 2007, 63, 2, 67-84. doi:10.1159/000099179
Motsinger et al (2007). Novel methods for detecting epistasis in pharmacogenomics studies. Pharmacogenomics, 2007, 8, 9, 1229-1241. doi:10.2217/14622416.8.9.1229

Useful Links

http://www.epistasis.org/
Q&A of Epistasis
