Modeling Gene Interactions in Disease


Presentation Transcript


  1. Modeling Gene Interactions in Disease CS 686 Bioinformatics

  2. Some Definitions • Data mining: extracting hidden patterns and useful information from large data sets, e.g., clustering and machine learning. It should not be: "Torturing data until it confesses ... and if you torture it enough, it will confess to anything" (Jeff Jonas, IBM). • Machine learning: the ability of a program to learn from experience, e.g., neural networks, decision trees, rule-based methods, MDR.

  3. Methods • Regression methods: model the relationship between a dependent variable and one or more independent variables. • Data mining methods: search the space of possible models efficiently. They cope better with non-linear and high-dimensional data, or data with many potential interactions. • Exhaustive search: evaluate all possible models to find the best one.

  4. Linear regression • Models the outcome as a linear combination of the parameters (but not necessarily of the independent variables). • Ex: Let y = incidence of disease, with n data points and independent variables A, B. 1) $y_i = b_0 + b_1 A_i + \varepsilon_i$, $i = 1, \dots, n$ 2) $y_i = b_0 + b_2 B_i^2 + \varepsilon_i$, $i = 1, \dots, n$, where $b_0, b_1, b_2$ are parameters and $\varepsilon_i$ is the error term. In both examples the disease is modeled as linear in the parameters, although the second is quadratic in the variable B.

  5. Linear regression Given a sample, we estimate the parameters (e.g., by least squares) to arrive at the fitted linear regression model. [1]
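As a rough illustration of least-squares estimation, the sketch below fits model 2) from the previous slide, $y_i = b_0 + b_2 B_i^2 + \varepsilon_i$, using only the Python standard library. The data values are invented for illustration and are noise-free, so the estimates should recover the coefficients used to generate them. The names `b0_hat` and `b2_hat` are not from the slides.

```python
# Hypothetical toy data (not from the slides): B is a predictor, y a disease measure.
B = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 0.5 * b**2 for b in B]  # noise-free, generated with b0 = 1.0 and b2 = 0.5

# The model is quadratic in B but linear in the parameters,
# so we can regress y on the transformed variable z = B^2.
z = [b**2 for b in B]
n = len(z)
z_mean = sum(z) / n
y_mean = sum(y) / n

# Closed-form least-squares estimates for a model with one regressor.
s_zy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
s_zz = sum((zi - z_mean) ** 2 for zi in z)
b2_hat = s_zy / s_zz
b0_hat = y_mean - b2_hat * z_mean

print(f"b0_hat={b0_hat:.3f}, b2_hat={b2_hat:.3f}")
```

With real, noisy data the estimates would only approximate the underlying parameters.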

  6. Multiple regression • Relates the outcome to a linear combination of several predictor variables. • Ex: Let y = incidence of disease, with n data points and independent variables $x_1, x_2, \dots, x_p$: $y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_p x_{ip} + \varepsilon_i$, $i = 1, \dots, n$. Best-fit line: for each unit increase in $x_{ip}$, the predicted $y_i$ is expected to change by $b_p$, holding the other predictors fixed.

  7. Logistic regression [1] • Often used when the outcome is binary. Relates the log-odds of the probability of an event to a linear combination of predictor variables. • Ex: $\ln\left(\frac{p}{1 - p}\right) = \alpha + \beta x_B + \gamma x_C + i\,x_B x_C$, where $x_B$ and $x_C$ are measured binary indicator variables, the regression coefficients $\beta$ and $\gamma$ represent main effects, and $i$ represents the interaction.
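To make the log-odds model concrete, the following sketch converts the linear predictor back into a probability for each combination of the two binary indicators. The coefficient values are hypothetical, chosen only for illustration, and the function name `logistic_probability` is not from the slides.

```python
import math


def logistic_probability(x_b, x_c, alpha, beta, gamma, i):
    """Return p such that ln(p / (1 - p)) = alpha + beta*x_b + gamma*x_c + i*x_b*x_c."""
    log_odds = alpha + beta * x_b + gamma * x_c + i * x_b * x_c
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients (assumptions, not estimates from any data set).
alpha, beta, gamma, i = -2.0, 0.5, 0.5, 1.5

probabilities = {}
for x_b in (0, 1):
    for x_c in (0, 1):
        p = logistic_probability(x_b, x_c, alpha, beta, gamma, i)
        probabilities[(x_b, x_c)] = p
        print(f"x_B={x_b} x_C={x_c} p={p:.3f}")
```

The interaction term only contributes when both indicators equal 1, which is why it captures an effect beyond the two main effects.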

  8. Other statistical methods [1] • Bayesian model selection: a statistical approach incorporating both prior distributions for parameters and observed data into the model. • Maximum likelihood: a statistical method used to make inferences about the combination of parameter values resulting in the highest probability of obtaining the observed data.

  9. Modeling Terminology [1] • Saturated: a statistical model that is as full as possible (saturated) with parameters. • Marginal effects: the effects of one parameter averaged over the possible values taken by other parameters. • Entropy: the uncertainty associated with a random variable.
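One common way to quantify entropy is Shannon entropy. The sketch below assumes that is the intended notion (the slide does not say so) and computes it in bits for a discrete distribution. The example frequencies 0.25, 0.50, 0.25 are simply illustrative genotype frequencies.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution (zero-probability terms contribute 0)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Illustrative genotype frequencies for AA, Aa, aa.
genotype_freqs = [0.25, 0.50, 0.25]
h = shannon_entropy(genotype_freqs)
print(f"entropy = {h:.3f} bits")
```

A distribution concentrated on a single value has entropy 0, while a uniform distribution over the same values has the maximum entropy.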

  10. Modeling Terminology [1] • Cross-validation: partitioning a data set into n subsets, then using each subset in turn as the test set while using the other n-1 to train. • Overfitting: a model that provides a good fit to a specific data set but generalizes poorly.
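The cross-validation partitioning described above can be sketched as follows. This is a minimal illustration that only produces index sets. It does not shuffle the data or train any model, and the function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) once for each of the n_folds subsets."""
    indices = list(range(n_samples))
    # Assign samples round-robin so fold sizes differ by at most one.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield train, test


splits = list(k_fold_splits(n_samples=10, n_folds=5))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test set, so every observation is used for testing once and for training in the remaining folds. In practice the indices are usually shuffled first.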

  11. Marginal Effects [2] • Marginal penetrance, e.g., the probability P(D | A = Aa), irrespective of the value of B.

      Table II. Penetrance values for combinations of genotypes from two single nucleotide polymorphisms exhibiting interactions in the absence of independent main effects.

      Genotype                 AA (0.25)   Aa (0.50)   aa (0.25)   Marginal penetrance B
      BB (0.25)                0           1           0           0.5
      Bb (0.50)                1           0           1           0.5
      bb (0.25)                0           1           0           0.5
      Marginal penetrance A    0.5         0.5         0.5

      Genotype frequencies are given in parentheses. Marginal penetrance values are shown for the A and B genotypes.
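The marginal penetrance values in Table II can be reproduced with a short calculation. The sketch assumes the genotypes at the two loci are independent, so each marginal is a frequency-weighted average over the other locus. The variable names are illustrative.

```python
# Penetrance P(D | A, B) from Table II: outer keys are B genotypes, inner keys are A genotypes.
penetrance = {
    "BB": {"AA": 0.0, "Aa": 1.0, "aa": 0.0},
    "Bb": {"AA": 1.0, "Aa": 0.0, "aa": 1.0},
    "bb": {"AA": 0.0, "Aa": 1.0, "aa": 0.0},
}
freq_a = {"AA": 0.25, "Aa": 0.50, "aa": 0.25}
freq_b = {"BB": 0.25, "Bb": 0.50, "bb": 0.25}

# Marginal penetrance of each A genotype: average over B, weighted by B genotype frequencies.
marginal_a = {a: sum(freq_b[b] * penetrance[b][a] for b in freq_b) for a in freq_a}
# Marginal penetrance of each B genotype: average over A, weighted by A genotype frequencies.
marginal_b = {b: sum(freq_a[a] * penetrance[b][a] for a in freq_a) for b in freq_b}

print("marginal penetrance A:", marginal_a)
print("marginal penetrance B:", marginal_b)
```

Every marginal equals 0.5, so neither locus shows a main effect on its own even though the joint genotype fully determines disease status in this table.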

  12. Weka [3] • A collection of visualization tools and algorithms for data analysis and predictive modeling. • Preprocessing tools for reading data in a variety of formats and transforming it. • Classification algorithms include regression, neural networks, support vector machines, and decision trees. Output displays include ROC curves. • Clustering: k-means, expectation maximization. • Visualization includes scatter plots and bar graphs.

  13. References • [1] Cordell, 2009, Detecting gene–gene interactions that underlie human diseases. Nature Reviews Genetics. • [2] McKinney et al., 2006, Machine Learning for Detecting Gene-Gene Interactions: A Review. Biomedical Genomics and Proteomics. • [3] Weka site: http://www.cs.waikato.ac.nz/ml/weka
