
Introduction to Bioinformatics - Tutorial no. 12


lee

Presentation Transcript


  1. Introduction to Bioinformatics - Tutorial no. 12 Expression Data Analysis: - Clustering - GEO - EPClust

  2. Application of Microarrays
     • We only know the function of about 20% of the 30,000 genes in the Human Genome
     • Gene exploration
     • Faster and better
     • Applications:
       • Evolution
       • Behavior
       • Cancer Research

  3. Microarray Analysis
     • Unsupervised Grouping: Clustering
     • Pattern discovery via grouping similarly expressed genes together
     • Three techniques most often used
       • k-Means Clustering
       • Hierarchical Clustering
       • Kohonen Self Organizing Feature Maps

  4. Hierarchical Agglomerative Clustering
     Michael Eisen, 1998
       • Cluster (algorithm)
       • TreeView (visualization)
     Hierarchical Agglomerative Clustering
       • Step 1: Similarity score between all pairs of genes
         • Pearson Correlation
         • Euclidean distance
       • Step 2: Find the two most similar genes, replace with a node that contains the average
         • Builds a tree of genes
       • Step 3: Repeat
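The three steps on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Cluster program itself: the function names (`pearson`, `euclidean`, `merge_most_similar`, `build_tree`) and the gene names in the usage note are hypothetical, Pearson correlation is used as the similarity score, and a merged node stores the unweighted average of its two children.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two expression profiles (assumes non-constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def merge_most_similar(nodes):
    """Steps 1-2: score all pairs, then replace the most similar pair with an average node."""
    a, b = max(combinations(nodes, 2),
               key=lambda pair: pearson(nodes[pair[0]], nodes[pair[1]]))
    merged = [(u + v) / 2 for u, v in zip(nodes[a], nodes[b])]
    rest = {name: prof for name, prof in nodes.items() if name not in (a, b)}
    rest[(a, b)] = merged  # the new node's name records which nodes were joined
    return rest


def build_tree(profiles):
    """Step 3: repeat the merge until one node (the root of the tree) remains."""
    nodes = dict(profiles)
    while len(nodes) > 1:
        nodes = merge_most_similar(nodes)
    return next(iter(nodes))
```

For example, with hypothetical profiles `{"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}`, `build_tree` joins `g1` and `g2` first (they are perfectly correlated) and only then attaches `g3`. To use a distance instead of a similarity, one would pick the pair with the smallest `euclidean` value rather than the largest `pearson` value.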

  5. Agglomerative Hierarchical Clustering: Distance between joined clusters
     • Need to define the distance between the new cluster and the other clusters.
       • Single Linkage: distance between closest pair.
       • Complete Linkage: distance between farthest pair.
       • Average Linkage: average distance between all pairs or distance between cluster centers
     • The dendrogram induces a linear ordering of the data points
     [Figure: dendrogram over data points 1-5]
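The linkage definitions above can be made concrete with a small Python sketch. The function names are hypothetical, clusters are assumed to be lists of points, and Euclidean distance is assumed as the underlying point-to-point distance. The slide's two readings of average linkage are given as separate functions (`average_linkage` and `centroid_linkage`).

```python
import math


def euclidean(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise(cluster_a, cluster_b):
    """All distances between a point of cluster_a and a point of cluster_b."""
    return [euclidean(p, q) for p in cluster_a for q in cluster_b]


def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair."""
    return min(pairwise(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair."""
    return max(pairwise(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Average distance over all pairs."""
    d = pairwise(cluster_a, cluster_b)
    return sum(d) / len(d)


def centroid_linkage(cluster_a, cluster_b):
    """Distance between the cluster centers (coordinate-wise means)."""
    center_a = [sum(col) / len(cluster_a) for col in zip(*cluster_a)]
    center_b = [sum(col) / len(cluster_b) for col in zip(*cluster_b)]
    return euclidean(center_a, center_b)
```

On one-dimensional clusters `[(0.0,), (1.0,)]` and `[(4.0,), (6.0,)]` the pairwise distances are 4, 6, 3 and 5, so single linkage gives 3, complete linkage gives 6, and both averaging variants give 4.5. The two averaging variants agree here only because the clusters do not overlap on the line; in general they differ.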

  6. Results of Clustering Gene Expression
     • CLUSTER is simple and easy to use
     • De facto standard for microarray analysis
     • Limitations:
       • Hierarchical clustering in general is not robust
       • Genes may belong to more than one cluster

  7. K-Means Clustering Algorithm
     • Randomly initialize k cluster means
     • Iterate:
       • Assign each gene to the nearest cluster mean
       • Recompute cluster means
     • Stop when clustering converges
     Notes:
     • Really fast
     • Genes are partitioned into clusters
     • How do we select k?
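The loop described on this slide might look as follows in plain Python. This is a minimal sketch, not EPClust's implementation. It assumes Euclidean distance, initial means drawn at random from the data points themselves, and convergence defined as "no assignment changed". The names `k_means`, `seed` and `max_iter` are hypothetical.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def k_means(points, k, seed=0, max_iter=100):
    """Partition points into k clusters; returns (assignment, means)."""
    rng = random.Random(seed)
    # Randomly initialize k cluster means (here: k distinct data points).
    means = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign each gene to the nearest cluster mean.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, means[c])) for p in points
        ]
        # Stop when the clustering converges (no gene changed cluster).
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each cluster mean; an empty cluster keeps its old mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means
```

Calling `k_means([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` separates the two obvious groups. The sketch does not answer the slide's open question of how to select k. The caller has to supply it, and results on less clear-cut data can depend on the random initialization.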

  8. K-Means Algorithm • Randomly Initialize Clusters

  9. K-Means Algorithm • Assign data points to nearest clusters

  10. K-Means Algorithm • Recalculate Clusters

  11. K-Means Algorithm • Recalculate Clusters

  12. K-Means Algorithm • Repeat

  13. K-Means Algorithm • Repeat

  14. K-Means Algorithm • Repeat … until convergence

  15. EPClust Input (1)
      • Expression data matrix
      • Extra annotation for gene rows
      • Method of tabulation
      • Name for further analysis

  16. EPClust Input (2)
      • Method of measuring distance between gene rows
      • Cluster hierarchically
      • Number k of means
      • Cluster into k means

  17. GEO: Gene Expression Omnibus
      • NCBI database for gene expression data
      • Founded at end of 2000

  18. Querying GEO
      • Browse records
      • Search for entries containing a gene
      • Search for experiments
      • Search with Entrez

  19. SGD – Expression database http://db.yeastgenome.org/cgi-bin/expression/expressionConnection.pl

  20. SGD – Expression database

  21. SGD – Expression database

  22. SGD – Expression database

  23. Two labs are running experiments on the APO1 gene. Suggest a method that would allow them to compare their results.
      • Gene grouping
      • Relative values
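One possible reading of "relative values" is that each lab reports expression as a ratio against its own reference sample instead of as a raw intensity. The sketch below illustrates that reading only. The slides do not spell out the intended answer, and the function name `log_ratio` and all intensities are made-up assumptions.

```python
import math


def log_ratio(sample_intensity, reference_intensity):
    """Expression relative to a reference, as a log2 ratio (0 means unchanged)."""
    return math.log2(sample_intensity / reference_intensity)


# Hypothetical raw intensities for the same gene measured on two different platforms.
lab_a = log_ratio(800.0, 200.0)  # lab A: sample vs. its own reference
lab_b = log_ratio(50.0, 12.5)    # lab B: sample vs. its own reference
```

The raw numbers are on very different scales, yet both labs obtain a log2 ratio of 2 (a four-fold increase), which is a value they can compare directly.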

  24. Explain how microarrays can be used as a basis for diagnostics

  25. Explain how microarrays can be used as a basis for diagnostics
