
Introduction to bioinformatics

Barbera van Schaik, b.d.vanschaik@amc.uva.nl. Bioinformatics Laboratory, KEBB, AMC, http://www.bioinformaticslaboratory.nl/


Presentation Transcript


  1. Introduction to bioinformatics Barbera van Schaik b.d.vanschaik@amc.uva.nl Bioinformatics Laboratory, KEBB, AMC http://www.bioinformaticslaboratory.nl/

  2. What is bioinformatics? • A set of software tools for molecular sequence analysis • The use of computers to collect, analyze, and interpret biological information at the molecular level. • The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information

  3. Bioinformatics (diagram labels): mathematics, statistics, biology, informatics, database technology; biomedical research, genomics, proteomics, metabolomics, data management

  4. History

  5. The internet

  6. Molecular biology (timeline: 1933, 1953, 1961, 1980)

  7. What is genomics? The application of high-throughput automated technologies to molecular biology. OR The experimental study of complete genomes.

  8. DNA microarrays

  9. Automated DNA sequencing

  10. High throughput sequencing. Platforms: Roche (454); Illumina (Solexa); Applied Biosystems (SOLiD). 454, one run: 7.5 hours, 400,000 sequences, 200-300 bases per sequence = 100,000,000 bases per run. Later in 2008: 400 bases per sequence.
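
The per-run yield quoted on the slide can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, the midpoint of the 200-300 base range is used to reproduce the 100,000,000 figure, and the "later in 2008" estimate assumes the number of sequences per run stays at 400,000, which the slide does not state.

```python
# Back-of-the-envelope yield of one 454 run, using the figures from the slide.
reads_per_run = 400_000
read_length_range = (200, 300)  # bases per sequence
run_hours = 7.5

low_yield = reads_per_run * read_length_range[0]
high_yield = reads_per_run * read_length_range[1]
# Midpoint of the range (250 bases) gives the rounded figure on the slide.
mid_yield = reads_per_run * sum(read_length_range) // 2

bases_per_hour = mid_yield / run_hours

# Assumption (not on the slide): read count unchanged with 400-base reads.
later_2008_yield = reads_per_run * 400

print(f"{low_yield:,} to {high_yield:,} bases per run (midpoint {mid_yield:,})")
print(f"about {bases_per_hour:,.0f} bases per hour")
print(f"with 400-base reads: {later_2008_yield:,} bases per run")
```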

  11. Applications of high-throughput sequencing

  12. Sample storage

  13. Confused by genomics? Genomics, Transcriptomics, Proteomics, Metabolomics, Nutrigenomics, Pharmacogenomics, Epigenomics, Infectomics, Patientomics, other 'omics'

  14. image credit: Digital Vision, PhotoDisc, Matt Ray/EHP

  15. Institutes that provide support • National Center for Biotechnology Information (NCBI, USA) http://www.ncbi.nlm.nih.gov/ • European Bioinformatics Institute (EBI, UK) http://www.ebi.ac.uk/ • Weizmann Institute of Science (Israel) http://bioportal.weizmann.ac.il/ • Swiss Institute of Bioinformatics (SIB) http://www.expasy.org/ • University of California Santa Cruz (UCSC) http://genome.ucsc.edu/

  16. Bioinformatics in the Netherlands. Universities: -> * Universiteit Leiden (1575) -> * Rijksuniversiteit Groningen (1614) -> * Universiteit Utrecht (1636) -> * Universiteit van Amsterdam (1632) -> * Technische Universiteit Delft (1842) -> * Vrije Universiteit Amsterdam (1880) * Theologische Universiteit Apeldoorn (1894) -> * Erasmus Universiteit Rotterdam (1913) -> * Wageningen Universiteit (1918) -> * Radboud Universiteit Nijmegen (1923) * Universiteit van Tilburg (1927) * Nyenrode Business Universiteit (1946) * Theologische Universiteit Kampen (Oudestraat) (1854) * Theologische Universiteit Kampen (Broederweg) (1854) * Universiteit voor Humanistiek (1946) -> * Technische Universiteit Eindhoven (1956) -> * Universiteit Twente (1961) * Katholieke Theologische Universiteit (1967) -> * Universiteit Maastricht (1976) * Open Universiteit Nederland (1984)

  17. Bioinformatics in the Netherlands http://www.nbic.nl/

  18. Subjects in bioinformatics (scope guidelines, Bioinformatics journal): Databases and ontologies, Genome analysis, Data and text mining, Sequence analysis, Phylogenetics, Systems biology, Structural bioinformatics, Genetics and population analysis, Gene expression

  19. Subjects in bioinformatics: Databases and ontologies, Genome analysis, Data and text mining, Sequence analysis, Phylogenetics, Systems biology, Structural bioinformatics, Genetics and population analysis, Gene expression

  20. Sequence analysis: function prediction (similarity, sequence search); localisation (gene finding); grouping (genes, protein families); conservation (motifs, functional blocks); SNPs and mutations (variations)

  21. Sequence analysis. Pairwise alignment: inexact matching of 2 sequences. Multiple sequence alignment: inexact matching of more than 2 sequences.
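
Inexact matching of two sequences can be illustrated with a minimal Needleman-Wunsch global alignment. The slides do not name an algorithm or a scoring scheme, so the function name and the scores (match +1, mismatch -1, gap -1) are assumptions chosen for illustration. BLAST, shown on the following slides, uses a faster heuristic local search rather than this exhaustive dynamic programming.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, aligned_a, aligned_b = global_align("GATTACA", "GATACA")
print(best_score)
print(aligned_a)
print(aligned_b)
```

With these scores, the two example strings align with six matches and one gap. Which of the two T positions receives the gap depends on how ties are broken in the traceback.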

  22. Blast output

  23. Blast output - alignments

  24. Subjects in bioinformatics: Databases and ontologies, Genome analysis, Data and text mining, Sequence analysis, Phylogenetics, Systems biology, Structural bioinformatics, Genetics and population analysis, Gene expression

  25. Phylogenetics • Evolution = mutation of DNA (and protein) sequences • Can we define evolutionary relationships between organisms by comparing DNA sequences? • Lots of methods and software; what is the "correct" analysis?
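
As a minimal sketch of comparing DNA sequences between organisms, the snippet below computes the p-distance (fraction of differing positions) for every pair of already aligned sequences. The species names and sequences are made up for illustration. The p-distance is only the simplest possible measure; the slide's point about many methods applies here, since real analyses typically use substitution models and dedicated tree-building software.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical, pre-aligned sequences of equal length.
aligned = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTTC",
    "species_C": "ACGAACGTTC",
    "species_D": "TCGAACCTTG",
}

# Pairwise distance for every unordered pair of species.
distances = {
    (s, t): p_distance(aligned[s], aligned[t])
    for s, t in combinations(aligned, 2)
}

for (s, t), d in distances.items():
    print(f"{s} vs {t}: {d:.1f}")
```

A distance matrix like this is the usual input to distance-based tree methods such as neighbour joining.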

  26. Phylogenetics

  27. Phylogenetics Ciccarelli (2006), Science

  28. Subjects in bioinformatics: Databases and ontologies, Genome analysis, Data and text mining, Sequence analysis, Phylogenetics, Systems biology, Structural bioinformatics, Genetics and population analysis, Gene expression

  29. Genome analysis Genome assembly http://www.wiley.com/legacy/college/boyer/0470003790/cutting_edge/shotgun_seq/shotgun.htm

  30. Hierarchical shotgun sequencing (HGP). Genome; physical mapping; minimal tiling set; for each BAC in tiling (~33 000 for human): shotgun sequencing, fragment assembly.
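
The fragment assembly step can be illustrated with a toy greedy overlap merger. This is a sketch under strong assumptions and not the method used in the HGP: reads are error-free, all come from the same strand, the sequence has no repeats, and the function names, the example sequence and the minimum overlap of 3 bases are invented for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical 16-base "clone" and three overlapping reads, given out of order.
clone = "ACGTTGCATGGACCTA"
reads = [clone[8:16], clone[0:8], clone[4:12]]
contigs = greedy_assemble(reads)
print(contigs)
```

Sequencing each BAC separately keeps this problem small and local, which is one reason the hierarchical approach copes better with repeats than assembling all reads of a genome at once.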

  31. Gene annotation: key concepts. Gene prediction: usually the CDS is predicted, not a gene. Gene annotation: alternative splicing, UTRs, pseudogenes, known vs novel genes, etc.

  32. 3 classes of 'gene' prediction. Ab initio: Genscan, Grail, FgenesH, Genie, GeneId, Genefinder, Glimmer, etc. Homology based: GeneID, Genomescan, Twinscan, etc. Identity based: Genewise, Sim4, Spidey, etc.

  33. Ab-initio prediction CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA exon exon Example: Genscan
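
To give a feel for what predicting exons from sequence alone involves, the toy scan below enumerates two-exon models in the slide's example sequence: a start codon (ATG), an intron that begins with GT and ends with AG, and an in-frame stop codon after splicing. This is a deliberately naive stand-in. Genscan and similar tools use probabilistic models of coding potential and splice signals, not a brute-force signal search. The function names and the minimum intron length are assumptions, and because the exon positions on the slide were lost in extraction, the candidates found here are not necessarily the model drawn there.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_end(cds):
    """End index of the first in-frame stop codon in cds, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def two_exon_models(seq, min_intron=4):
    """List (exon1, exon2) pairs: ATG ... [GT intron AG] ... stop codon."""
    models = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for donor in range(start + 3, len(seq) - 1):
            if seq[donor:donor + 2] != "GT":  # intron must start with GT
                continue
            for acceptor in range(donor + min_intron, len(seq) + 1):
                if seq[acceptor - 2:acceptor] != "AG":  # intron must end with AG
                    continue
                exon1 = seq[start:donor]
                spliced = exon1 + seq[acceptor:]
                end = first_stop_end(spliced)
                # Keep models whose first stop codon ends inside exon 2.
                if end is not None and end > len(exon1):
                    models.append((exon1, spliced[len(exon1):end]))
    return models


seq = "CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA"
models = two_exon_models(seq)
for exon1, exon2 in models:
    print(exon1, exon2)
```

Even in 42 bases the signals alone admit several candidates, which is why real gene finders also score how "coding-like" each candidate exon looks.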

  34. Homology assisted prediction CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA EST exon exon Example: Genie, Grail

  35. Identity based prediction (diagram labels: known mRNA, homology, prediction). Example: estToGenome, sim4

  36. Automated gene annotation (diagram labels: prediction, homology, human; Genscan, IGI/IPI, OTTO, humans)

  37. Genome analysis Comparative genomics Thomas et al (2003), Nature

  38. Gene structure in TranscriptView. Provided by Jan Koster, Human Genetics, AMC

  39. Discovery of new variant Valentijn et al. (2005), Genomics

  40. Subjects in bioinformatics: Databases and ontologies, Genome analysis, Data and text mining, Sequence analysis, Phylogenetics, Systems biology, Structural bioinformatics, Genetics and population analysis, Gene expression

  41. Genetics and population analysis

  42. Genetics and population analysis http://www.hapmap.org/

  43. Copy number variation The Human Genome Structural Variation Working Group, Nature 2007

  44. Subjects in bioinformatics: Databases and ontologies, Genome analysis, Data and text mining, Sequence analysis, Phylogenetics, Systems biology, Structural bioinformatics, Genetics and population analysis, Gene expression

  45. Gene expression analysis: statistical analysis of differential gene expression; expression-based classifiers; regulatory networks / pathway analysis; integration of expression data; use genes, genesets

  46. Gene expression analysis. High-throughput techniques: EST sequencing; microarrays; Serial Analysis of Gene Expression (SAGE); genome tiling arrays; high-throughput sequencing

  47. Microarray analysis. Normalisation: correct for systematic bias. Differential gene expression. Clustering: grouping genes/samples. Classification: signatures.

  48. Normalisation. DNA microarray data contain: systematic effects resulting from the biological process (this is what we are interested in); random measurement noise; systematic effects resulting from array technology. The unwanted effects result in false positives and false negatives. Remove these effects by normalisation.
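
One simple way to remove a systematic array effect is to median-centre the log ratios of a two-colour array. The slides do not name a normalisation method, so this is only an illustrative choice; it assumes that most genes are not differentially expressed, and the intensities below are invented. Intensity-dependent methods such as loess are common alternatives.

```python
import math
from statistics import median


def median_center(log_ratios):
    """Subtract the array-wide median so the typical log ratio becomes 0."""
    m = median(log_ratios)
    return [x - m for x in log_ratios]


# Hypothetical red/green intensities for five spots on one array.
# The red channel is about twice as bright overall (a dye/scanner bias).
red = [1200.0, 800.0, 3000.0, 450.0, 1600.0]
green = [600.0, 400.0, 1500.0, 225.0, 3200.0]

raw = [math.log2(r / g) for r, g in zip(red, green)]
normalised = median_center(raw)

print(raw)
print(normalised)
```

Before normalisation four of the five spots look two-fold up-regulated; after centring they sit at zero and only the last spot stands out, as a four-fold decrease.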

  49. Contributions to measured gene expression level. ANOVA (analysis of variance): $y_{ijkg} = \mu + A_i + G_g + (VG)_{kg} + (AG)_{ig} + (DG)_{jg} + \varepsilon_{ijkg}$. Term labels on the slide: expression level, array/gene effect, spot effect, dye effect, noise. Gene expression level (y) of 'Gene A'. ANOVA: carefully consider experimental design.
