Outline for today


Presentation Transcript


  1. Outline for today Lec 02 Theoretical part • What is a gene and genome? • Components in complex genomes. • The genomic paradoxes. • Sequencing DNA • Basics of the “old” technology. • Next-generation sequencing • In vivo and in vitro DNA replication (PCR) • Practical part • Primer design programs

  2. What is the genome? • The genome is all the DNA in a cell. • All the DNA on all the chromosomes. • Includes genes, intergenic sequences, repeats. • Eukaryotes can have 2-3 genomes. • Nuclear genome • Mitochondrial genome • Plastid genome • If not specified, “genome” usually refers to the nuclear genome. • In eukaryotes, this term is commonly used to refer to one complete haploid set of chromosomes, such as that found in a sperm or egg. • Genome sizes are expressed in units of nucleic acid length: • Kilobase (kb): 10^3 base pairs • Megabase (Mb): 10^6 base pairs

  3. The human genome • 1-2 trillion cells. • A human cell contains 2-3 meters of DNA. • 2 × 23 = 46 chromosomes. • On average, a single human chromosome consists of 5 cm of DNA. • The human haploid genome is 3.4 × 10^9 bp. • The total length of DNA in one adult human is 2.0 × 10^13 meters.
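
The per-cell and per-chromosome lengths above can be checked with a few lines of arithmetic. This is only a rough sketch: the rise of 0.34 nm per base pair is an assumed textbook value for B-form DNA and does not appear on the slide, and the names used are illustrative.

```python
# Rough check of the "2-3 meters of DNA per cell" figure.
# Assumption (not from the slides): B-form DNA rises about 0.34 nm per base pair.
HAPLOID_GENOME_BP = 3.4e9   # haploid genome size quoted on the slide
RISE_PER_BP_M = 0.34e-9     # assumed helical rise per base pair, in meters
CHROMOSOMES = 46            # 2 x 23, as on the slide


def dna_length_m(base_pairs: float, rise_per_bp_m: float = RISE_PER_BP_M) -> float:
    """Contour length in meters of a linear double helix of `base_pairs` bp."""
    return base_pairs * rise_per_bp_m


haploid_m = dna_length_m(HAPLOID_GENOME_BP)
diploid_m = 2 * haploid_m                      # DNA in one diploid cell
per_chromosome_cm = diploid_m / CHROMOSOMES * 100

print(f"haploid: {haploid_m:.2f} m, diploid: {diploid_m:.2f} m")
print(f"average per chromosome: {per_chromosome_cm:.1f} cm")
```

Under these assumptions a diploid cell holds a little over 2 m of DNA and an average chromosome about 5 cm, in line with the figures quoted on the slide.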

  4. Surprisingly, the human genome... • The total length of DNA in an adult human is 2.0 × 10^13 meters (the equivalent of 70 trips from Earth to the Sun and back, or 5 trips from the Sun to Neptune and back). • However, in comparison, the human genome is: small, empty, unoriginal, repetitive.

  5. Distinct components in complex genomes

  6. What are genes? • Genes are the basic unit of heredity. • A gene is a unit of heredity in a living organism. • A gene is a segment of DNA that is involved in producing a polypeptide chain; it can include regions preceding and following the coding DNA as well as introns between the exons. • Complex genomes have roughly 10x to 30x more DNA than is required to encode all the RNAs or proteins in the organism. • Contributors to the non-coding DNA include: • Introns in genes • Regulatory elements of genes • Multiple copies of genes, including pseudogenes • Intergenic sequences • Interspersed repeats • An intergenic region (IGR) is a stretch of DNA located between clusters of genes that contains few or no genes.

  7. Genome Size • Viral genomes are typically in the range 100–1000 kb. • Bacteriophage MS2, one of the smallest viruses, has only four genes in a single-stranded RNA molecule of about 4000 nucleotides (4 kb). • Bacterial genomes are larger, typically in the range 1–10 Mb. • The chromosome of Escherichia coli is a circular DNA molecule of 4600 kb. • Eukaryotic genomes are typically in the range 100–1000 Mb. • Among eukaryotes, genome size often differs tremendously, even among closely related species.

  8. The 3 genomic paradoxes: C-value paradox • The C-value is the DNA content of the haploid genome. • C-value paradox: morphological complexity does not correlate with genome size. • Examples: Homo sapiens, 3.4 × 10^9 bp; Amoeba dubia, 6.8 × 10^11 bp; Allium cepa, 1.5 × 10^10 bp.

  9. Organization and Structure of Genomes • Do you remember? Complexity does not correlate with genome size (C-value paradox). • How to measure genome complexity? • Hybridization kinetics. • Shear and melt DNA. • Allow to hybridize and measure ds vs. ss DNA by spectrophotometry. • Hyperchromic effect: the absorbance of single-stranded DNA is about 40% higher than that of double-stranded DNA at the same concentration. • Cot½ measures genome size and complexity. • A greater Cot½ value indicates a slower reaction at a given DNA concentration (longer to hybridize); therefore k (the reassociation rate constant) is smaller. • Longer to hybridize means that the genome is larger and the sequence is more unique. • Much of what we know about genome size and complexity comes from these studies.
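
The relation between Cot½ and k can be made concrete with the standard second-order reassociation model, C/C0 = 1 / (1 + k·C0·t). The slide does not write this equation out, so treat the sketch below as an assumption about the model behind the bullet points; the function names and the two example rate constants are illustrative only.

```python
def fraction_single_stranded(cot: float, k: float) -> float:
    """Fraction of DNA still single-stranded at a given Cot (C0 * t).

    Assumes ideal second-order kinetics: C/C0 = 1 / (1 + k * Cot).
    """
    return 1.0 / (1.0 + k * cot)


def cot_half(k: float) -> float:
    """Cot value at which half of the DNA has reassociated (C/C0 = 0.5)."""
    return 1.0 / k


# Two hypothetical genomes: a smaller k means slower reassociation,
# i.e. a larger and more unique genome.
for label, k in [("simple genome", 10.0), ("complex genome", 0.001)]:
    half = cot_half(k)
    print(label, "Cot1/2 =", half, "fraction ss =", fraction_single_stranded(half, k))
```

Because Cot½ = 1/k in this model, a thousand-fold smaller rate constant shifts the curve a thousand-fold to the right on the Cot axis.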

  10. Organization and Structure of Genomes • In simpler organisms almost all of the DNA consists of unique sequences. • Rate of reassociation is inversely proportional to the length of the reassociating DNA. • Figure: Reassociation kinetics of dsDNA from different simple organisms.

  11. Repetitions in complex genomes • Highly repeated DNA • R (repetition frequency) > 100,000 • Almost no information, low complexity • Moderately repeated DNA • 10 < R < 10,000 • Little information, moderate complexity • “Single copy” DNA • R = 1 or 2 • Much information, high complexity • The low complexity may be preconditioned by strong inequality in nucleotide content (biased composition), by tandem or dispersed repeats or by palindrome-hairpin structures, as well as by a combination of all these factors.

  12. Organization and Structure of Genomes • Figure: Reassociation curve of labeled mRNA hybridized back to the genomic DNA. • Label mRNA and hybridize with an excess of DNA, then measure formation of hybrids over time. • What the hell does this mean? • Most mRNA is transcribed from non-repetitive DNA. • Moderately repetitive DNA is transcribed. • Cot½ analysis shows that RNA does not hybridize with highly repetitive DNA, so highly repetitive DNA is probably not transcribed into mRNA.

  13. Complexity and gene numbers • How much more complex is a human compared to a nematode? What do we mean by "biological complexity"? • Some only considered diversity of cell types, others considered brain circuitry, and others went as far as including the cultural achievements of the human species as a whole. • The simple number of genes can have little correlation to the "complexity" of the organism.

  14. The 3 genomic paradoxes: N-value paradox • N-value paradox: morphological complexity does not correlate with gene number. • Figure labels: ~21,000 genes; ~25,000 genes; ~60,000 genes.

  15. The 3 genomic paradoxes: K-value paradox • K-value paradox: complexity does not correlate with chromosome number. • Chromosome numbers: Homo sapiens, 46; Lysandra atlantica, 250; Ophioglossum reticulatum, ~1260.

  16. What is this highly repetitive DNA? • Selfish DNA? • Parasitic sequences that exist solely to replicate themselves? • Or evolutionary relics? • Produced by recombination, duplication, unequal crossing over • Probably both • Transposons exemplify “selfish DNA”. • Crossing over and other forms of recombination lead to large-scale duplications.

  17. Main classes of repetitive DNA • 1. Interspersed repeats; 2. Processed pseudogenes; 3. Simple sequence repeats; 4. Segmental duplications; 5. Blocks of tandem repeats • 1. Interspersed repeats • Interspersed repetitive DNA is found in all eukaryotic genomes. • The individual repeat units are dispersed at numerous locations in the genome. • Interspersed repeats (transposon-derived repeats) constitute ~45% of the human genome. They involve RNA intermediates (retroelements) or DNA intermediates (DNA transposons). • Reverse transcription: • Long-terminal repeat transposons (LTRs; RNA-mediated). • Long interspersed elements (LINEs); these encode a reverse transcriptase. • Short interspersed elements (SINEs; RNA-mediated); these include Alu repeats. • Transposase: • DNA transposons (3% of human genome)

  18. Comparison of the age of interspersed repeats • Most interspersed repeats in humans are ancient. There is no evidence of DNA transposon activity in the past 50 MY; thus they are extinct fossils. • Examples include retrotransposed genes that lack introns, such as: • ADAM20 NM_003814 14q (original gene on 8p) • Cetn1 NM_004066 18p (original gene on Xq) • Glud2 NM_012084 Xq (original gene on 10q) • Pdha2 NM_005390 4q (original gene on Xp)

  19. Pseudogenes • These genes have a stop codon or frameshift mutation and do not encode a functional protein. They commonly arise from retrotransposition, or following gene duplication and subsequent gene loss. For a superb on-line resource: http://www.pseudogene.org

  20. Pseudogenes • The Gulo gene encodes the enzyme catalyzing the terminal step in vitamin C biosynthesis. Gulo has become a pseudogene in humans and other primates.

  21. Simple sequence repeats (SSR) • Microsatellites: from one to a dozen base pairs, for example (A)n, (CA)n, (CGG)n. These may be formed by replication slippage. • Minisatellites: a dozen to 500 base pairs. • Simple sequence repeats of a particular length and composition occur preferentially in different species. • In humans, an expansion of triplet repeats such as CAG is associated with at least 14 disorders (including Huntington’s disease). • All humans have the huntingtin gene (HTT), which codes for the protein huntingtin (HTT). Part of this gene is a repeated section called a trinucleotide repeat, which varies in length between individuals and may change length between generations. When the length of this repeated section reaches a certain threshold, it produces an altered form of the protein, called mutant huntingtin protein (mHTT). • Micro- and minisatellites comprise 3% of the genome. • Figure: Replication slippage.
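
Tandem runs such as (CA)n or (CAG)n can be located with a back-referencing regular expression. The sketch below is a toy scanner, not a published SSR-finding tool; the function name, the thresholds (`max_unit`, `min_repeats`) and the example sequence are all made up for illustration.

```python
import re


def find_microsatellites(seq: str, max_unit: int = 6, min_repeats: int = 4):
    """Return (start, unit, copies) for each tandem run of a short unit.

    The lazy group picks the shortest unit (1..max_unit bases) that is
    followed by at least `min_repeats - 1` further exact copies of itself.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical sequence with a (CAG)5 run, the kind of triplet repeat
# mentioned for HTT on the slide.
example = "TTGA" + "CAG" * 5 + "TT"
print(find_microsatellites(example))
```

Only perfect repeats are reported; real repeat finders also tolerate mismatches and indels inside a run, which this sketch does not attempt.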

  22. Segmental duplications • These are blocks of about 1 kilobase to 300 kb that are copied intra- or interchromosomally. • About 5% of the human genome consists of segmental duplications. • Duplicated regions often share very high (99%) sequence identity. • Figure: Lipocalin genes on human chromosome

  23. Blocks of tandem repeats • These include telomeric repeats (e.g. TTAGGG in humans) and centromeric repeats (e.g. a 171 base pair repeat of α satellite DNA in humans). Such repetitive DNA can span millions of base pairs, and it is often species-specific. • Figure: Example of telomeric repeats (obtained by tblastn searching TTAGGG4)

  24. Human genome overview

  25. What is DNA sequencing? • Sequencing means to determine the primary structure of an unbranched biopolymer such as DNA, RNA or peptides. • DNA sequencing is the process of determining the nucleotide order of a given DNA fragment. The dideoxy sequencing method was developed by Fred Sanger in the mid-1970s. • Determining the sequence is therefore useful in fundamental research; in medicine it can be used to identify, diagnose and potentially develop treatments for genetic diseases. • Basics of the “old” technology

  26. Sequencing genome • Genome Project – status • Genome projects use two general approaches: • The mapping approach divides the genome into segments with genetic and physical mapping, refines the map of each segment, and finally sequences the DNA. • A “shotgun” approach breaks the genome into random, overlapping fragments, and sequences each fragment. Based on overlaps, the sequences are assembled by computer. An advantage is that physical mapping is not required.
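
The "assembled by computer" step of the shotgun approach can be illustrated with a toy greedy overlap merger. This is a minimal sketch assuming error-free reads from one strand and no repeats; real assemblers are far more elaborate, and the function names and example reads here are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical overlapping fragments of one short sequence.
print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))
```

Greedy merging is easy to follow but can mis-assemble repetitive regions, which is one reason repeats make genome assembly difficult.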

  27. Next-generation sequencing technologies • Compared to Sanger sequencing, next-generation sequencing allows the complete genomic content of a sample to be sequenced without the need to make clone libraries. • One problem with next-generation sequencing projects is the handling of massive amounts of sequencing data that must be organized, cleaned up, assembled, and analyzed.

  28. Next-generation sequencing technologies • Template preparation: • Representative, non-biased source of nucleic acid. • Methods generally involve randomly breaking genomic DNA into smaller sizes. • Common adaptors are ligated to fragmented genomic DNA. • The template is attached or immobilized to a solid surface or support. The two most common methods are emulsion PCR (emPCR) and solid-phase amplification. • Amplified templates are required (enzymatic extension with fluorescently tagged nucleotides). • Peak detection: the identity of each base of a cluster is read off from sequential images. • Post-processing and sequence assembly.

  29. Next-generation sequencing technologies • Principles of pyrosequencing technology • A sequencing primer is hybridized to a single-stranded PCR amplicon that serves as a template, and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase, and apyrase, as well as the substrates adenosine 5' phosphosulfate (APS) and luciferin. • DNA polymerase catalyzes the incorporation of the deoxyribonucleotide triphosphate into the DNA strand if it is complementary to the base in the template strand. Each incorporation event is accompanied by release of pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide.

  30. Next-generation sequencing technologies • Principles of pyrosequencing technology • ATP sulfurylase converts PPi to ATP in the presence of adenosine 5' phosphosulfate (APS). • This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in amounts that are proportional to the amount of ATP. • The light produced in the luciferase-catalyzed reaction is detected by a charge-coupled device (CCD) chip and seen as a peak in the raw data output (Pyrogram). • The height of each peak (light signal) is proportional to the number of nucleotides incorporated.

  31. Next-generation sequencing technologies • Principles of pyrosequencing technology • Apyrase, a nucleotide-degrading enzyme, continuously degrades unincorporated nucleotides and ATP. When degradation is complete, another nucleotide is added. • Addition of dNTPs is performed sequentially. It should be noted that deoxyadenosine alpha-thio triphosphate (dATPαS) is used as a substitute for the natural deoxyadenosine triphosphate (dATP), since it is efficiently used by the DNA polymerase but not recognized by the luciferase. • As the process continues, the complementary DNA strand is built up and the nucleotide sequence is determined from the signal peaks in the Pyrogram trace.
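
The way a Pyrogram encodes sequence (one peak per dispensed nucleotide, with height proportional to the number of bases incorporated) can be mimicked in a few lines. The sketch below is an idealized simulation with no noise or chemistry; the cyclic dispensation order "ACGT", the function names and the example strand are assumptions for illustration.

```python
def pyrogram(new_strand: str, dispensation: str = "ACGT", max_cycles: int = 100):
    """Simulate ideal peak heights for sequential dNTP additions.

    `new_strand` is the strand being synthesized (the complement of the
    template). Each dispensed nucleotide is incorporated as many times as
    it matches the next positions, which gives the peak height.
    """
    peaks = []
    pos = 0
    for step in range(max_cycles * len(dispensation)):
        if pos >= len(new_strand):
            break
        nt = dispensation[step % len(dispensation)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == nt:
            run += 1
            pos += 1
        peaks.append((nt, run))  # height 0 means no incorporation, no light
    return peaks


def read_sequence(peaks) -> str:
    """Reconstruct the synthesized sequence from (nucleotide, height) peaks."""
    return "".join(nt * height for nt, height in peaks)


peaks = pyrogram("GGATC")
print(peaks)
print(read_sequence(peaks))
```

A run of identical bases shows up as a single taller peak, which is why long homopolymers are hard to call accurately when peak heights are noisy.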

  32. Comparison of existing methods

  33. Models of DNA replication • Alternative models of DNA replication (Hypotheses 1-3): semi-conservative replication, dispersive replication, conservative replication. • Meselson–Stahl experiment. • The central dogma of biology.

  34. “Discovery of the mechanisms in the biological synthesis of deoxyribonucleic acid” • Arthur Kornberg, 1959 Nobel Prize in Physiology or Medicine • Required components: • dNTPs: dATP, dTTP, dGTP, dCTP • DNA template • DNA polymerase I (formerly the Kornberg enzyme) • (DNA polymerases II and III were discovered soon after) • Mg2+ (optimizes DNA polymerase activity) • Main features of the DNA synthesis reaction: 1. DNA polymerase I catalyzes formation of a phosphodiester bond between the 3’-OH of the deoxyribose (on the last nucleotide) and the 5’-phosphate of the dNTP. Energy for this reaction is derived from the release of two of the three phosphates. 2. DNA polymerase I “finds” the correct complementary dNTP at each step in the lengthening process. Rate ≤ 800 dNTPs/second, low error rate. 3. Direction of synthesis is 5’ to 3’.

  35. Not all polymerases are the same

      Polymerase   Polymerization (5’-3’)   Exonuclease (3’-5’)   Exonuclease (5’-3’)   #Copies
      I            Yes                      Yes                   Yes                   400
      II           Yes                      Yes                   No                    ?
      III          Yes                      Yes                   No                    10-20

      • Polymerases I and III replicate 5’ to 3’ • Polymerase II’s role is not well understood • 3’ to 5’ exonuclease activity = ability to remove nucleotides from the 3’ end of the chain • Important proofreading ability • Without proofreading, the error rate (mutation rate) is 1 × 10^-6 • With proofreading, the error rate is 1 × 10^-9 (1000-fold decrease) • 5’ to 3’ exonuclease activity functions in DNA replication & repair
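
To get a feel for what a 1000-fold drop in error rate means, the two rates can be applied to one round of copying the E. coli chromosome (4600 kb, as quoted on the Genome Size slide). This is a back-of-the-envelope sketch that treats the error rate as errors per base copied; the function and variable names are illustrative.

```python
def expected_errors(bases_copied: float, error_rate: float) -> float:
    """Expected number of misincorporated bases, assuming independent errors."""
    return bases_copied * error_rate


ECOLI_CHROMOSOME_BP = 4.6e6  # 4600 kb, from the Genome Size slide

without_proofreading = expected_errors(ECOLI_CHROMOSOME_BP, 1e-6)
with_proofreading = expected_errors(ECOLI_CHROMOSOME_BP, 1e-9)

print(f"without proofreading: {without_proofreading:.1f} errors per copy")
print(f"with proofreading:    {with_proofreading:.4f} errors per copy")
```

Under these assumptions, proofreading takes the expectation from several errors per chromosome copy to roughly one error in a couple of hundred copies.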

  36. The Replication Fork • Figure labels: single-strand binding proteins, RNA primers, DNA polymerase, helicase, primase, lagging strand, Okazaki fragment, leading strand.

  37. PCR history • In 1969, Thomas Brock and Hudson Freeze reported a new species of thermophilic bacterium, which they named Thermus aquaticus. • The Polymerase Chain Reaction (PCR) was not a discovery, but rather an invention. • A special DNA polymerase (Taq) is used to make many copies of a short length of DNA (100-10,000 bp) defined by primers. • Kary Mullis, the inventor of PCR, was awarded the 1993 Nobel Prize in Chemistry. • Taq DNA polymerase has an error rate of one in 10,000 nucleotides. It can amplify a 1 kb strand of DNA in roughly 30 seconds at 72 °C. • Pfu DNA polymerase is an enzyme found in the hyperthermophilic archaeon Pyrococcus furiosus. Pfu typically has an error rate of 1 in 1.3 million base pairs.

  38. What PCR Can Do? • PCR can be used to make many copies of any DNA that is supplied as a template. • Starting with one original copy, an almost infinite number of copies can be made using PCR. • “Amplified” fragments of DNA can be sequenced, cloned, probed or sized using electrophoresis. • Defective genes can be amplified to diagnose any number of illnesses. • Genes from pathogens can be amplified to identify them (e.g. HIV). • Amplified fragments can act as genetic fingerprints. • How PCR Works? • PCR is an artificial way of doing DNA replication. • Instead of replicating all the DNA present, only a small segment is replicated, but this small segment is replicated many times. • As in replication, PCR involves: • Melting DNA • Priming • Polymerization

  39. PCR is an artificial DNA replication • Figure: temperature vs. time for one thermal cycle, repeated 30×: melting at 94 °C, annealing of primers at 50 °C, extension at 72 °C.

  40. DNA doubles with each thermal cycle • Figure: number of copies vs. cycles: 1, 2, 4, 8, 16, 32, 64 copies after 0, 1, 2, 3, 4, 5, 6 cycles.

  41. Theoretical yield of PCR • Theoretical yield = 2^n × y • Where y = the starting number of copies and n = the number of thermal cycles • Figure: gel lanes for 0, 10, 15, 20, 25 and 30 cycles, plus marker (M). • If you start with 100 copies, how many copies are made in 30 cycles? 2^n × y = 2^30 × 100 = 1,073,741,824 × 100 = 107,374,182,400 (about 1.07 × 10^11)
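
The yield formula is easy to evaluate directly. The sketch below reproduces the slide's example and adds an optional per-cycle `efficiency` parameter; that parameter is an assumption for illustration (the slide only covers perfect doubling, which corresponds to an efficiency of 1.0).

```python
def pcr_yield(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` thermal cycles.

    With efficiency = 1.0 every template is copied each cycle, giving the
    slide's formula: yield = 2**n * y. Lower values model imperfect cycles
    as yield = y * (1 + efficiency)**n (an added assumption).
    """
    return start_copies * (1.0 + efficiency) ** cycles


print(f"{pcr_yield(100, 30):,.0f} copies from 100 templates after 30 cycles")
print(f"{pcr_yield(1, 6):,.0f} copies from 1 template after 6 cycles")
print(f"{pcr_yield(100, 30, efficiency=0.9):,.0f} copies at 90% efficiency")
```

In practice the reaction plateaus as primers and dNTPs run out, so real yields fall short of the theoretical number in late cycles.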

  42. Primer design • There are several excellent sites for designing PCR primers. Carry out a simple text search.

  43. Primer design • Click on “Online Analysis Tools – PCR”.

  44. Primer design • Choose the Primer3 WWW primer tool.

  45. Primer design • Choose act 1.

  46. Primer design • Go to the reference sequence details.

  47. Primer design
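
Before pasting a candidate primer into a design tool, a few quick checks can be done by hand or in code. The sketch below computes GC content, a rule-of-thumb melting temperature using the Wallace rule (Tm = 2 °C per A/T + 4 °C per G/C, intended only for short oligonucleotides), and the reverse complement. It is not how Primer3 calculates Tm, and the example primer is hypothetical rather than taken from the slides.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in deg C by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to derive a reverse primer from the top strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, for illustration only
print(f"GC content: {gc_content(primer):.0%}")
print(f"Wallace Tm: {wallace_tm(primer)} deg C")
print(f"Reverse complement: {reverse_complement(primer)}")
```

Dedicated tools use more detailed thermodynamic models and also screen for hairpins and primer dimers, so these numbers are only a first sanity check.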

  48. Contig3_Frame-2_ORF17

  49. Contig3_Frame-2_ORF17
