1 / 12

Sequence and structure databanks

Sequence and structure databanks. can be divided into many different categories. One of the most important is. Supervised databanks with gatekeeper. Examples: Swissprot Refseq (at NCBI) Entries are checked for accuracy. + more reliable annotations -- frequently out of date.

Download Presentation

Sequence and structure databanks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sequence and structure databanks can be divided into many different categories. One of the most important is Supervised databanks with gatekeeper. Examples: Swissprot Refseq (at NCBI) Entries are checked for accuracy. + more reliable annotations -- frequently out of date Repositories without gatekeeper. Examples: GenBank EMBL TrEMBL Everything is accepted + everything is available -- many duplicates -- poor reliability of annotations

  2. Description of Group B Streptococcus Pan-genome Genome comparisons of 8 closely related GBS strains Tettelin, Fraser et al., PNAS 2005 Sep 27;102(39)

  3. Method

  4. Bacterial Core Genes that are shared among all Bacteria Bit score cutoff 50.0 (~10E-4) f(x) = A1*exp(-K1*x) + A2*exp(-K2*x) + A3*exp(-K3*x) + Plateau

  5. Genes without homologs f(x) = A1*exp(-K1*x) + A2*exp(-K2*x) + A3*exp(-K3*x) + A4*exp(-K4*x) + A5*exp(-K5*x) + Plateau

  6. Decomposed function

  7. Core Extended Core Accessory Pool Genes that can be used to distinguish strains or serotypes (Mostly genes of unknown functions) Essential genes (Replication, energy, homeostasis) Set of genes that define groups or species (Symbiosis, photosynthesis) ~ 116 gene families ~ 17,060 gene families ~ 114,800 gene Families uncovered so far

  8. Gene frequency in individual genomes Accessory Pool Extended Core 19.6% 76.6% 3.8% Core

  9. f(x) = A1*exp(-k1*x) + A2*exp(-k2*x) + A3*exp(-k3*x) + A4*exp(-k4*x) + A5*exp(-k5*x) + Plateau A1 = 939.4 A2 = 731.1 A3 = 455.2 A4 = 328.6 A5 = 385.5 1/k1 = 0.48 1/k2 = 2.3 1/k3 = 10.16 1/k4 = 31.40 1/k5 = 162.6 Number of genomes added

  10. Kézdy-Swinbourne Plot Novel genes after looking in x + ∆xgenomes ~230 novel genes per genome Novel genes after looking in x genomes

  11. A Kézdy-Swinbourne Plot plot can be used to estimate the value that a decay function approaches as time goes to infinity. Assume the simple decay function f(x) = K + A e-kx , then f(x + ∆x) = K + A e-k(x+∆x). Through elimination of A: f(x+∆x)=e-k ∆x f(x) + K’ For the plot of f(x+∆x) against f(x) the slope is e-k ∆x. For x both f(x) and f(x+∆x) approach the same constant : f(x)K, f(x+∆x)K. (see the def. for the decay function) The Kézdy-Swinbourne Plot is rather insensitive to deviations from a simple single component decay function. More at Hiromi K: Kinetics of Fast Enzyme Reactions. New York: Halsted Press (Wiley); 1979

  12. Kézdy-Swinbourne Plot Novel genes after looking in x + ∆xgenomes ~230 novel genes per genome Novel genes after looking in x genomes

More Related