CS177 Lecture 7 Computational Aspects of Protein Structure II Tom Madej 10.25.04
Research news (Nature 10.21.04) • Another milestone for the Human Genome Project. • Fills in approx. 99% of the “gene rich” portion of the genome (10% more than the 2001 drafts). • Only 341 remaining gaps, formerly hundreds of thousands. • New estimate of the number of genes: 20,000-25,000. • Megabase deletions result in viable mice! • Researchers deleted 1.5 Mb and 0.8 Mb portions of the mouse genome, non-coding regions, and the mice seem to be fine!
Example for last homework • I searched “Structure” with the term “Leukemia”. • The first structure was 1uc6A. I noticed a couple of VAST neighbors with low percent sequence identity but very similar folds: 1uemA (17.4%) and 1uenA (13.7%). • I ran PSI-BLAST with query sequence 1uc6A. The CD Search got a hit to “Fibronectin type 3”. 1uemA and 1uenA are also assigned to FN3, but for some reason 1uc6 is not (???). • I got lucky: 1uemA and 1uenA were found by PSI-BLAST but did not cross the significance threshold prior to convergence!
Overview of lecture • Protein structure • General principles • Structure hierarchy • Supersecondary structures • Superfolds and examples: TIM barrels, OB fold • Protein structure comparison algorithms • VAST (Vector Alignment Search Tool) • CE (Combinatorial Extension) • Protein fold classification databases • SCOP (Structural Classification of Proteins) • CATH (Class, Architecture, Topology, Homologous superfamily)
General principles • Most protein structures are composed of two types of regular structural elements interconnected by less well-structured regions. • Regular secondary structure elements (SSEs): α-helices and β-strands. • Irregular regions: loops or coil. • A pair of SSEs positioned next to each other in space may be parallel or anti-parallel.
General principles (cont.) • Helices are stabilized by “internal” hydrogen bonds. • Hydrogen bonds will form between an adjacent pair of strands. • Strands will form larger structures such as β-sheets or β-barrels. • Due to the residue side chains, there are favored packing angles between helices/helices, helices/sheets, and sheets/sheets.
Examples of protein architecture • Architecture refers to the arrangement and orientation of SSEs, but not to the connectivity. • Examples shown: a β-sheet with all pairs of strands parallel; a β-sheet with all pairs of strands anti-parallel.
Examples of protein topology Topology refers to the manner in which the SSEs are connected. Two β-sheets (all parallel) with different topologies.
Exercise • Take a look at 1r7sA in Cn3D. • Draw a topology diagram showing the way the strands are connected.
Angles between SSEs in contact • The data on the next 3 slides give the cosines of the angles between pairs of SSE vectors. • The SSEs were required to be “in contact”, i.e. within 10 Å of each other. • Note: The SSEs are not necessarily consecutive in the sequence!
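A minimal sketch of the quantity described on this slide, assuming each SSE is represented by a start point and an end point in Å. The function names are hypothetical, and the contact test (midpoints within 10 Å) is only one possible reading of “in contact”; the slides do not say how the distance was measured.

```python
import math

def sse_vector(start, end):
    """Vector pointing from the start to the end of an SSE."""
    return tuple(e - s for s, e in zip(start, end))

def cos_angle(u, v):
    """Cosine of the angle between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def midpoint(start, end):
    """Midpoint of an SSE given its two endpoints."""
    return tuple((s + e) / 2 for s, e in zip(start, end))

def in_contact(sse_a, sse_b, cutoff=10.0):
    """Assumed contact test: SSE midpoints within `cutoff` angstroms."""
    return math.dist(midpoint(*sse_a), midpoint(*sse_b)) <= cutoff

# Two idealized neighboring strands running in opposite directions.
strand_a = ((0.0, 0.0, 0.0), (0.0, 0.0, 10.0))
strand_b = ((4.8, 0.0, 10.0), (4.8, 0.0, 0.0))

c = cos_angle(sse_vector(*strand_a), sse_vector(*strand_b))
print(c)                               # -1.0 (anti-parallel)
print(in_contact(strand_a, strand_b))  # True
```

A cosine near +1 indicates a parallel pair and a cosine near -1 an anti-parallel pair, which ties this measure back to the parallel/anti-parallel distinction under “General principles”.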
Examples of structures formed by β-strands • Triosephosphate isomerase 7timA • Retinol binding protein 1rbp • Porin 1oh2P
Higher level organization • A single protein may consist of multiple domains. Examples: 1liy A, 1bgc A. The domains may or may not perform different functions. • Proteins may form higher-level assemblies. Useful for complicated biochemical processes that require several steps, e.g. processing/synthesis of a molecule. Example: 1l1o chains A, B, C.
Example: Replication Protein A • RPA binds to ssDNA and is involved in recombination, replication, and repair. • It is a heterotrimer, consisting of three subunit proteins that bind together. See structure 1l1o. • E. Bochkareva et al. The EMBO Journal (2002) 21 1855-1863
Supersecondary structures • β-hairpin • α-hairpin • βαβ-unit • β4 Greek key • βα Greek key
Supersecondary structure: simple units G.M. Salem et al. J. Mol. Biol. (1999) 287 969-981
Supersecondary structure: Greek key motifs G.M. Salem et al. J. Mol. Biol. (1999) 287 969-981
Examples of β4 Greek key motif • 1hk0 Human Gamma-D Crystallin; residues 32 thru 64 in domain 1. • OB fold (we’ll see this fold later).
Examples of βα Greek key motif • 1bgw Topoisomerase; residues 487 thru 540 in domain 5. • 1ris Ribosomal protein S6.
Protein folds • There is a continuum of similarity! • Fold definition: two folds are similar if they have a similar arrangement of SSEs (architecture) and connectivity (topology). Sometimes a few SSEs may be missing. • Fold classification: To get an idea of the variety of different folds, one must adjust for sequence redundancy and also try to correctly assign homologs that have low sequence identity (e.g. below 25%).
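To make the “low sequence identity (e.g. below 25%)” criterion concrete, here is a rough sketch of a percent-identity calculation on a pair of already-aligned sequences. The function name and the choice of denominator (columns where neither sequence has a gap) are assumptions; different tools normalize differently, so values from this sketch need not match those reported by VAST or BLAST.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Toy alignment (not real protein data): 5 identities over 6 gap-free columns.
pid = percent_identity("ACDE-FGH", "ACNEKF-H")
print(round(pid, 1))      # 83.3
print(pid < 25.0)         # False: not a "low identity" pair
```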
Superfolds (Orengo, Jones, Thornton) • Distribution of fold types is highly non-uniform. • There are about 10 types of folds, the superfolds, to which about 30% of the other folds are similar. There are many examples of “isolated” fold types. • Superfolds are characterized by wide sequence diversity and span a range of dissimilar functions. • The evolutionary relationships of the superfolds are an open research question, i.e. do they arise by divergent or convergent evolution?
Superfolds and examples • Globin: 1hlm sea cucumber hemoglobin; 1cpcA phycocyanin; 1colA colicin • α-up-down: 2hmqA hemerythrin; 256bA cytochrome B562; 1lpe apolipoprotein E3 • Trefoil: 1i1b interleukin-1β; 1aaiB ricin; 1tie erythrina trypsin inhibitor • TIM barrel: 1timA triosephosphate isomerase; 1ald aldolase; 5rubA rubisco • OB fold: 1quqA replication protein A 32kDa subunit; 1mjc major cold-shock protein; 1bcpD pertussis toxin S5 subunit • α/β doubly-wound: 5p21 Ras p21; 4fxn flavodoxin; 3chy CheY • Immunoglobulin: 2rhe Bence-Jones protein; 2cd4 CD4; 1ten tenascin • UB αβ roll: 1ubq ubiquitin; 1fxiA ferredoxin; 1pgx protein G • Jelly roll: 2stv tobacco necrosis virus; 1tnfA tumor necrosis factor; 2ltnA pea lectin • Plaitfold (split αβ sandwich): 1aps acylphosphatase; 1fxd ferredoxin; 2hpr histidine-containing phosphocarrier
TIM barrels • Classified into 21 families in the CATH database. • Mostly enzymes, but participate in a diverse collection of different biochemical reactions. • There are intriguing common features across the families, e.g. the active site is always located at the C-terminal end of the barrel.
TIM barrel evolutionary relationships (Nagano, Orengo, Thornton) • Sequence analysis with advanced programs such as PSI-BLAST and IMPALA has identified further relationships among the families. • Further interesting similarities have been observed from careful comparison of structures, e.g. a phosphate binding site commonly formed by loops 7, 8 and a small helix. • In summary, there is evidence for evolutionary relationships between 17 of the 21 families.
OB (oligonucleotide/oligosaccharide-binding) fold • 5-stranded β-barrel with Greek key topology. • All OB folds have the same binding face that is involved in their biochemistry.
OB evolutionary relationships • SCOP lists 9 superfamilies. • Bacterial enterotoxin superfamily consists of two families, almost certainly evolutionarily related. • Nucleic acid-binding superfamily has 11 families; if they are evolutionarily related, the ancestral protein would come from the LUCA (Last Universal Common Ancestor). • Evidence for common ancestry of all OB folds is probably weaker than for TIM barrels.
Protein structure comparison • How to compare 3D protein structures? • Analogous computational considerations to sequence comparison, e.g. accuracy, efficiency for database searches, statistical significance of results, etc. • Additional complication: working with atomic coordinates in 3D space!
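As a small illustration of what “working with atomic coordinates” involves, the sketch below computes the root-mean-square deviation (RMSD) between two equal-length lists of 3D points, assuming the residue correspondence is already known and the structures are already superposed. Both are strong assumptions: a real comparison method has to find the correspondence and the optimal rotation and translation itself (the Kabsch algorithm is one standard way to get the latter). The function name and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between paired 3D points; no superposition is performed here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between corresponding points.
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Toy example: the second set is the first shifted by 1 angstrom along y.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
set_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
print(rmsd(set_a, set_b))  # 1.0
```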
Some protein structure comparison methods • VAST (Vector Alignment Search Tool, NCBI) • CE (Combinatorial Extension, RCSB/PDB) • DALI (EBI)
VAST outline • Parse protein structures into SSEs (helices and strands). • Fit vectors to SSEs. • To compare a pair of proteins, attempt to superpose as many vectors as possible, subject to constraints. • Evaluate the vector alignment for statistical significance (compute an E-value). • If the vector alignment is significant, then proceed to a more detailed residue-to-residue alignment (“refined alignment”).
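The “fit vectors to SSEs” step can be illustrated with a deliberately simple stand-in, which is not VAST's actual fitting procedure: take the Cα coordinates of one SSE, average the first half and the last half of the residues, and connect the two centroids. The function names and the toy coordinates are assumptions made for illustration only.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def fit_sse_vector(ca_coords):
    """Approximate SSE axis: centroid of first half to centroid of last half."""
    if len(ca_coords) < 2:
        raise ValueError("need at least two C-alpha positions")
    half = len(ca_coords) // 2          # middle residue is skipped if odd
    head = centroid(ca_coords[:half])
    tail = centroid(ca_coords[-half:])
    return tuple(t - h for h, t in zip(head, tail))

def unit(v):
    """Scale a vector to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

# Toy strand: C-alpha atoms zigzag in x while advancing along z.
strand = [(0.5, 0.0, 0.0), (-0.5, 0.0, 3.3), (0.5, 0.0, 6.6), (-0.5, 0.0, 9.9)]
direction = unit(fit_sse_vector(strand))
print(direction)  # approximately (0.0, 0.0, 1.0)
```

Averaging over half of the residues smooths out the side-to-side zigzag of the backbone, so the result follows the overall direction of the element. The vector connects two centroids, so it is shorter than the SSE itself; only its direction is meaningful in this sketch.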
Two proteins with vectors assigned to SSEs • 3chy • 1ipfA
VAST comparison of 3chy and 1ipfA • Vector superposition • Refined alignment
SCOP (Structural Classification of Proteins) • http://scop.mrc-lmb.cam.ac.uk/scop/ • Levels of the SCOP hierarchy: • Family: clear evolutionary relationship • Superfamily: probable common evolutionary origin • Fold: major structural similarity
CATH (Class, Architecture, Topology, Homologous superfamily) • http://www.biochem.ucl.ac.uk/bsm/cath/