Introduction to Computational Genomics: a case study approach
CHAPTER 2: Gene Finding
OUTLINE • An introduction to genes and proteins • Gene finding • Hypothesis testing
GENES • A segment of DNA that specifies the amino-acid sequence of a protein • Exons = coding sequences • Introns = non-coding sequences that interrupt the coding sequence and are spliced out of the RNA transcript • Occupies a specific location on a chromosome (an organized strand of DNA)
PROTEINS • Function as enzymes and as structural materials in cells • Chain of amino acids (AA) • Its three-dimensional shape, reached through protein folding, determines its function
AA ALPHABET A = {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}
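OPEN READING FRAME • A run of codons from a start codon to the next in-frame stop codon • Start codon (ATG = methionine) • Non-stop codons (the 61 codons that are not stop codons) • Stop codons (TGA, TAA, TAG)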
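The codon vocabulary above can be made concrete with a short sketch. It enumerates all 64 codons, separates out the three stop codons, and splits a sequence into codons for a given reading frame. The names `ALL_CODONS`, `NON_STOP_CODONS` and `codons` are illustrative choices, not taken from the source.

```python
from itertools import product

BASES = "ACGT"
START_CODON = "ATG"                  # also encodes methionine
STOP_CODONS = {"TAA", "TAG", "TGA"}

# All 4^3 = 64 possible codons, and the 61 that are not stop codons.
ALL_CODONS = ["".join(c) for c in product(BASES, repeat=3)]
NON_STOP_CODONS = [c for c in ALL_CODONS if c not in STOP_CODONS]


def codons(seq, frame=0):
    """Split seq into consecutive codons starting at offset `frame` (0, 1 or 2).

    A trailing fragment shorter than three bases is dropped.
    """
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


print(len(ALL_CODONS), len(NON_STOP_CODONS))  # 64 61
print(codons("ATGGCCTAA"))                    # ['ATG', 'GCC', 'TAA']
```

The same sequence read from offset 1 gives the different codons `['TGG', 'CCT']`, which is why each strand has three reading frames.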
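GENE FINDING • Methods: • ab initio (predicting genes from sequence signals alone) • homology-based methods (using similarity to known genes) • Only in prokaryotes does a gene typically consist of a single continuous ORF; eukaryotic genes are split by introns • Algorithm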
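One way to realize the ORF-based approach for prokaryotic DNA is sketched below. It assumes an ORF starts at an ATG and ends at the first in-frame stop codon, scans the three frames of both strands, and keeps ORFs of at least `min_codons` codons. The threshold parameter and its default of 100 are hypothetical; the chapter's own algorithm may differ in details such as nested start codons or how the length cut-off is chosen.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) for each ATG...stop ORF in one reading frame.

    The ORF begins at the first ATG after the previous stop and includes the
    next in-frame stop codon; `end` is exclusive. ORFs that run off the end
    of the sequence without a stop codon are ignored.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            # Length in codons, counting the stop codon itself.
            if (i + 3 - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def find_orfs(seq, min_codons=100):
    """Scan all six reading frames (three per strand).

    Returns tuples (strand, frame, start, end, orf_sequence). For the "-"
    strand, coordinates index into the reverse-complemented sequence.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                results.append((strand, frame, start, end, s[start:end]))
    return results


print(find_orfs("ATGAAATAG", min_codons=3))
# [('+', 0, 0, 9, 'ATGAAATAG')]
```

The length threshold is what turns "find ORFs" into "find likely genes": short ORFs arise by chance all the time, so only unusually long ones are reported.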
LOWER BOUND • Uniform codon distribution • $P(\text{run of } k \text{ non-stop codons}) = (61/64)^k$ • Non-uniform codon distribution • $P(\text{stop}) = P(\text{TAA}) + P(\text{TAG}) + P(\text{TGA})$ • $P(\text{run of } k \text{ non-stop codons}) = [1 - P(\text{stop})]^k$
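The formulas above can be evaluated directly. In this sketch the non-uniform case assumes codon probabilities come from independent base frequencies, so $P(\text{TAA}) = p_T \, p_A \, p_A$ and so on; that is one common null model, not necessarily the one the chapter uses. The helper `min_run_length` (a hypothetical name) returns the smallest $k$ whose probability falls to a chosen significance level, which is how a lower bound on ORF length can be set.

```python
import math

STOP_CODONS = ("TAA", "TAG", "TGA")


def p_stop_uniform():
    """P(stop) when all 64 codons are equally likely: 3/64."""
    return 3 / 64


def p_stop_from_base_freqs(freqs):
    """P(stop) assuming each codon position is drawn independently
    from the base frequencies in `freqs` (an assumption of this sketch)."""
    return sum(freqs[c[0]] * freqs[c[1]] * freqs[c[2]] for c in STOP_CODONS)


def p_run(k, p_stop):
    """P(run of k non-stop codons) = (1 - P(stop))^k under the null model."""
    return (1 - p_stop) ** k


def min_run_length(alpha, p_stop):
    """Smallest k such that (1 - P(stop))^k <= alpha."""
    return math.ceil(math.log(alpha) / math.log(1 - p_stop))


print(min_run_length(0.05, p_stop_uniform()))  # 63

# Example AT-rich composition (illustrative numbers, not from the source):
freqs = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}
print(p_stop_from_base_freqs(freqs))  # about 0.063
```

With uniform codons, a run of 63 or more non-stop codons has probability below 0.05 by chance. An AT-rich genome makes stop codons more frequent, which shortens the run length needed.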
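DEFINITIONS • Significance level (α): the probability of a Type I error we are willing to accept • Test statistic: a quantity computed from the data (e.g., ORF length) • P-value: the probability, under the null model, of a test statistic at least as extreme as the one observed • Types of errors • Type I error (false positive): rejecting a null hypothesis that is true • Type II error (false negative): failing to reject a null hypothesis that is false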
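HYPOTHESIS TESTING • Distinguish reliable patterns from background noise • Compute the probability of the observation (or a more extreme one) under a null model • Significant when highly unlikely under the null model, i.e., when the p-value is below the significance level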
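Stated formally, the test compares the p-value of the observed statistic $t_{\text{obs}}$ with the significance level $\alpha$; by construction, a true null hypothesis is then rejected with probability at most $\alpha$ (the Type I error rate):

$$p = P(T \ge t_{\text{obs}} \mid H_0), \qquad \text{significant (reject } H_0\text{) if } p \le \alpha$$

Applied to ORFs, take $T$ as the length in codons of a run of non-stop codons, with each codon independently a stop codon with probability $P(\text{stop})$ under $H_0$. The probability that $k$ consecutive codons are all non-stop is the probability that the run is at least $k$ long, so

$$P(T \ge k \mid H_0) = [1 - P(\text{stop})]^k \le \alpha \iff k \ge \frac{\ln \alpha}{\ln[1 - P(\text{stop})]}$$

The inequality flips because $\ln[1 - P(\text{stop})] < 0$.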
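RANDOMIZATION TEST • Used when the p-value cannot easily be calculated analytically • Randomize the observed data to create samples with the same statistical properties (e.g., base composition) • Permutation (shuffling, i.e., sampling without replacement) • Bootstrapping (resampling with replacement)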
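A randomization test for ORF length can be sketched as follows. The observed statistic is the longest run of non-stop codons in the three forward frames. Randomized sequences are built either by permuting the bases (same length and exact base composition) or by bootstrapping (resampling bases with replacement). The function names, the forward-strand-only statistic, and the add-one correction in the empirical p-value are choices made for this sketch, not details from the source.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_nonstop_run(seq):
    """Longest run of consecutive non-stop codons over the three forward frames."""
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0
            else:
                run += 1
                best = max(best, run)
    return best


def randomization_p_value(seq, n_samples=1000, method="permutation", seed=0):
    """Empirical p-value of the observed longest run against randomized data.

    permutation: shuffle the bases (sampling without replacement).
    bootstrap:   draw bases with replacement from the observed sequence.
    """
    rng = random.Random(seed)
    observed = longest_nonstop_run(seq)
    bases = list(seq)
    hits = 0
    for _ in range(n_samples):
        if method == "permutation":
            rng.shuffle(bases)          # in place; still a uniform random permutation
            sample = bases
        elif method == "bootstrap":
            sample = rng.choices(bases, k=len(bases))
        else:
            raise ValueError(f"unknown method: {method}")
        if longest_nonstop_run("".join(sample)) >= observed:
            hits += 1
    # Add-one correction: counts the observed data as one of the samples,
    # so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)


seq = "".join(random.Random(1).choices("ACGT", k=600))
print(randomization_p_value(seq, n_samples=200))
```

Shuffling keeps the base composition but destroys any codon structure, which matches the null model of the earlier slides: stop codons appear only by chance.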