1 / 8

Splicing Enhancer Scoring in ISCAN

Splicing Enhancer Scoring in ISCAN. Overview. Method SC35 splicing enhancer WMM (SELEX experiments by M. Zhang & A. Krainer, CHSL) 6-base long pattern Training set “Clean” set of 6061 Refseq genes (no alternative splicing) Results WMM hits in introns tend to overlap by 2 and 4 bases

pravat
Download Presentation

Splicing Enhancer Scoring in ISCAN

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Splicing Enhancer Scoring in ISCAN

  2. Overview • Method • SC35 splicing enhancer WMM (SELEX experiments by M. Zhang & A. Krainer, CHSL) • 6-base long pattern • Training set • “Clean” set of 6061 Refseq genes (no alternative splicing) • Results • WMM hits in introns tend to overlap by 2 and 4 bases • Hits in exons often overlap by 3 bases

  3. Deriving Splicing Enhancer Score • Let • n2 = count of 2-base or 4-base overlaps, • n3 = count of 3-base overlaps, • z = n2 – n3 • Derive distribution P(z) from the training set • Use randomized sequences with the same length distribution to derive the null model • Compute the score S = 10*log2(P(z)/Pnull(z)) • Requires explicit intron length model

  4. Explicit Intron Length Model X ● ● ● -∞ k-L k • Some positions have max exon score of –∞ • Can skip introns that end at such positions • If DNA sequence is masked, the number of • iterations needed is O(1000) for L ~ 105 • Run time for 1MB is of the order of 10 hours

  5. Splicing Enhancer Model • Use 2-base overlap and 3-base overlap counts to define z • Explicit intron length up to 50 kb • z > 0: • P(z) = a/(z2+bz+c); Pnull(z) = aebz (b > 0) • z < 0: • P(z) = ae-bz; Pnull(z) = ce-dz (b, d > 0)

  6. Chromosome 22

  7. Chromosome 20

  8. What next? • Use “clean” RefSeq set to re-estimate parameters • Optimize splicing enhancer model • 4-base overlap counts instead of 2-base (or both) • Better null model • Other motifs • Apply pseudogene filtering to increase specificity

More Related