
RNA Sequence Assembly




  1. RNA Sequence Assembly WEI Xueliang

  2. Overview • Sequence Assembly • Current Method • My Method • RNA Assembly • To Do

  3. Sequence Assembly • Goal: recover the full DNA/RNA sequence. • Sequencing machines cannot read a whole genome in one go; they read only small pieces of 20 to 1000 bases. • Definition: Read = Tag = Fragment (used interchangeably).

  4. De novo sequence assembly

  5. Overview • Sequence Assembly • Current Method • My Method • RNA Assembly • To Do

  6. De novo sequence assembly • Calculating the overlaps between tags takes a huge amount of time.
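To make the cost concrete, here is a minimal sketch of a naive overlap computation that compares every ordered pair of tags. The function names, the `min_len` threshold, and the three example tags are assumptions for illustration; they are not taken from the slides.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def all_overlaps(tags, min_len: int = 3):
    """Compare every ordered pair of tags: n * (n - 1) comparisons for n tags."""
    result = {}
    for a, b in permutations(tags, 2):
        n = overlap(a, b, min_len)
        if n > 0:
            result[(a, b)] = n
    return result


# Hypothetical tags, chosen only to illustrate the idea.
tags = ["AAGACT", "GACTCC", "CTCCGA"]
print(all_overlaps(tags))
```

The number of pairs grows quadratically with the number of tags, which is the cost this slide refers to.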

  7. DE BRUIJN GRAPH • K-mer: a length-k substring of a tag. • Each node has out-degree at most 4. • Hash each node as a base-4 number. • "CTG" => (132)_4 = (30)_10 • Moving from "CTG" to "TGG": • (132)_4 shifted left gives (1320)_4 • (1320)_4 modulo (1000)_4 = (320)_4 • (320)_4 + (2)_4 for 'G' • = (322)_4
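The hashing steps on this slide can be sketched as a rolling base-4 hash. The encoding C=1, G=2, T=3 follows from the slide's "CTG" example; A=0 is assumed as the remaining digit, and the function names are hypothetical.

```python
# Base-4 digit for each nucleotide (A=0 is an assumption; C, G, T follow the "CTG" example).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hash(kmer: str) -> int:
    """Interpret a k-mer as a base-4 number."""
    h = 0
    for base in kmer:
        h = h * 4 + CODE[base]
    return h


def roll(h: int, next_base: str, k: int) -> int:
    """Hash of the next k-mer: shift left, drop the leading digit, append the new base."""
    return (h * 4) % (4 ** k) + CODE[next_base]


h = kmer_hash("CTG")       # (132)_4 = 30
h_next = roll(h, "G", 3)   # hash of "TGG"
print(h, h_next, kmer_hash("TGG"))
```

Rolling the hash costs constant time per step, whereas rehashing each k-mer from scratch costs time proportional to k.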

  8. DE BRUIJN GRAPH (CONT’) • If the sequence contains repeats, such as "GACT", a 3-mer De Bruijn graph cannot tell which path is the correct one; a 6-mer graph recovers the correct sequence. • The larger K is, the better the result.
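The effect of K on repeats can be checked with a small sketch that builds the graph and lists the nodes with more than one successor. The example sequence is hypothetical (the slide's own figure is not available); it contains the repeat "GACT" twice, followed by different bases.

```python
from collections import defaultdict


def de_bruijn(tags, k):
    """Nodes are k-mers; an edge joins two k-mers that are consecutive inside a tag."""
    graph = defaultdict(set)
    for tag in tags:
        for i in range(len(tag) - k):
            graph[tag[i:i + k]].add(tag[i + 1:i + k + 1])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, i.e. where the path is ambiguous."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Hypothetical sequence: "GACT" occurs twice, once before "C" and once before "G".
sequence = "AAGACTCCGACTGG"
print(branching_nodes(de_bruijn([sequence], 3)))  # ['ACT']
print(branching_nodes(de_bruijn([sequence], 6)))  # []
```

With K = 3 the node "ACT" has two successors, so the walk is ambiguous; with K = 6 every node has a single successor because each 6-mer spans the repeat plus its distinguishing context.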

  9. De novo sequence assembly • Suppose K equals the tag length (20-mer). • TGACGTAGCTATGTATTTTG • GACGTAGCTATGTATTTTGT (no 20-mer in common) • Coverage is not high enough to support a large K.
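A minimal sketch of why K equal to the tag length fails, using the two tags from this slide. It assumes edges are drawn between k-mers that are consecutive inside a single tag; the `edges` helper is hypothetical.

```python
def edges(tags, k):
    """Edges between consecutive k-mers inside each tag (a tag of length L yields L - k edges)."""
    return {(t[i:i + k], t[i + 1:i + k + 1]) for t in tags for i in range(len(t) - k)}


tags = ["TGACGTAGCTATGTATTTTG", "GACGTAGCTATGTATTTTGT"]

print(len(edges(tags, 20)))  # 0: each tag is a single isolated 20-mer
print(len(edges(tags, 19)))  # 2: the tags chain through their shared 19-mer
```

Although the two tags overlap by 19 bases, K = 20 leaves them unconnected; linking them at that K would need an additional, longer read covering both, which is the coverage problem the slide describes.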

  10. Overview • Sequence Assembly • Current Method • My Method • RNA Assembly • To Do

  11. MY METHOD • Tag length = 6, K = 3 • When the path reaches AAGACT?, try every possible extension: • AAGACTC • AAGACTT • AAGACTG • Check against the tags: • AGACTC • The correct extension is therefore AAGACTC
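One possible reading of this slide, as a minimal sketch: at a branch, append each candidate base and keep only the extensions whose last tag-length window appears among the tags. The function name and the single-tag set are assumptions for illustration, not the author's implementation.

```python
def supported_extensions(contig, candidates, tags, tag_len):
    """Return the candidate bases whose extension ends in a window that matches a tag."""
    supported = []
    for base in candidates:
        window = (contig + base)[-tag_len:]  # last tag_len bases after extending
        if window in tags:
            supported.append(base)
    return supported


# Values from the slide: tag length 6, partial path AAGACT, candidates C, T, G.
tags = {"AGACTC"}
print(supported_extensions("AAGACT", "CTG", tags, 6))  # ['C']
```

Only "AAGACTC" ends in a window ("AGACTC") that matches a tag, so C is the extension kept.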

  12. Overview • Sequence Assembly • Current Method • My Method • RNA Assembly • To Do

  13. RNA ASSEMBLY

  14. ALTERNATIVE SPLICING • The graph • All cDNA sequences.

  15. RNA ASSEMBLY’S PROBLEM • Merge? • Index the sequence.

  16. RNA ASSEMBLY’S PROBLEM(CONT’) • Solution?

  17. RNA ASSEMBLY’S PROBLEM(CONT’) • Index Tags

  18. RNA ASSEMBLY’S PROBLEM(CONT’) • Solution? • Speed?

  19. SINGLE TAG’S LIMITATION • |Yellow Sequence| >= Length of Tag • Tag length is 25-100 bp. • A single tag is not enough!

  20. DATASET - PAIRED END TAGS • Fragment length is usually > 1k • Some RNA sequences are shorter than 1k.

  21. TO DO • Handle large data sets (10G). • Improve accuracy. • Use PET data.

  22. Thanks!!
