Presented by: Xia Li

SeqMap: mapping massive amount of oligonucleotides to the genome. Hui Jiang et al. Bioinformatics (2008) 24: 2395-2396.
The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. Nathan Clement et al. Bioinformatics (2010) 26: 38-45.


Presentation Transcript


  1. SeqMap: mapping massive amount of oligonucleotides to the genome. Hui Jiang et al. Bioinformatics (2008) 24: 2395-2396. The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. Nathan Clement et al. Bioinformatics (2010) 26: 38-45. Presented by: Xia Li

  2. Short-read mapping software

  3. SeqMap
     • Motivation
       • Hashing the genome usually requires a lot of memory (e.g. SOAP needs 14 GB when mapping to the human genome)
       • Allow more substitutions and insertions/deletions

  4. SeqMap
     • Pigeonhole principle
     • Spaced seed alignment
       • ELAND, SOAP, RMAP
     • Hash reads
     • Insertion/deletion: the 2-of-4 part combinations, with one of the two parts shifted one nucleotide to its left or right
     [Figure: a short read is split into 4 parts; all combinations of 2 of the 4 parts index a short-read lookup table, which is matched against the reference genome. Image credit: J. Ruan]
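
The pigeonhole idea on this slide can be sketched in a few lines of Python: if a read is cut into 4 parts and has at most 2 mismatches, at least 2 of the parts must match the genome exactly, so hashing the reads by every pair of parts finds all such candidates. This is only an illustrative sketch, not SeqMap's actual implementation: the function names (`split_bounds`, `build_read_index`, `map_reads`) are hypothetical, all reads are assumed to have the same length, and the shifted-part handling for insertions/deletions is left out.

```python
from collections import defaultdict
from itertools import combinations


def split_bounds(read_len, parts=4):
    """Return (start, end) offsets that cut a read into `parts` near-equal pieces."""
    step = read_len // parts
    bounds = [(i * step, (i + 1) * step) for i in range(parts)]
    bounds[-1] = (bounds[-1][0], read_len)  # last part absorbs the remainder
    return bounds


def part_key(seq, bounds, combo):
    """Key = which parts were chosen plus the bases found in those parts."""
    return combo, tuple(seq[bounds[i][0]:bounds[i][1]] for i in combo)


def build_read_index(reads, parts=4, keep=2):
    """Hash every read once per combination of `keep` out of `parts` pieces."""
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        bounds = split_bounds(len(read), parts)
        for combo in combinations(range(parts), keep):
            index[part_key(read, bounds, combo)].append(read_id)
    return index


def map_reads(genome, reads, max_mismatches=2):
    """Scan the genome once and return sorted (read_id, position, mismatches) hits."""
    read_len = len(reads[0])  # assumption: all reads share one length
    bounds = split_bounds(read_len)
    index = build_read_index(reads)
    hits = set()
    for pos in range(len(genome) - read_len + 1):
        window = genome[pos:pos + read_len]
        for combo in combinations(range(4), 2):
            # Two exactly matching parts only nominate a candidate ...
            for read_id in index.get(part_key(window, bounds, combo), ()):
                # ... which is then verified by counting all mismatches.
                mism = sum(a != b for a, b in zip(window, reads[read_id]))
                if mism <= max_mismatches:
                    hits.add((read_id, pos, mism))
    return sorted(hits)
```

Note that the reads, not the genome, are hashed here, which is the memory-saving choice the slide lists under "Hash reads": the table grows with the number of reads rather than with the genome size.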

  5. Experiment & Result

  6. Experiment & Result • Handling more substitutions and insertions/deletions: randomly generate a DNA sequence 1 Mb long, then add 100 Kb of random substitutions, N's and insertions/deletions
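
A simulation of this kind could be set up roughly as follows. The sketch is scaled down and rests on assumptions the slide does not state: the four edit types are drawn with equal probability, positions are chosen uniformly, and the helper name `simulate_reference` is hypothetical. It uses plain list operations, so it is meant for small demonstration sizes rather than the full 1 Mb / 100 Kb setting.

```python
import random


def simulate_reference(length, n_edits, seed=0):
    """Return (original, mutated): a random DNA string and an edited copy of it."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    original = "".join(rng.choice("ACGT") for _ in range(length))
    mutated = list(original)
    for _ in range(n_edits):
        pos = rng.randrange(len(mutated))
        kind = rng.choice(["substitution", "N", "insertion", "deletion"])
        if kind == "substitution":
            # Always pick a base different from the current one.
            mutated[pos] = rng.choice([b for b in "ACGT" if b != mutated[pos]])
        elif kind == "N":
            mutated[pos] = "N"
        elif kind == "insertion":
            mutated.insert(pos, rng.choice("ACGT"))
        elif len(mutated) > 1:  # deletion, never emptying the sequence
            del mutated[pos]
    return original, "".join(mutated)


original, mutated = simulate_reference(length=1000, n_edits=100)
```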

  7. GNUMAP
     • Motivation
       • Base uncertainty
         • Such as nearly equal or low probabilities for A, C, G or T
         • Filter low-quality reads [RMAP] -> discards up to half of the reads (Harismendy et al., 2009)
       • Repeated regions in the genome
         • Discard them -> loss of up to half of the data (Harismendy et al., 2009)
         • Record one -> unequal mapping to some of the repeat regions
         • Record all -> each location having 3 times the correct score

  8. GNUMAP • Flow-chart

  9. Probabilistic Needleman-Wunsch

  10. Alignment Score • Read from sequencer: GGGTACAACCATTAC • Genome: ACTGAACCATACGGGTACTGAACCATGAA (highlighted segments: AACCAT, GGGTAC, AACCAT) • The read is added to both repeat regions in proportion to their match quality, weighted by its number of occurrences in the genome. Slide credit: N. Clement
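
One way to read the allocation rule on this slide is as a normalization: every candidate location receives a share of the read equal to its match quality divided by the sum of quality times occurrence count over all candidates, so the shares add up to 1. The sketch below follows that reading only; it is an assumption about the scoring, not GNUMAP's published formula, and `occurrences` and `allocate_read` are hypothetical helper names.

```python
def occurrences(genome, seq):
    """Return every 0-based offset at which `seq` occurs in `genome`."""
    return [i for i in range(len(genome) - len(seq) + 1) if genome.startswith(seq, i)]


def allocate_read(candidates):
    """Split one read's unit weight across its candidate genome locations.

    `candidates` maps a matched sequence to (match_quality, positions), where
    `positions` lists every genome offset at which that sequence occurs.
    """
    total = sum(quality * len(positions) for quality, positions in candidates.values())
    if total == 0:
        return {}  # nothing matched well enough to receive any weight
    weights = {}
    for quality, positions in candidates.values():
        for pos in positions:
            weights[pos] = quality / total
    return weights


genome = "ACTGAACCATACGGGTACTGAACCATGAA"  # genome string from the slide
repeat = "AACCAT"
weights = allocate_read({repeat: (1.0, occurrences(genome, repeat))})
print(weights)  # {4: 0.5, 20: 0.5}
```

With the slide's genome, AACCAT occurs at offsets 4 and 20, so an equally good match at both places gives each repeat half of the read instead of recording it once (biased) or recording it fully at both places (inflated).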

  11. Experiment & Result

  12. Comments
     • SeqMap
       • Pros: handles more substitutions and insertions/deletions
       • Cons: memory consuming, not fast
     • GNUMAP
       • Pros: considers base quality and repeated regions -> generates more useful information and achieves the best performance (~15% increase)
       • Cons: memory consuming, slow, more noise
