De novo assembly from Illumina


Presentation Transcript


    1. De novo assembly from Illumina/Solexa short reads: assemblers and trends. By: Urmi Trivedi, The Gene Pool, University of Edinburgh

    2. Contents: Challenge of assembling short reads; De novo assembly at The Gene Pool using several available assemblers; Comparison between short read assemblers; Assembly quality criteria; Factors affecting assembly; Concluding remarks

    3. De novo assembly of micro-reads, a challenge: A large amount of data, but it demands new analytical methods; Micro-read assembly is challenging; Calculating the overlaps takes a large amount of computing power; Read quality issues raise the chance of ambiguous assembly

    4. Assemblers so far: Velvet, Edena, SSAKE, SHARCGS, SHRAP, ALLPATHS, EULER-SR

    5. De novo assembly at The Gene Pool using several assemblers. Single-end data has been generated for several genomes: Listeria monocytogenes, Rhodococcus equi, Campylobacter jejuni, Magnetospirillum magneticum (AMB-1), Streptomyces cattleya, Photorhabdus temperata. Main assemblers used: Velvet, Edena, SSAKE

    7. Measures of assembly: N50; Largest contig formed; % bases in contigs >= 1 kb; Total bases in contigs. Any other suggestions are WELCOME.
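
The four measures listed on this slide can all be derived from a list of contig lengths. The following is a minimal sketch, not the speaker's own script: the function name `assembly_metrics`, the `min_len` parameter and the example lengths are hypothetical. It assumes the common definition of N50 relative to the total assembled bases (not the expected genome size).

```python
def assembly_metrics(contig_lengths, min_len=1000):
    """Summarise an assembly from a list of contig lengths (in bases).

    Assumption: N50 is defined relative to the total assembled bases,
    i.e. the length of the contig at which the running sum of contigs,
    taken from longest to shortest, first reaches half of that total.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("no contig bases supplied")

    # Walk contigs from longest to shortest until half the bases are covered.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    # Bases held in contigs of at least min_len (1 kb by default).
    long_bases = sum(length for length in lengths if length >= min_len)

    return {
        "n50": n50,
        "largest_contig": lengths[0],
        "total_bases": total,
        "pct_bases_in_long_contigs": 100.0 * long_bases / total,
    }


# Hypothetical contig lengths, for illustration only.
metrics = assembly_metrics([5000, 3000, 2000, 800, 200])
print(metrics)
```

With these made-up lengths, the total is 11,000 bases, so the 3,000-base contig is the one that takes the running sum past the halfway point.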

    8. De novo assembly of Listeria monocytogenes (~2.9 Mb) with Edena

    9. De novo assembly of Streptomyces cattleya (~8.9 Mb) with Edena

    10. Factors affecting assembly: Assemblies vary from organism to organism; Biology matters along with technology; Several factors may be affecting assembly: coverage, %GC, genomic content (repetitive regions, transposons, etc.)

    11. Effect of coverage: Coverage = N × 36 / G, where N = total number of reads, G = genome size, and 36 is the read length in bases. The higher the coverage, the better the assembly, though some cases differ. Responsible factors? Possibly genomic content.
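
The coverage formula on this slide is simple enough to check by hand, but a small sketch makes the arithmetic explicit. It assumes the constant 36 is the read length in bases; the function names and the read count of 4,000,000 are hypothetical, while the ~2.9 Mb genome size is the Listeria monocytogenes figure from slide 8.

```python
import math


def coverage(num_reads, genome_size, read_length=36):
    """Average depth of coverage: N * read_length / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, genome_size, read_length=36):
    """Smallest number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical run: 4 million 36-base reads on a ~2.9 Mb genome.
depth = coverage(4_000_000, 2_900_000)
print(f"coverage: {depth:.1f}X")

# Reads required to reach roughly 80X on the same genome.
print("reads for 80X:", reads_needed(80, 2_900_000))
```

Inverting the formula, as `reads_needed` does, is a quick way to estimate how much more sequencing a coverage target would take.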

    12. %GC and % bases in contigs >= 1 kb: The higher the GC content, the poorer the assembly. Again, some cases differ. Why?
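
For reference, %GC can be computed directly from a sequence. This is a minimal sketch under one stated assumption: ambiguous bases such as N are left out of the denominator. The function name `gc_percent` and the example sequences are hypothetical, and the slides do not say how %GC was measured for these genomes.

```python
def gc_percent(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T).

    Assumption: ambiguity codes such as N are ignored rather than
    counted in the denominator.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / acgt


# Hypothetical sequences, for illustration only.
print(gc_percent("ATGCGC"))
print(gc_percent("atgcNN"))
```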

    13. Limitations: How far can you go with unpaired data? A fundamental limitation. Paired ends (ALLPATHS, an upcoming assembler); Combined approach with longer reads (Sanger or 454); Raising coverage (~80X)

    14. Concluding remarks: It is surprising to get such long contigs from such short reads! De novo assembly from short reads was thought to be impossible, but the number of papers published on it suggests otherwise. Something will come along that does the job, or perhaps one of us here may be “The One”?
