
The Barcode of Life Data Portal (bol.uvm)

The Barcode of Life Data Portal (http://bol.uvm.edu). Dr. David E. Schindel, Executive Secretary, and Michael Trizna, Database Specialist, Consortium for the Barcode of Life (CBOL), Smithsonian Institution, Washington, DC. www.barcodeoflife.org; SchindelD@si.edu and TriznaM@si.edu.


Presentation Transcript


  1. The Barcode of Life Data Portal (http://bol.uvm.edu) • Dr. David E. Schindel, Executive Secretary • Michael Trizna, Database Specialist • Consortium for the Barcode of Life (CBOL), Smithsonian Institution, Washington, DC • www.barcodeoflife.org; SchindelD@si.edu and TriznaM@si.edu

  2. Contents of Presentation • Crowd-sourced open source software • How does Data Portal complement BOLD and GenBank? • Data Portal capabilities • Case Study: Smithsonian frozen bird tissue project

  3. An Experiment in Museum Tissue Mining and Fast Data Release • Tissue sampling winter/spring • Sequencing completed in September • Sequence quality control in October • Taxonomic checking in early November • Obvious errors removed • Minor discrepancies remain • Data released for Adelaide Conference • Crowd-sourced annotation by community • Will data be mis-used?

  4. Unique Data Portal Capabilities • Creating customized datasets from public and/or your private data • Online library of standard datasets • Support sharing within project teams using Connect IDs, easy link to Working Groups • Running different identification analyses based on different methodologies: • Standard sequence input using FASTA format • Use standard or customized datasets
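Slide 4 names FASTA as the standard sequence input format. As an illustration of what such input looks like, here is a minimal sketch of a FASTA reader in Python. It is not the Data Portal's own code; the function name `parse_fasta` and the sample sequences are hypothetical.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping.

    A header line starts with '>'; every following non-blank line is
    sequence data until the next header.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequences may wrap across several lines; join and upper-case them.
            records[header] += line.upper()
    return records


# Hypothetical input: two made-up fragments, not real barcode records.
example = """>sample_1 hypothetical COI fragment
ACTGGTACTG
gtacc
>sample_2 hypothetical COI fragment
TTGACC
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

The sketch ignores any sequence text that appears before the first header instead of raising an error, which is a simplification.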

  5. Barcode Aggregator 727,170 public records

  6. Summary Statistics per Family

  7. Creating Customized Datasets

  8. Existing Data Analysis Packages • LIST of packages • BLOG • BRONX • Kernel • CAOS • USEARCH • BLAST • Output of identification routines as probabilities of assignment

  9. Data Analysis Methods Session • New packages presented Friday afternoon: • Damon Little: Automatic Plants Barcode pipeline (from raw traces to trimmed/edited sequences) • Ka Hou Chu: Composite Vector Method (profile trees for faster alignment and tree-based analysis) • Alain Franc: Matching Next Generation results to Sanger-based reference records

  10. Sample output

  11. CONNECT for Data Portal Collaboration

  12. The USNM Bird Project • USNM Division of Birds frozen tissue collection: • 21,104 specimens, 2,512 species • Which new ones to sample/barcode? • Public records for birds • All public bird COI records: 10,967 • All BARCODE records in GenBank: 8,419 • BARCODE with taxonomic names: 7,965 • BARCODE, name and 2 traces: 2,388

  13. Moving Data Among BOLD, GenBank, Data Portal • USNM Excel spreadsheet (KE-Emu source) • BOLD: split into projects that consist of 2-4 plates • Local database that holds all fields from the original spreadsheet • Data Portal Aggregator database

  14. Creating a ‘Pick List’ • Spreadsheet of tissue samples compared with: • ITIS taxonomy • Clements species list in BOLD • Counts of GenBank and/or public BOLD records • Geographic information • Screenshot of USNM list side-by-side with BOLD records
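The pick-list comparison on slide 14 can be pictured as a join between the tissue spreadsheet and per-species counts of public records. The sketch below assumes that each source has already been reduced to a species-to-count mapping. The function name, the `max_public` threshold, and the placeholder species names are hypothetical, and the taxonomy and geography checks from the slide are left out.

```python
from typing import Dict, List, Tuple


def build_pick_list(
    tissue_counts: Dict[str, int],
    public_counts: Dict[str, int],
    max_public: int = 0,
) -> List[Tuple[str, int, int]]:
    """Return (species, tissues_available, public_records) rows.

    A species is picked when tissue is available and it has at most
    `max_public` public records. Rows are sorted so that the
    least-covered species come first.
    """
    picks = []
    for species, n_tissues in tissue_counts.items():
        n_public = public_counts.get(species, 0)  # absent means no public records
        if n_tissues > 0 and n_public <= max_public:
            picks.append((species, n_tissues, n_public))
    picks.sort(key=lambda row: (row[2], row[0]))
    return picks


# Placeholder names and counts, not taken from the USNM spreadsheet.
tissues = {"Genus alpha": 4, "Genus beta": 2, "Genus gamma": 1}
public = {"Genus alpha": 12, "Genus gamma": 1}

unbarcoded = build_pick_list(tissues, public)
sparse = build_pick_list(tissues, public, max_public=1)
print(unbarcoded)
print(sparse)
```

Raising `max_public` widens the list from species with no public records to species that are only thinly covered.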

  15. Identifying Samples to be Subsampled

  16. Side-by-Side Lists

  17. USNM Bird Dataset • 3,150 tissues sampled • 168 failed sequences • 94 problematic sequences • 166 clustered badly • 2,761 ‘BARCODE-ready’ samples • 1,147 ‘first-BARCODE’ species • 91% increase over 1,259 barcoded species • (3,892 listed in BOLD includes BINs, others)
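The 91% figure on slide 17 follows from dividing the 1,147 first-BARCODE species by the 1,259 species that already had barcodes. A short check of that arithmetic, using only numbers from the slide (the variable names are illustrative):

```python
# Figures copied from slide 17.
tissues_sampled = 3150
barcode_ready = 2761
first_barcode_species = 1147
previously_barcoded_species = 1259

# Relative increase in barcoded bird species contributed by this dataset.
increase_pct = 100 * first_barcode_species / previously_barcoded_species

# Share of sampled tissues that ended up BARCODE-ready.
ready_pct = 100 * barcode_ready / tissues_sampled

print(f"Increase in barcoded species: {increase_pct:.0f}%")
print(f"BARCODE-ready share of sampled tissues: {ready_pct:.1f}%")
```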

  18. Two problematic clades, USNM data • Flycatchers: Family Tyrannidae • Sublegatus arenarum, S. modestus, S. obscurior, S. sp. • Conopias parvus, C. albovittatus • Myiarchus ferox, M. swainsoni, M. sp. • Hummingbirds: Family Trochilidae • Phaethornis longuemareus • Inconsistencies within USNM dataset • Incompatibilities with public, other data

  19. Resolving Mis-identified Specimens

  20. What testing dataset to use? • ID trees and analytical routines could use: • All public bird COI records: 10,967 • All BARCODE records in GenBank: 8,419 • BARCODE with taxonomic names: 7,965 • BARCODE, name and 2 traces: 2,388 • Which ones have reliable taxonomic IDs?
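The four candidate testing datasets on slide 20 read as successively stricter filters over the same pool of public bird COI records. The sketch below shows that nesting on made-up records. The `CoiRecord` fields are assumptions for illustration, not the actual schema of GenBank, BOLD, or the Data Portal.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class CoiRecord:
    accession: str              # hypothetical identifier
    has_barcode_keyword: bool   # record carries the BARCODE keyword
    species_name: Optional[str] # None when no taxonomic name is attached
    trace_count: int            # number of trace files linked to the record


def tier_counts(records: Iterable[CoiRecord]) -> Dict[str, int]:
    """Count records passing each successively stricter filter."""
    all_coi = list(records)
    barcode = [r for r in all_coi if r.has_barcode_keyword]
    named = [r for r in barcode if r.species_name]
    traced = [r for r in named if r.trace_count >= 2]
    return {
        "all_public_coi": len(all_coi),
        "barcode": len(barcode),
        "barcode_named": len(named),
        "barcode_named_2_traces": len(traced),
    }


# Made-up records, one dropping out at each tier.
sample = [
    CoiRecord("rec1", False, "Genus alpha", 0),
    CoiRecord("rec2", True, None, 2),
    CoiRecord("rec3", True, "Genus alpha", 1),
    CoiRecord("rec4", True, "Genus beta", 2),
]

counts = tier_counts(sample)
print(counts)
```

None of these filters answers the slide's closing question: a record can pass every tier and still carry an unreliable taxonomic identification.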

  21. Preparing a Data Release Paper • Summary statistics from Data Portal • Figures from BOLD
