Overview of HDF5 HDF Summit Boeing Seattle The HDF Group (THG) September 19, 2006
Topics • What is HDF? • Sample uses of HDF • THG the Company
Answering big questions … • Matter & the universe • Life and nature • Weather and climate [Figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002; color scale ticks 60, 385, 610]
varied data… caacaagccaaaactcgtacaaatatgaccgcacttcgctataaagaacacggcttgtgg cgagatatctcttggaaaaactttcaagagcaactcaatcaactttctcgagcattgctt gctcacaatattgacgtacaagataaaatcgccatttttgcccataatatggaacgttgg gttgttcatgaaactttcggtatcaaagatggtttaatgaccactgttcacgcaacgact acaatcgttgacattgcgaccttacaaattcgagcaatcacagtgcctatttacgcaacc aatacagcccagcaagcagaatttatcctaaatcacgccgatgtaaaaattctcttcgtc ggcgatcaagagcaatacgatcaaacattggaaattgctcatcattgtccaaaattacaa aaaattgtagcaatgaaatccaccattcaattacaacaagatcctctttcttgcacttgg
and complex relationships… [Figure labels: Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, SNP Score, Trace, Reads, Aligned bases, Read quality, Contig, Percent match]
HDF How do we… • Describe the data? • Read it? Store it? Find it? Share it? Mine it? • Move it into, out of, and between computers and repositories?
HDF is • A file format for managing any kind of data • Software to store and access data in the format • Suited especially to large or complex data collections • Suited for every size of system • Platform independent – runs almost anywhere • Open – both file formats and software
HDF solution • Efficient storage, I/O • Scientific data file format • Common data models • I/O software & tools • Standard APIs
An HDF file is a container… …into which you can put your data objects.
Example objects: a palette and a table:
lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6
HDF structures for organizing objects in files
[Figure: groups “/” (root) and “/foo” holding a 3-D array, a table, a palette, two raster images, and a 2-D array]
Table:
lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6
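The grouping idea on this slide can be sketched in a few lines of plain Python. This is a toy in-memory model, not the HDF5 API: `Group`, `create_group`, `create_dataset`, and `lookup` are hypothetical names chosen for illustration, and the dataset names `array2d` and `table` are assumptions. It only shows how a root group, a child group, and slash-separated paths fit together.

```python
# Toy in-memory model of an HDF-style hierarchy. This is NOT the HDF5 API;
# Group, create_group, create_dataset, and lookup are hypothetical names.

class Group:
    """A named container that holds child groups and data objects."""

    def __init__(self):
        self.children = {}

    def create_group(self, name):
        group = Group()
        self.children[name] = group
        return group

    def create_dataset(self, name, data):
        self.children[name] = data
        return data

    def lookup(self, path):
        """Resolve a slash-separated path such as '/foo/table'."""
        node = self
        for part in path.strip("/").split("/"):
            if part:  # skip the empty part produced by the path "/"
                node = node.children[part]
        return node


root = Group()                    # plays the role of "/"
foo = root.create_group("foo")    # plays the role of "/foo"

# A 2-D array stored directly under the root group.
root.create_dataset("array2d", [[1, 2], [3, 4]])

# The table from the slide, stored as (lat, lon, temp) rows under /foo.
foo.create_dataset("table", [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)])

print(root.lookup("/foo/table")[1])  # (15, 24, 4.2)
```

A real HDF5 file adds typed, multidimensional datasets and attributes on top of this structure; the sketch covers only the naming and nesting.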
HDF5 Software • Tools & Applications • HDF I/O Library • HDF File
Goals of HDF5 Library • Flexible API to support a wide range of operations on data • High performance access in serial and parallel computing environments • Compatibility with common data models and programming languages
Features • Ability to create complex data structures • Complex subsetting • Efficient storage • Flexible I/O (parallel, remote, etc.) • Ability to transform data during I/O • Support for key language models: OO compatible; C & Fortran primarily; also Java, C++
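"Complex subsetting" in HDF5 is usually expressed as a hyperslab: a regular pattern of blocks described per dimension by a start, a stride, a count, and a block size. The sketch below computes which indices such a pattern selects and applies it to a small nested list. It is plain Python, not the HDF5 library; `hyperslab_indices` and `select_2d` are hypothetical helper names, and the 6 x 8 grid is made-up sample data.

```python
def hyperslab_indices(start, stride, count, block=1):
    """Return the indices picked along one dimension.

    `count` blocks of `block` consecutive elements are taken,
    with block origins `stride` elements apart, beginning at `start`.
    """
    if block > stride:
        # Rejected in this sketch because the blocks would overlap.
        raise ValueError("block must not exceed stride")
    return [start + i * stride + j for i in range(count) for j in range(block)]


def select_2d(data, rows, cols):
    """Pick the listed rows and columns from a list of lists."""
    return [[data[r][c] for c in cols] for r in rows]


# Sample data: value encodes its position as row * 10 + column.
grid = [[r * 10 + c for c in range(8)] for r in range(6)]

rows = hyperslab_indices(start=1, stride=2, count=2)           # [1, 3]
cols = hyperslab_indices(start=0, stride=3, count=2, block=2)  # [0, 1, 3, 4]
subset = select_2d(grid, rows, cols)

print(subset)  # [[10, 11, 13, 14], [30, 31, 33, 34]]
```

The point of describing a selection this way is that only the requested elements need to be read from or written to the file, rather than the whole array.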
1. NASA Earth Observing System (EOS) HDF-EOS • Terra: CERES, MISR, MODIS, MOPITT • Aqua (6/01): CERES, MODIS, AMSR • Aura: TES, HIRDLS, MLS, OMI
2. Advanced Simulation & Computing (ASC) Question: How do we maintain a nuclear stockpile in the absence of testing?
ASC Data requirements • Large datasets (> a terabyte) • Good I/O performance on massively parallel systems • Complex data and extensive metadata
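One common way to approach good I/O performance on parallel systems is to give each process its own contiguous slab of a shared dataset, so that processes do not contend for the same region. The sketch below shows only that partitioning step, assuming a simple row-wise block decomposition. `rank_slab` is a hypothetical helper, not part of HDF5 or MPI, and the slides do not say that ASC codes divide their data this way.

```python
def rank_slab(n_rows, n_ranks, rank):
    """Return (start, count) of the rows assigned to one process.

    Rows are split as evenly as possible; the first `n_rows % n_ranks`
    processes each take one extra row.
    """
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return start, count


# Example: 10 rows shared among 4 processes.
slabs = [rank_slab(10, 4, rank) for rank in range(4)]
print(slabs)  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Each `(start, count)` pair could then serve as that process's selection when it reads or writes its share of the dataset.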
3. Bioinformatics: Managing genomic data caacaagccaaaactcgtacaa cgagatatctcttggaaaaact gctcacaatattgacgtacaag gttgttcatgaaactttcggta acaatcgttgacattgcgacct aatacagcccagcaagcagaat
DNA sequencing workflows • Diverse formats • Highly redundant data • Repeated file processing • Disconnected programs • Non-scalable storage • Lack of persistence
Multiple levels and relationships [Figure labels: Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, SNP Score, Trace, Reads, Aligned bases, Read quality, Contig, Percent match]
BioHDF HDF5 as binary format for bioinformatics
4. Boeing flight test • HDF-Time-history • HDF-PACKET
Flight test data requirements • Fast data acquisition from 1000s of sources • Wide variety of data types • Active archive • Standardization for data/software exchange • Special features
What is the HDF Group? • 18 years at the National Center for Supercomputing Applications (NCSA) at the University of Illinois • Recent spin-off from U of I • Non-profit 501(c)(3) • 17 scientific, technology, and professional staff • 5 students • 2+ million product users worldwide • Cross industry sectors and disciplines
THG mission: To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.
Business model • Non-profit: mission driven • Intellectual property: • U of I plans to assign ownership to THG • The HDF formats will remain free, and HDF software will remain open source. • Continue close ties to U of I and NCSA.
Income-generating activities • Major client support • Targeted HDF development • Grant-supported R&D • Consulting
HDF Information • HDF Information Center: http://hdfgroup.org/ • HDF Help email address: hdfhelp@hdfgroup.org • HDF users mailing list: hdfnews@hdfgroup.org