Secrets of Unidata Software Engineers. Russ Rew UCAR Software Engineering Assembly April 26, 2006. Unidata in a Nutshell. Mission: To provide data, tools and community leadership for enhanced Earth-system education and research The Unidata Program Center:
Secrets of Unidata Software Engineers • Russ Rew • UCAR Software Engineering Assembly • April 26, 2006
Unidata in a Nutshell • Mission: • To provide data, tools and community leadership for enhanced Earth-system education and research • The Unidata Program Center: • Facilitates (real-time) data access • Provides and supports data access, analysis, and visualization tools and services • Builds and advocates for a community of geoscience educators and researchers • UPC size: 12 developers, 12 other staff
Unidata Developers • Jeff McWhirter • Don Murray (.75) • Jen Oxelson • Russ Rew (.25) • Anne Wilson • Tom Yoksas (.75) • Tom Baltzer • John Caron (.75) • Steve Chiswell • Ethan Davis • Steve Emmerson • Ed Hartnett (.25) • Yuan Ho • Robb Kambic
Overview: The Mystery • Premise: Unidata has been very successful in its software development • Premise: Unidata’s software engineering process appears haphazard and chaotic • Mystery: Why is Unidata’s software successful and popular when it makes little use of recognized development methodologies? • Speculations, theories, and revelations
Some Software Successes • Integrated Data Viewer (IDV) • Local Data Manager (LDM) • netCDF, netCDF Java (nj22) • THREDDS and THREDDS Data Server (TDS) • Units library (udunits)
IDV • Unidata’s newest scientific analysis and visualization tool • Freely available 100% Java framework and reference application • Provides 2- and 3-D displays of geoscience data • Stand-alone or networked application • Integrates data from disparate sources • End-to-end test for Unidata technologies
IDV’s Success • In use at over 80 Unidata sites and use growing rapidly • Selected as the visualization tool for the Operations Center in T-REX • Bill Hibbard, developer of Vis5D and VisAD, calls the IDV “far better than any other environmental visualization system”
LDM • Peer-to-peer system for reliable, event-driven data distribution using LDM-6 software • Supports subscriptions to near real-time data feeds • LDM protocols use persistent TCP connections, suitable for pushing a large number of small products, as well as large products • Highly configurable: can inject, distribute, capture, filter, and process arbitrary data products
LDM’s Success • Unidata’s Internet Data Distribution system: • Near real-time data for 175 universities and research organizations • 30 data feeds (radar, satellite, text bulletins, lightning, model forecasts, surface obs, upper air obs, ...) • Also used by USGS, NASA, ESRL, weather services in Spain and Korea, active projects on 6 continents • Data volume: 2.5 GB/hr, 120000 products/hr; ranks fifth in weekly Internet2 traffic (Iperf, HTTP, NNTP, SSH, LDM, ... FTP)
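The data-volume figures above imply an average product size and a sustained network rate. A rough worked calculation, assuming "GB" means 10^9 bytes (the slide does not say) and that the load is spread evenly over the hour:

```python
# Back-of-envelope rates derived from the slide's two figures.
# Assumption: 1 GB = 10**9 bytes, and traffic is uniform over the hour.
BYTES_PER_HOUR = 2.5 * 10**9
PRODUCTS_PER_HOUR = 120_000
SECONDS_PER_HOUR = 3600

avg_product_bytes = BYTES_PER_HOUR / PRODUCTS_PER_HOUR
products_per_second = PRODUCTS_PER_HOUR / SECONDS_PER_HOUR
megabits_per_second = BYTES_PER_HOUR * 8 / SECONDS_PER_HOUR / 10**6

print(f"average product size: {avg_product_bytes / 1000:.1f} kB")
print(f"product rate: {products_per_second:.1f} products/s")
print(f"sustained rate: {megabits_per_second:.2f} Mbit/s")
```

Under these assumptions the average product is about 21 kB, arriving at roughly 33 products per second, which is consistent with the earlier remark that LDM must push a large number of small products.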
More LDM Successes • NOAA/NWS adopted for Level II radar distribution • From 134 radars to 125 weather forecast offices, 22 universities, 10 federal organizations, 12 commercial organizations • Will be used in THORPEX Interactive Grand Global Ensemble (TIGGE) • Model output collection from 10 global modeling centers • Collected at 3 archive centers (NCAR, ECMWF, Beijing) • Test from ECMWF to NCAR sustained 17 GB/hr • Candidate to replace WMO’s Global Telecommunications System (GTS)
NetCDF’s Niche • Simple data model for scientific datasets • Portable, self-describing data • Supports direct access (unlike XML) • Many language interfaces: C, Fortran, C++, Java, Python, Perl, Ruby, ... • Lots of applications • Efficient subsetting of multidimensional arrays • Supports appending, sharing, archiving data
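"Direct access" and "efficient subsetting" both rest on being able to compute where an array element lives without reading everything before it. The sketch below illustrates that idea for a bare row-major array of 32-bit floats. It is not the netCDF file layout or API: the function names, the headerless buffer, and the choice of big-endian encoding are assumptions made for illustration.

```python
import io
import struct


def element_offset(index, shape, item_size, data_start=0):
    """Byte offset of one element in a row-major (C-order) array."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {index} out of bounds for shape {shape}")
        offset = offset * n + i
    return data_start + offset * item_size


def read_row_slice(f, shape, row, col_start, col_stop, data_start=0):
    """Read values [row, col_start:col_stop] of a 2-D float32 array by seeking."""
    item_size = 4  # bytes per 32-bit float
    f.seek(element_offset((row, col_start), shape, item_size, data_start))
    count = col_stop - col_start
    return list(struct.unpack(f">{count}f", f.read(count * item_size)))


# Hypothetical data: a 3 x 4 array holding 0.0 .. 11.0, kept in memory.
shape = (3, 4)
buffer = io.BytesIO(struct.pack(">12f", *map(float, range(12))))

# Only the two requested values are read; the rest of the array is skipped.
print(read_row_slice(buffer, shape, row=1, col_start=1, col_stop=3))
```

A text format such as XML offers no equivalent shortcut, since element positions are not known until the preceding text has been parsed.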
NetCDF-Java (nj22) • 100% Java library, more advanced than C-based interfaces • Prototype implementation of Common Data Model for access to netCDF-4, OPeNDAP, HDF5 • Provides netCDF interfaces to other formats: Grids (GRIB1, GRIB2), Radar (NEXRAD, NIDS, DORADE), Satellite (DMSP, GINI), Point Observations (BUFR (soon)) • Provides uniform coordinate systems layer • Access to THREDDS catalogs • Implements access through NcML
Common Data Model (layer diagram, top to bottom) • Applications • Scientific Datatypes: Point, Trajectory, Station, Radial, Grid, Swath • Coordinate Systems • Data Access: OPeNDAP, HDF5, GRIB, netCDF, ... • THREDDS
Success of netCDF • Basis for CF Conventions for climate and forecast data • Used at LLNL/PCMDI for archiving model output for the upcoming IPCC Fourth Assessment Report: 23 models, 30 TBytes, 70000 files • Used in various archives maintained by NOAA, NASA, USGS, DoE, NCAR, BADC, CSIRO, ... • C and Fortran netCDF Users Guides have been translated into Japanese at Kyoto University • Other uses in chromatography, mass spectrometry, neuro-imaging, biomolecule trajectory simulations, ... • Used in 15 commercial packages and over 50 open source packages for analysis, visualization, and data management
THREDDS • Originally funded under NSF Digital Libraries initiative • “Discovery and use of scientific data” • Middleware between data providers and users • Dataset Inventory Catalogs (XML) • Now part of Unidata Data Collections effort • Data Serving (pull) • THREDDS Data Server (TDS) most recent development • A THREDDS catalog provides a hierarchical structure for factoring inherited metadata
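The last point, factoring inherited metadata through a hierarchy, can be sketched as a small tree in which each level declares only what differs from its parent. This is an illustration of the idea only: real THREDDS catalogs are XML documents with their own schema, and the class, function, and metadata keys below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogNode:
    """One level of a hypothetical catalog tree (not the THREDDS XML schema)."""
    name: str
    metadata: dict = field(default_factory=dict)  # declared at this level only
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def effective_metadata(root, path):
    """Merge metadata from the root down to the node named by path.

    Values declared deeper in the tree override inherited ones.
    """
    merged = dict(root.metadata)
    node = root
    for name in path:
        matches = [c for c in node.children if c.name == name]
        if not matches:
            raise KeyError(f"no child named {name!r} under {node.name!r}")
        node = matches[0]
        merged.update(node.metadata)
    return merged


# Shared facts are stated once near the top; leaves state only what differs.
root = CatalogNode("forecast-models", {"data_type": "Grid", "format": "GRIB"})
model = root.add(CatalogNode("example-model", {"creator": "Example Center"}))
model.add(CatalogNode("run-00Z"))
model.add(CatalogNode("run-12Z-netcdf", {"format": "netCDF"}))

print(effective_metadata(root, ["example-model", "run-00Z"]))
print(effective_metadata(root, ["example-model", "run-12Z-netcdf"]))
```

The first dataset declares nothing itself and inherits everything; the second overrides only the format.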
TDS (THREDDS Data Server) • Integrates data access with THREDDS catalogs and services • Tomcat/Servlet, 100% Java, single war file • Data input is netCDF Java 2.2 library • Data output: • OPeNDAP (for accessing subsets) • HTTP Server (for bulk file transfer) • OGC Web Coverage Server (currently gridded only, subsetting supported) • Supports dynamic generation of catalogs
Success of THREDDS • THREDDS used in NCAR Community Data Portal, many other data archives • TDS in use for serving IDD data from motherlode.ucar.edu, other data providers • From “Lessons Learned: Evaluation Studies Related to Geoscience Data in THREDDS and DLESE”, Susan Lynds et al: • “Data providers agreed that THREDDS has made data access much easier than it used to be and enables them to reach new user communities.”
udunits • Library for manipulating units of physical quantities • Conversion of unit specifications between formatted and binary forms • Arithmetic manipulation of unit specifications • Conversion of values between compatible scales of measurement • C, Fortran, and Java interfaces • Required by CF conventions
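To make "conversion of values between compatible scales of measurement" concrete, here is a minimal sketch that treats each unit as a scale and offset relative to a base unit and refuses to convert across dimensions. It does not use or imitate the udunits API: the table, the unit spellings, and the `convert` function are assumptions for illustration.

```python
# Each entry: unit -> (dimension, scale, offset), where
#   base_value = value * scale + offset
# The base units here are metres and kelvins. Hypothetical table, not udunits data.
UNITS = {
    "m": ("length", 1.0, 0.0),
    "km": ("length", 1000.0, 0.0),
    "K": ("temperature", 1.0, 0.0),
    "degC": ("temperature", 1.0, 273.15),
    "degF": ("temperature", 5.0 / 9.0, 459.67 * 5.0 / 9.0),
}


def convert(value, from_unit, to_unit):
    """Convert value between two units of the same dimension."""
    dim_from, scale_from, offset_from = UNITS[from_unit]
    dim_to, scale_to, offset_to = UNITS[to_unit]
    if dim_from != dim_to:
        raise ValueError(f"cannot convert {from_unit} ({dim_from}) to {to_unit} ({dim_to})")
    base = value * scale_from + offset_from   # into the base unit
    return (base - offset_to) / scale_to      # out to the target unit


print(convert(2.5, "km", "m"))
print(round(convert(100.0, "degC", "degF"), 6))

try:
    convert(1.0, "km", "K")
except ValueError as err:
    print("rejected:", err)
```

The offset term is what distinguishes temperature scales from purely multiplicative units such as kilometres and metres.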
udunits Success • Almost as widely used as netCDF
The Unidata Development Process • Unidata’s software engineering process appears haphazard and chaotic. • No uniform software engineering process • No regular code reviews • Specifications for software often missing or vague • No enforcement of coding standards • No measurement of programmer productivity • No effort underway to improve software engineering methodology
What Accounts for Unidata’s Successes? • ... and can other organizations benefit from the answers? • Magic fairy dust? • Advanced processes? • Signing bonuses? • Working conditions? • Luck?
I’ll Offer Some Theories • The identified factors are subjective • Based on almost twenty years of involvement in Unidata • Discussion question: are any of these easily transferable? • Discussion question: would we have had even better software success with application of disciplined development methodologies?
Involve Developers in Software Support • Superior support for users of legacy applications: • GEMPAK • McIDAS • Support for software developed elsewhere: • OPeNDAP • VisAD • Every developer expected to answer user questions
GEMPAK • Application for analysis and visualization • In use at over 200 sites, use still growing • Developer specialized expert in package, not process: maintaining, upgrading, testing, distributing, supporting, teaching user workshop, supporting user community, supporting data types
McIDAS • In use at approximately 100 sites, a growing number outside the U.S. • Developer specialized expert in package, not process: maintaining, upgrading, testing, distributing, supporting, teaching user workshop, supporting user community, supporting new data types
Unidata User Support • Over 30 responses to user questions/day • Searchable support archives help • Support for legacy apps still significant • Balance between visualization apps, data middleware • Keeps developers close to users
Leverage User Efforts • NetCDF users have contributed language interfaces, applications, good ideas, and bug reports: www.unidata.ucar.edu/software/netcdf/credits.html Bob Albrecht, Ethan Alpert, Chris Anderson, Ayal Anis, Harald Anlauf, Phil Austin, Eric Bachalo, Jason Bacon, Sandy Ballard, Matthew Banta, Mike Berkley, Sherman Beus, Lorenzo Bigagli, Mark Borges, Nicola Botta, Dr. Kenneth P. Bowman, Bill Boyd, Mark Bradford, Bernward Bretthauer, Dr. Paul A. Bristow, Roy Britten, Glenn Carver, Tom Cavin, Morrell Chance, Susan C. Cherniss, Jason E. Christy, Gerardo Cisneros, Alain Coat, Carlie J. Coats, Jr., Jon Corbet, Alexandru Corlan, Jim Cowie, Arlindo da Silva, Rick Danielson, Alan Dawes, Donald W. Denbo, Charles R. Denham, Arnaud Desitter, Steve Diggs, Michael Dixon, Alastair Doherty, Bob Drach, Patrice Dumas, Frank Dzaak, Brian Eaton, Harry Edmon, Lee Elson, Ata Etemadi, Constantinos Evangelinos, John Evans, Joe Fahle, Gabor Fichtinger, Glenn Flierl, Connor J. Flynn, Anne Fouilloux, Jean-Francois Foccroulle, Mike Folk, David Forrest, David W. Forslund, Ben Foster, Masaki Fukuda, Dave Fulker, James Gallagher, Bear Giles, Tom Glaess, Peter Gleckler, André Gosselin, Gary Granger, Jonathan Gregory, Patrick Guio, Mark Hadfield, Magnus Hagdorn, Paul Hamer, Steve Hankin, Bill Hart, Kate Hedstrom, Charles Hemphill, Olaf Heudecker, Donn Hines, Konrad Hinsen, Leigh Holcombe, Tim Holt, Toshinobu Hondo, Takeshi Horinouchi, Chris Houck, Matt Huddleston, Matt Hughes, Doug Hunt, Alan Imerito, Jouk Jansen, Harry Jenter, Susan Jesuroga, Patrick Jöckel, Tomas Johannesson, Peter Gylling Jørgensen, Narita Kazumi, John Kemp, Jeff Kuehn, V. Lakshmanan, Bruce Langdon, Stephen Leak, Tom LeFebvre, Angel Li, Jianwei Li, Rick Light, Brian Lincoln, Keith Lindsay, Fei Liu, Jeffery W. Long, Dave Lucas, Valerio Luccio, Lifeng Luo, Steve Luzmoor, Lawrence Lyjak, Rich Lysakowski, Sergey Malyshev, Len Makin, Jim Mansbridge, Andreas Manschke, Chris Marquardt, Marinna Martini, William C. 
Mattison, Craig Mattocks, Mike McCarrick, Bill McKie, Ron Melton, Roy Mendelssohn, Pavel Michna, Barb Mihalas, Henry LeRoy Miller Jr., Philip Miller, Rakesh Mithal, Masahiro Miiyaki, Christine C. Molling, Skip Montanaro, Thomas L. Moore, Stefano Nativi, Gottfried Necker, Peter Neelin, Michael Nolta, Bill Noon, Enda O'Brien, Dave Osburn, Dan Packman, Simon Paech, Gabor Papp, Morten Pedersen, Dr. Louise Perkins, Michael D Perryman, Hartmut Peters, Ron Pfaff, David Pierce, Alexander Pletzer, Philippe Poilbarbe, Dierk Polzin, Jacob Weismann Poulsen, Ken Prada, Dave Raymond, Michael Redetzky, Rene Redler, Mark Reyes, Doug Reynolds, Mike Rilee, Mark Rivers, Randolph Roesler, Mike Romberg, Mathis Rosenhauer, Suzanne T. Rupert, Toshihiro Sakakima, Eric Salathe, Matthew H. Savoie, Marie Schall, Larry A. Schoof, Dan Schmitt, Robert B. Schmunk, Rich Schramm, William J. Schroeder, Uwe Schulzweida, Keith Searight, Guntram Seiss, Remko Scharroo, John Sheldon, Masato Shiotani, Michael Shopsin, Richard P. Signell, Steve Simpson, Joe Sirott, Greg Sjaardema, Dirk Slawinski, Cathy Smith, Neil R. Smith, Peter Paul Smolka, Nancy Soreide, Hudson Souza, Gunter Spranz, Richard Stallman, Bob Swanson, John Tanski, Karl Taylor, Jason Thaxter, Kevin W. Thomas, Philippe Tulkens, Tom Umeda, Joe VanAndel, Paul van Delst, Gerald van der Grijn, Richard van Hees, János Végh, Bernhard Wagner, Thomas Wainwright, Stephen Walker, Chris Webster, Paul Wessel, Carsten Wieczorrek, Gerry Wiener, Ralf Wildenhues, David Wilensky, Hartmut Wilhelms, Gareth Williams, David Wojtowicz, Jeff Wong, Randy Zagar, Charlie Zender, Remik Ziemlinski.
Strive for Discipline-Independence • Demand is greater than supply for useful data-oriented infrastructure for science • Examples: • netCDF • LDM • THREDDS • udunits • Common Data Model • ...
Emphasize Loose Coupling • Data providers and data consumers should be uncoupled • Data storage should be uncoupled from visualization and analysis applications • Data distribution should be independent of type of data • ...
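One way to picture the first two points: the consumer is written against a narrow interface and never learns how or where the data is stored. The sketch below is an assumption-laden illustration; the `DataSource` interface, both provider classes, and the `mean` consumer are hypothetical and not taken from any Unidata package.

```python
from typing import Iterable, Protocol


class DataSource(Protocol):
    """The only thing a consumer is allowed to know about a provider."""

    def read(self, variable: str) -> Iterable[float]: ...


class InMemorySource:
    """Provider backed by a dictionary."""

    def __init__(self, variables):
        self._variables = variables

    def read(self, variable):
        return list(self._variables[variable])


class CsvTextSource:
    """Provider that parses lines of the form 'name,v1,v2,...'."""

    def __init__(self, text):
        self._rows = {}
        for line in text.strip().splitlines():
            name, *values = line.split(",")
            self._rows[name.strip()] = [float(v) for v in values]

    def read(self, variable):
        return self._rows[variable]


def mean(source: DataSource, variable: str) -> float:
    """Consumer: works with any provider that implements read()."""
    values = list(source.read(variable))
    return sum(values) / len(values)


memory = InMemorySource({"temperature": [280.0, 282.0, 284.0]})
text = CsvTextSource("temperature,280,282,284\n")

print(mean(memory, "temperature"))
print(mean(text, "temperature"))
```

Swapping the storage format changes only which provider is constructed; the analysis code is untouched.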
Find Right Level for Abstractions • Data • Scientific Data • Georeferenced Data • Meteorological Data • Radar Data
Improve Software Quality by Porting • Platform-independence is important • Achieving it seems to improve quality of software in unexpected ways • Aiming for reasonable tradeoffs between portability and performance requires expertise • Solving portability problems for others (e.g. providing portable data, service-oriented architectures) is a growth industry • Java developers may ignore this
Work on Small Projects • Unidata projects and software packages typically require only one or two developers • Much of software engineering is about scaling to large projects with dozens of developers • May be the #1 secret for success
Find and Exploit Tight Feedback Loops • Develop for an active and interested user community • Find specific users with problems important to them that your software can solve • Exploit short iterations for incremental development • Governance: establish and pay attention to an external Users Committee that meets regularly
Use the Software You Develop • “Eat your own dogfood” • The Unidata Integrated Data Viewer uses netCDF Java, THREDDS, NcML, netCDF decoders, VisAD, OPeNDAP, ADDE servers • Provides end-to-end testing • Prioritizes useful enhancements • Leads to early bug identification by developers instead of users • If taken too far, leads to NIH syndrome
Drive Development with Tests • Test-driven development (TDD) and unit testing give developers confidence to • refactor code • try big changes • port to new platforms • Example: netCDF “make check” runs over 150,000 tests
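A small sketch of what test-first development looks like in practice, using Python's standard `unittest` module. The function under test, `normalize_longitude`, is a hypothetical example and is not part of netCDF or its test suite. The tests state the required behaviour; the implementation can then be refactored or ported freely as long as they keep passing.

```python
import unittest


def normalize_longitude(lon):
    """Map any longitude in degrees onto the interval [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


class NormalizeLongitudeTests(unittest.TestCase):
    def test_values_in_range_are_unchanged(self):
        self.assertEqual(normalize_longitude(0.0), 0.0)
        self.assertEqual(normalize_longitude(-180.0), -180.0)

    def test_values_east_of_180_wrap_west(self):
        self.assertEqual(normalize_longitude(190.0), -170.0)
        self.assertEqual(normalize_longitude(180.0), -180.0)

    def test_values_west_of_minus_180_wrap_east(self):
        self.assertEqual(normalize_longitude(-190.0), 170.0)


# Run the suite programmatically so the script continues afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeLongitudeTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all passed:", result.wasSuccessful())
```

Edge cases such as exactly 180 degrees are the kind of detail that tends to break during a port, which is why they are pinned down by a test.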
Value People over Process • Important tenet of the “Manifesto for Agile Software Development”, http://agilemanifesto.org/, to value: • Individuals and interactions over processes and tools • Working software over comprehensive documentation • Customer collaboration over contract negotiation • Responding to change over following a plan
Arrange Long Funding Cycles • “T. T. T. / Put up in a place / where it's easy to see / the cryptic admonishment / T. T. T. / When you feel how depressingly / slowly you climb, / it's well to remember that / Things Take Time.” -- Piet Hein
Summary: The “Secrets” • Involve developers in support • Leverage user efforts • Strive for discipline-independent infrastructure • Emphasize loose coupling • Choose the right level for abstractions • Improve quality by porting
More “Secrets” • Work on small projects • Find good feedback loops • Use your own software • Drive development with tests • Value people over process • Arrange for long funding cycles