
Testimony to the Advisory Committee on CyberInfrastructure v2.0



1. Testimony to the Advisory Committee on CyberInfrastructure v2.0
Gordon Bell, Microsoft Bay Area Research Center
15 February 2002 (with post-testimony reprise)

2. NSF Post-Testimony Reprise 2/15/02 15:00 cgb
• Teleconference of voice bridge & access grid both very poor!
• Significant need for an access grid & distributed teleconferencing appliance for … (rather than a trip to testify for 30 min., the meeting should be virtual!)
• Same old concerns: “I don’t have as many flops as their lab.” Much of the facilities should be distributed, with build-it-yourself Beowulf clusters to get extraordinary cycles and bytes.
• Centers need to be re-centered; see Bell & Gray, “What’s Next in High Performance Computing,” Comm. ACM, Feb. 2002, pp. 91-95.
• The cost of networking, cycles, and bytes requires rethinking how to do various kinds of science, in light of the centralized versus distributed nature of the work, e.g. instrumentation that generates lots of data. (The last-mile problem is significant.)
• A FedEx’d hard drive is cheap: the cost of a hard drive < the network cost. The net is very expensive!
• Centers’ flops and bytes are expensive. Distributed ones are likely to be less so.
• Many sciences need to be reformulated as distributed computing/database problems.
• Network costs are a disgrace: a $1 billion boondoggle with NGI and Internet2.
• We have put too much money in tool builders’ hands. They are reinventing industry tools, but without cognizance of WWW technology!
• Give funding to scientists in joint grants with tool builders; e.g. the WWW came from a user.
• Database technology is not understood by users or computer scientists.
• Training, tool funding, & combined efforts are needed, especially when work is large & distributed.
• Equipment, problems, etc. are dramatically outstripping our capabilities!
• Time for an NSF reboot!

3. Network concerns
• Very high cost
• $(1 + 1)/GByte to send and receive on the net; FedEx and 160 GByte shipments are cheaper (see the cost sketch below)
• DSL at home is $0.15 - $0.30
• Disks cost less than $2/GByte to purchase
• Low availability of fast links (the last-mile problem)
• Labs & universities have DS3 links at most, and they are very expensive
• Traffic: instant messaging, music stealing
• Performance at the desktop is poor
• 1 - 10 Mbps; very poor communication links
• Management: trade in fast links for cheap links!!
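The FedEx-versus-network arithmetic is worth making explicit. A minimal sketch in Python, using the slide's 2002 rates; the $30 flat shipping fee per drive is an illustrative assumption, not a figure from the slides, and the drive is counted as storage the recipient keeps rather than as a transfer cost:

```python
import math

# Slide figures (c. 2002): $1 to send + $1 to receive per GByte on the
# net; a 160 GB drive costs < $2/GByte to buy. SHIPPING_FEE is an
# assumed flat FedEx charge per drive, not a number from the slides.
NET_DOLLARS_PER_GB = 1.0 + 1.0
DRIVE_SIZE_GB = 160.0
SHIPPING_FEE = 30.0

def network_cost(gbytes):
    """Pure transfer cost over the wide-area network."""
    return gbytes * NET_DOLLARS_PER_GB

def fedex_cost(gbytes):
    """Ship loaded drives; the drives remain useful as storage at the
    destination, so only the shipping fee counts as transfer cost."""
    return math.ceil(gbytes / DRIVE_SIZE_GB) * SHIPPING_FEE

for size in (160, 1_600, 16_000):  # GBytes
    print(f"{size:>6} GB:  net ${network_cost(size):>7,.0f}"
          f"   FedEx ${fedex_cost(size):>6,.0f}")
```

Under these assumptions the gap is roughly a factor of ten at every scale ($320 versus $30 for a single 160 GB drive), which is the slide's point: the net is very expensive relative to shipping disks.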

4. You can GREP 1 MB in a second. You can GREP 1 GB in a minute. You can GREP 1 TB in 2 days. You can GREP 1 PB in 3 years. Oh!, and 1 PB ~ 10,000 disks.
You can FTP 1 MB in 1 sec. You can FTP 1 GB / min (= 1 $/GB) … 2 days and 1K$ … 3 years and 1M$.
At some point you need indices to limit search: parallel data search and analysis. This is where databases can help.
Goal: make it easy to
• Publish: record structured data
• Find: find data anywhere in the network; get the subset you need
• Explore datasets interactively
Some science is hitting a wall: FTP and GREP are not adequate. (Jim Gray)
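All of these rows fall out of one constant: the sequential bandwidth of a single circa-2002 disk. A back-of-envelope sketch assuming ~10 MB/s (the slide's own rows imply roughly 6-17 MB/s):

```python
SCAN_RATE = 10e6  # bytes/second; assumed sequential GREP/FTP bandwidth

def human(seconds):
    """Render a duration in the largest sensible unit."""
    for unit, span in (("years", 3.15e7), ("days", 86400.0),
                       ("hours", 3600.0), ("min", 60.0)):
        if seconds >= span:
            return f"{seconds / span:.1f} {unit}"
    return f"{seconds:.1f} sec"

for name, size in (("1 MB", 1e6), ("1 GB", 1e9),
                   ("1 TB", 1e12), ("1 PB", 1e15)):
    print(f"GREP/FTP {name}: {human(size / SCAN_RATE)}")
# Prints roughly: 0.1 sec, 1.7 min, 1.2 days, 3.2 years -- the slide's
# second / minute / days / years progression.
```

Linear scanning stops being viable somewhere between the terabyte and the petabyte, which is exactly where the slide turns to indices and parallel search.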

5. Collaborative research: sharing instrumentation, data, and programs
• We’ve talked about it for decades, e.g. from accelerators to telescopes and zoology
• Doer / “user, talker & meeter” = 4%.
• http://www.all-species.org/ has the problem…
• Focus has been and is on ops, not bytes!
• E.g. the Pittsburgh center was funded with no storage
• Why have centers for computation at all? Don’t we need data centers?
• By having no storage, we re-compute everything
• Adding indexes, i.e. databases, increases speed, lessens computation, and increases experimentation
• Computation centers become data centers, since everyone/anyone can build a compute center
• Need for computational-scientist database talent!

6. Recommendations (given as testimony)
• The “system” is fundamentally broken and is going to move to a different level.
• Give the funding to users, not VLGs to tool builders to reinvent:
• HTTP and XML etc. (starting over, using FTP)
• Naming and discovery services
• Security
• Scheduling and accounting, etc.
• The goal: publishing programs & data has to be as easy as publishing web pages!

7. An Example: SkyServer and SkySurvey Database; a prototype for other sciences? Gray, Szalay, et al.
First paper on the SkyServer:
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.doc
Later, more detailed paper for the database community:
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.doc

8. What can be learned from SkyServer?
• It’s about data, not about harvesting flops
• 1-2 hr. query programs versus 1 wk. programs
• 10 minute runs versus 3 day computes & searches
• The database viewpoint:
• Avoid costly re-computation and searches
• Use indices and PARALLEL I/O; Read/Write >> 1. (Parallelism is automatic and transparent. A toy index demo follows below.)
• The talent to do this appears to be non-existent.
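A toy illustration of the index point, not the SkyServer itself: the table and column names below are invented stand-ins for a sky catalog (SkyServer ran far richer SQL on Microsoft SQL Server), but the mechanism, one pass to build an index that then replaces every subsequent full scan, is the same:

```python
import random, sqlite3, time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY,"
            " ra REAL, dec REAL, mag REAL)")  # hypothetical catalog
con.executemany(
    "INSERT INTO obj (ra, dec, mag) VALUES (?, ?, ?)",
    [(random.uniform(0, 360), random.uniform(-90, 90),
      random.uniform(10, 25)) for _ in range(300_000)])

query = "SELECT COUNT(*) FROM obj WHERE ra BETWEEN 180 AND 181"

t0 = time.perf_counter()                        # full table scan
con.execute(query).fetchone()
scan_ms = (time.perf_counter() - t0) * 1e3

con.execute("CREATE INDEX idx_ra ON obj (ra)")  # one-time cost
t0 = time.perf_counter()                        # indexed range seek
con.execute(query).fetchone()
indexed_ms = (time.perf_counter() - t0) * 1e3

print(f"full scan: {scan_ms:.1f} ms   indexed: {indexed_ms:.1f} ms")
```

The index is built once and amortized over every later query, which is the "avoid costly re-computation" bullet in miniature; parallel I/O plays the same role for queries that must touch most of the data.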

9. Heuristics for building communities that need to share data & programs
• Always go from working to working
• Do it by induction in time and space (why version 3 is pretty good)
• Put ONE database in place that’s useful by itself in terms of UI, content, & queries
• Invent and demo 10-20 instances of use
• Get two working in a single location
• Extend to include a second community, with an appropriate superset capability

10. Gigabit per second workstation-to-workstation bet, 7 March 1997: bet against optimists & big programs!
Raj Reddy, J. Gray, & Dan Ling versus A. van Dam, J. Hennessy, Ed Lazowska, and G. Bell. Decide 12/31/2000. (Dinner and wine... wine cost not to exceed cost of dinner.)
RR, JG, and DL bet that at least 10K workstations, located in at least 10 sites, in at least 3 states, will be able to communicate with one another over an end-to-end path operating at a rate of at least 1 Gigabit per second. (Workstation to backbone, backbone to WAN, and WAN must all operate at this rate.)
The phone bill has been several hundred million. It remains undone (2002), independent of computer size.
