
PIPE Dreams

Trouble Shooting Network Performance for Production Science Data Grids. Presented by Warren Matthews at CHEP’03, San Diego, March 24-28, 2003.





Presentation Transcript


  1. PIPE Dreams Trouble Shooting Network Performance for Production Science Data Grids Presented by Warren Matthews at CHEP’03, San Diego, March 24-28, 2003

  2. Abstract The vision of science grids allocating resources to analyze huge quantities of HENP data clearly depends on reliable network performance. Tools developed at SLAC in conjunction with the Internet2 PIPES project will help to ensure this. This talk discusses these tools and reviews the procedure for publishing performance data, in particular through the Globus Toolkit's MDS and web services. It then presents the subsequent analysis and troubleshooting methodology, with real-world examples from the Particle Physics Data Grid (PPDG) and the European DataGrid (EDG).

  3. Overview • What is the problem? • What is PIPES? • Network performance monitoring • Problem identification

  4. Network Monitoring for the Grid • The Data Grid consists of many components that must interoperate [Diagram: requestors, data farms, and a resource broker connected through the network]

  5. Allocate Resources • The resource broker must be fully informed • Measurement is required! [Diagram: requestors, data farms, and a resource broker connected through the network, annotated with "12% pkt loss" and "80% Utilization OC48"]
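To see why the broker needs both the loss and the utilization figures from this slide, the sketch below estimates what a single transfer could expect on each path. It rests on two assumptions not stated in the talk: the unused share of the bottleneck link caps throughput, and on a lossy path a single TCP stream is further capped by the Mathis et al. approximation, rate ≈ (MSS/RTT) × √(3/2) / √p. The `PathReport` record, the 1460-byte MSS, and the 100 ms RTT are illustrative assumptions, not part of any broker's real interface.

```python
import math
from dataclasses import dataclass

OC48_MBPS = 2488.32  # OC-48 SONET line rate in Mbps


@dataclass
class PathReport:
    """Hypothetical measurement record a resource broker might consume."""
    name: str
    capacity_mbps: float  # bottleneck link capacity
    utilization: float    # fraction of capacity already in use (0..1)
    loss: float           # packet loss fraction (0..1)
    rtt_s: float          # round-trip time in seconds


def headroom_mbps(p: PathReport) -> float:
    """Capacity left over on the bottleneck link."""
    return p.capacity_mbps * (1.0 - p.utilization)


def mathis_bound_mbps(p: PathReport, mss_bytes: int = 1460) -> float:
    """Approximate ceiling for one TCP stream: (MSS/RTT) * sqrt(3/2) / sqrt(loss)."""
    if p.loss <= 0.0:
        return math.inf  # the model imposes no loss-based limit on a loss-free path
    return (mss_bytes * 8 / p.rtt_s) * math.sqrt(1.5) / math.sqrt(p.loss) / 1e6


def expected_mbps(p: PathReport) -> float:
    """The tighter of the two limits."""
    return min(headroom_mbps(p), mathis_bound_mbps(p))


def pick_path(paths: list[PathReport]) -> PathReport:
    """Choose the path with the highest expected single-stream throughput."""
    return max(paths, key=expected_mbps)


if __name__ == "__main__":
    busy = PathReport("busy OC48", OC48_MBPS, utilization=0.80, loss=0.0, rtt_s=0.1)
    lossy = PathReport("lossy 1 Gbps", 1000.0, utilization=0.10, loss=0.12, rtt_s=0.1)
    for p in (busy, lossy):
        print(f"{p.name}: {expected_mbps(p):.2f} Mbps")
    print("broker picks:", pick_path([busy, lossy]).name)
```

With these illustrative numbers the busy OC48 still leaves about 498 Mbps of headroom, while 12% loss caps a single stream on the otherwise idle path at roughly 0.4 Mbps, so a broker looking only at utilization would choose badly.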

  6. What is PIPES? • Internet2 • End-to-end performance initiative • PI Performance Evaluation System (PIPES) • PIPES Monitoring Platform (PMP) • Overlap with goals of HENP • Tremendous resources

  7. IEPM-BW • Package developed at SLAC • Measurement Engine • Iperf, bbftp, bbcp, ping, traceroute • ABwE, owamp, udpmon, gridftp • Job Manager • Data Storage and data server • Analysis Engine
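The talk later notes that measurements overlap if run too often, which suggests a job manager that runs tests one after another. A minimal sketch of such serial scheduling follows; it assumes all tests run back to back from one monitoring host, that each test's duration is known in advance, and that a fixed settling gap separates tests. The `Job` type, host names, durations, and gap are hypothetical, not IEPM-BW's actual configuration. The point it illustrates is that the time before any one host is tested again grows with the total number of tests.

```python
from typing import NamedTuple


class Job(NamedTuple):
    host: str        # remote host to test against (hypothetical names)
    tool: str        # e.g. "iperf", "bbftp", "ping"
    duration_s: int  # how long the test occupies the path


def serial_schedule(jobs: list[Job], gap_s: int = 0) -> tuple[list[tuple[int, Job]], int]:
    """Lay jobs end to end so no two tests overlap.

    Returns (start_time_s, job) pairs and the total cycle length in seconds.
    Each job runs once per cycle, so the cycle length is also the revisit
    interval for any given host/tool pair.
    """
    plan = []
    t = 0
    for job in jobs:
        plan.append((t, job))
        t += job.duration_s + gap_s  # next test starts after this one plus the gap
    return plan, t


if __name__ == "__main__":
    # Hypothetical workload: 20 remote hosts, three tools each, 10 s gap
    tests = (("iperf", 20), ("bbftp", 60), ("ping", 10))
    jobs = [Job(f"host{i:02d}", tool, d) for i in range(20) for tool, d in tests]
    plan, cycle = serial_schedule(jobs, gap_s=10)
    print(f"{len(plan)} tests, cycle of {cycle / 60:.0f} minutes")  # 60 tests, cycle of 40 minutes
```

Under these made-up numbers a degradation that starts just after a host's iperf test goes unmeasured for 40 minutes, and adding hosts or tools stretches the cycle further.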

  8. [Map: monitoring sites and monitored hosts, grouped as PPDG/GriPhyN, EDG, and Internet2 sites, including SLAC, Stanford, FNAL, ANL, BNL, JLAB, LANL, NERSC, ORNL, NASA, CERN, IN2P3, RAL, DL, UCL, Imperial, UManc, NIKHEF, INFN (Roma, Padua, Milan), CESnet, KEK, RIKEN, TRIUMF, Caltech, SDSC, NCSA, UMich, UFL, Rice, and UTDallas, linked via ESnet, Abilene, Geant, JANET, CalREN, and APAN]

  9. BaBar Grid [Map: SLAC, Stanford, RAL, Manchester, Bristol, Dresden, and IN2P3 linked via ESnet, CalREN, Abilene, Geant, JANET, NNW, TVN, SWERN, DFN, and Renater, with link capacities labeled 622 Mbps, 1 Gbps, 2.5 Gbps, and 10 Gbps]

  10. Problem Identification • Typical scenario • User complains that a file transfer is slow • Network admin runs ping, traceroute, and iperf tests • Admin complains to the upstream provider • Proactive • What do we mean by throughput? • How do we know there was a performance hit? • Our approach: look for departures from the normal diurnal changes
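One way to judge a measurement against the diurnal pattern is to compare it with history for the same hour of day. The sketch below assumes throughput samples are tagged with their hour, and defines "a performance hit" as falling more than k standard deviations below that hour's historical mean; k = 2 and the sample data are arbitrary illustrative choices, not the method's actual parameters.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_baseline(history):
    """Build a per-hour-of-day (mean, std dev) baseline.

    history: iterable of (hour_of_day, throughput_mbps) pairs.
    """
    by_hour = defaultdict(list)
    for hour, mbps in history:
        by_hour[hour].append(mbps)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def is_performance_hit(baseline, hour, mbps, k=2.0):
    """Flag a measurement more than k standard deviations below the norm for that hour."""
    if hour not in baseline:
        return False  # no history for this hour yet, so no judgement
    mu, sigma = baseline[hour]
    return mbps < mu - k * sigma


if __name__ == "__main__":
    # Hypothetical history: quiet nights (03:00) and a busy afternoon (15:00)
    history = [(3, 400), (3, 410), (3, 390), (15, 200), (15, 210), (15, 190)]
    base = hourly_baseline(history)
    print(is_performance_hit(base, 15, 195))  # False: normal for the afternoon
    print(is_performance_hit(base, 3, 200))   # True: the same value is a hit at night
```

The same 200 Mbps reading is unremarkable at 15:00 but a clear drop at 03:00, which a single fixed threshold cannot express.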

  11. Alarms • Too much to keep track of • Rather not wait for complaints • Automated Alarms • Rolling average à la RIPE-TT • May not be the best approach • AMP Automated Detection System
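A rolling-average alarm can be sketched as follows. This is not the RIPE-TT or IEPM-BW code: the window size and the two thresholds, expressed as fractions of the rolling average, are assumptions. Separate alarm and calm thresholds add hysteresis, so one marginal sample does not flip the state back and forth.

```python
from collections import deque
from typing import Optional


class RollingAlarm:
    """Compare each new throughput sample with the rolling average of recent normal samples."""

    def __init__(self, window: int = 10, alarm_frac: float = 0.6, calm_frac: float = 0.8):
        self.history = deque(maxlen=window)  # recent samples taken while not in alarm
        self.alarm_frac = alarm_frac         # raise an alarm below this fraction of the average
        self.calm_frac = calm_frac           # clear it once back at or above this fraction
        self.in_alarm = False

    def update(self, mbps: float) -> Optional[str]:
        """Feed one sample; return "ALARM", "CALM", or None if the state did not change."""
        event = None
        if len(self.history) == self.history.maxlen:  # only judge once the window is full
            avg = sum(self.history) / len(self.history)
            if not self.in_alarm and mbps < self.alarm_frac * avg:
                self.in_alarm = True
                event = "ALARM"
            elif self.in_alarm and mbps >= self.calm_frac * avg:
                self.in_alarm = False
                event = "CALM"
        if not self.in_alarm:
            self.history.append(mbps)  # keep degraded samples out of the baseline
        return event


if __name__ == "__main__":
    detector = RollingAlarm(window=3)
    print([detector.update(x) for x in [500, 500, 500, 200, 250, 450]])
    # [None, None, None, 'ALARM', None, 'CALM']
```

Keeping alarmed samples out of the average stops a drop from dragging the baseline down with it, but it also means a lasting change of level would keep the alarm raised; a real system would need some way to accept a new baseline.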

  12. Limitations • Could be over an hour before an alarm is generated • More frequent measurements would impact the network and cause measurements to overlap • Low-impact tools allow finer-grained measurement • Use the NWS multivariate method • Use the SciDAC ABwE tool • Use PingER, OWAMP
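One way to exploit a cheap, frequent probe, in the spirit of the NWS multivariate approach, is to map the probe's current value onto the distribution of past heavyweight results by matching percentiles. The sketch below assumes the light probe (for example an ABwE estimate) and the heavy test (for example iperf) both report Mbps where higher is better, and that both histories cover the same period. It is an illustration of percentile matching, not NWS code, and the sample values are made up.

```python
from bisect import bisect_right


def predict_heavy(light_history, heavy_history, light_now):
    """Predict a heavyweight throughput result from a lightweight probe.

    The share of past light-probe values <= light_now is treated as a
    percentile, and the heavyweight value at the same percentile is returned.
    Both histories must be non-empty.
    """
    light = sorted(light_history)
    heavy = sorted(heavy_history)
    rank = bisect_right(light, light_now)  # how many past probes were <= light_now
    # ceil(rank * len(heavy) / len(light)) - 1 in integer arithmetic, clamped to index 0
    idx = -(-rank * len(heavy) // len(light)) - 1
    return heavy[max(idx, 0)]


if __name__ == "__main__":
    light = [40, 55, 60, 70, 90]         # hypothetical frequent probe results (Mbps)
    heavy = [180, 250, 300, 320, 400]    # hypothetical occasional iperf results (Mbps)
    print(predict_heavy(light, heavy, 62))  # 300
```

The probe can then run every few minutes at negligible cost, while the heavyweight test only needs to run often enough to keep its distribution current.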

  13. Publishing • Many monitoring projects; publish data to allow them to interoperate • MDS • EDG NM Schema • Web Services • GLUE NE Schema • GGF NMWG • Hierarchy Doc • Tools Doc
      ./get_data 2003 3 18 6 1 41 1.61 1.601 1.62 0

  14. Net Rat • Alarm system • Multiple tools • Multiple measurement points • Trigger further measurements • Cross-reference off-site stats • Informant database • No measurement is ‘authoritative’ • Cannot even believe a measurement
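Because no single measurement is authoritative, an alarm from one tool can be treated as a trigger rather than a verdict. The sketch below, with hypothetical names and thresholds, confirms a drop only when at least a quorum of distinct (tool, measurement point) pairs each see throughput below a fraction of their own baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    tool: str    # e.g. "iperf" or "bbftp"
    source: str  # measurement point that ran the test
    mbps: float  # observed throughput


def confirm_alarm(readings, baselines, drop_frac=0.6, quorum=2):
    """Return True if at least `quorum` distinct (tool, source) pairs see a drop.

    baselines maps (tool, source) to that pair's normal throughput in Mbps;
    readings without a baseline are ignored rather than trusted.
    """
    degraded = {
        (r.tool, r.source)
        for r in readings
        if (r.tool, r.source) in baselines
        and r.mbps < drop_frac * baselines[(r.tool, r.source)]
    }
    return len(degraded) >= quorum


if __name__ == "__main__":
    baselines = {("iperf", "slac"): 500.0, ("bbftp", "slac"): 450.0, ("iperf", "fnal"): 400.0}
    one_tool = [Reading("iperf", "slac", 200.0), Reading("bbftp", "slac", 440.0)]
    two_points = [Reading("iperf", "slac", 200.0), Reading("iperf", "fnal", 150.0)]
    print(confirm_alarm(one_tool, baselines))    # False: bbftp disagrees with iperf
    print(confirm_alarm(two_points, baselines))  # True: two measurement points agree
```

Counting distinct pairs rather than raw readings keeps one misbehaving tool from confirming its own alarm by repetition.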

  15. Log
      03/20/2003 20:13:46 ALARM pcgiga throughput=305.224 ctresh=512.95 athresh=312.91
      03/20/2003 20:13:48 TRACE no change in route detected
      03/20/2003 20:16:07 CALM Throughput within acceptable limits. ALARM CANCELLED
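Logs like this are easy to post-process. The minimal parser below assumes every line starts with a `MM/DD/YYYY HH:MM:SS` timestamp followed by an event word, and that numeric values appear as `key=value` pairs; both assumptions are inferred from the three lines shown, not from any documentation of the log format. Field names such as `ctresh` are kept exactly as they appear.

```python
import re
from datetime import datetime

# Assumed line layout: "MM/DD/YYYY HH:MM:SS EVENT rest-of-message"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")
KV_RE = re.compile(r"(\w+)=(\d+(?:\.\d+)?)")  # numeric key=value pairs


def parse_line(line):
    """Split one log line into (timestamp, event, numeric fields, message)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    stamp, event, rest = m.groups()
    when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    fields = {k: float(v) for k, v in KV_RE.findall(rest)}
    return when, event, fields, rest


def alarm_durations(lines):
    """Pair each ALARM with the next CALM and return the durations in seconds."""
    durations = []
    opened = None
    for line in lines:
        when, event, _, _ = parse_line(line)
        if event == "ALARM" and opened is None:
            opened = when
        elif event == "CALM" and opened is not None:
            durations.append((when - opened).total_seconds())
            opened = None
    return durations
```

For the three lines above, `alarm_durations` returns `[141.0]`: the alarm raised at 20:13:46 was cancelled 2 minutes 21 seconds later.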

  16. Toward a Monitoring Infrastructure • MAGGIE • Measurement and Analysis package built on NIMI/Akenti • EDEE • production-quality Data Grid for Europe

  17. More Information • IEPM Home Page • IEPM-BW • I2 E2E and PIPES • RIPE-TT • AMP Automated Event Detection • NWS • ABwE

  18. End This talk was made possible by the IEPM team at SLAC (Les Cottrell, Connie Logg, Jiri Navratil, Jerrod Williams, Fabrizio Coccetti) and the many developers and maintainers around the world.
