
GINS The GARR Network Monitoring System


Presentation Transcript


  1. GINS: The GARR Network Monitoring System • Giovanni Cesaroni, GARR • EUMEDCONNECT2 Training, Rome, 22-25 June 2009

  2. Agenda • PART 1: GINS description • NOC Tools Motivation • Required Functionality • Monitoring Environment • Statistics Examples • Visualization • Reports • Slicing • Traffic Flows Analysis • Work in progress • PART 2: Let’s code the Network Monitoring! • SNMP in action • BGP, OSPF, MPLS, IPv6 • PART 3: RRD World • RRD in action • How to avoid losing data

  3. GARR Network • 43 POPs (Universities and Research Centres) • PEERING: 76 Gbps • 52.5 Gbps vs GEANT2 • 10G + 2.5G IP Access • 3x10GE E2E links • 9x1GE E2E links • 3x2.5 Gbps IP Transit • 2 Milan + 1 Rome • 7x1 Gbps + 10 Gbps National PEERING • Backbone Capacity: ~110 Gbps • 7 TLC Operators • Telecom Italia • Infracom (ex Autostrade TLC) • Fastweb • Interoute (ex Eurostrada) • WIND • BT-Italia (ex Albacom) • COLT-Telecom • 3 International IP Carriers • Global Crossing • Telia • Level3 • Access Capacity: ~60 Gbps • from 2 Mbps to 10 Gbps • No. of Access Links: 500 • No. of Backbone Links: 62 • E2E Capacity: ~40 Gbps • from 1 Gbps to 10 Gbps

  4. GOALS • Provide the NOC, Operations and Planning staff with all the tools needed to do their work as well as possible • Monitor user site connectivity • Check the status of the services at each level of the network • service-oriented approach (not metric-oriented) • Integrate monitoring services • Automate tool configuration • Give easy access to the information • Generate fault and performance reports automatically • The goal is not to manage the control plane, but to have full control of the network

  5. GINS Architecture [diagram: GARR Network • GARR NOC • Measurements Storage (MySQL & RRD) • Consistency Tools • Robots • GINS Monitoring Tools • GINS Visualization Tools • GARR-DB: Network Database (network structure, MySQL)]

  6. GARR-DB: the Information System [diagram: Aggregate • Logical “circuit” (IP link, MPLS LSP, lambda service, etc.) • physical object • User Site • segments • physical circuit • GARR Backbone • eq physical objects • GARR Domain • administrative and technical information]

  7. SW tools used by GINS • Scheduler: Cron • Reports: PHP, JpGraph, HTMLDOC • Data visualization: PHP, HTML, JavaScript, Ajax, SVG • Data storage: MySQL, File, RRD • Data management: AWK, Bash, PHP, RRDtool (~5500 RRD files) • Data acquisition: MRTG, SNMP polls, ping • Network

  8. NOC in action [diagram: Alarms • APM • Trouble Ticket • TLC NOC • GARR NOC • GARR Backbone • End Site]

  9. GINS at a glance • Main functionalities • Network monitoring • Statistics acquisition • Trouble Ticket System • Fault and Performance Reports • Monitoring Services • Lambda • SDH/SONET • MPLS • IPv4, IPv6 • OSPF, BGP • E2E • Multicast Beacons • Equipment • Statistics Services • IPv4, IPv6, Multicast traffic • Physical interface errors • Router CPU • Premium IP • SDH/SONET errors • Backbone weathermap • Uncompressed Statistics

  10. Monitoring services • GINS detects/defines the status of different services on the basis of the information gathered through the network. Monitoring is supported for the following service classes: • IPv4 and IPv6 [service status, input errors and output drops on physical interfaces] • end-user site • backbone interface • IP Multicast Beacons [service status] • Routing protocols: • OSPF [link costs] • BGP [peering status, advertised/received routes] • SDH/SONET [SDH/SONET errors] • router interface on leased lines • Lambda [service status, optical equipment port status] • MPLS [MPLS LSP status] • E2E [E2E service status] • defined as the stitching of multiple intra-domain and inter-domain links

  11. Statistics services • GINS stores performance measurement data and provides: • Traffic Statistics • IPv4 and IPv6, Multicast for end-user sites and backbone • Aggregate • Peering • Premium IP • Uncompressed Statistics • SONET/SDH errors on leased lines • Router CPU load and temperature

  12. Other services • GINS includes a Trouble Ticket System that is highly customized for the GARR operations procedures. In particular, it manages user service, leased line and PoP tickets. • Fault and performance reports: • User monthly and yearly reports (HTML and PDF) • User fault report and circuit availability • Uncompressed traffic statistics (IP BW usage, 95th percentile, etc.) • Carrier fault report and circuit availability (HTML and PDF) • Monitored physical devices: • Juniper: J6350, M7i, M10, M20, M320 • Cisco: 12xxx, 17xx, 18xx, 2xxx, 3750, 72xx, 75xx • ADVA: FSP3000 • Metrobility: R4000, R5000

  13. Monitoring • Who is the target user of the monitoring UIs? The NOC and the Operations staff (private access)

  14. Control Panel and IP Monitoring • BGP Alarms & Monitoring • E2E Monitoring, Lambda & MPLS • Other Services

  15. Monitor Control Panel

  16. NOC Interface (1/2): link status [screenshot callouts: Last action • Trouble ticket • Telnet • Traffic in/out • End Site Info]

  17. NOC Interface (2/2): other services and quick ticket management

  18. End Site Info [screenshot callouts: Trouble Tickets • Traffic • Interface Errors]

  19. Physical Interface Input Errors and Output Drops [graph of a 2 Mbps link] The link is going to be upgraded to a Gbps link in the next few days!

  20. E2E Monitoring [screenshot callouts: Status of the “domain segment” • Status of the inter-domain link • Aggregate status of the “domain link”]
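
One way to derive the aggregate status of a stitched E2E link from the statuses of its segments is a worst-status-wins rule. The sketch below is illustrative only: the `Status` names, their severity ordering and the `aggregate_status` helper are assumptions made for this example, not the GINS implementation.

```python
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical segment states, ordered from best to worst."""
    UP = 0
    DEGRADED = 1
    UNKNOWN = 2
    DOWN = 3


def aggregate_status(segments):
    """Return the worst status among the stitched segments.

    An E2E link with no known segments is reported as UNKNOWN.
    """
    if not segments:
        return Status.UNKNOWN
    return max(segments)


# Two intra-domain segments are up, the inter-domain link is degraded.
e2e = aggregate_status([Status.UP, Status.DEGRADED, Status.UP])
print(e2e.name)
```

With this rule a single failed segment marks the whole E2E service as down, which matches the idea of an E2E link being only as healthy as its weakest stitched part.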

  21. E2E Stitching Monitoring [diagram: IP • MPLS LSP • 10GE • Lambda]

  22. GINS vs GN2 E2E CU [diagram: E2Emon • Switch & DFN • GARR NOC • GN2 E2E CU • GINS data aggregation • E2Emon XML schema • GARR archive • GN2:JRA4]

  23. MPLS Monitoring • MUPBED: one E2E connection • Information on: 1- LSP1 2- L2 connection [diagram: GINS MPLS Service • TLAB • GN2 • GN2IT • TO • GN2DFN • LSP2 • GARR • SNMP Polls • LSP1 • LSP3 • MI1 • DFN • FF • MI2 • TSystem]

  24. MPLS Monitoring: MUPBED case [screenshot callouts: LSP status • E2E L2 inter-domain status]

  25. BGP monitoring [screenshot callouts: Peer status & prefix information • Alarms • ...]

  26. SONET Alarms (RFC 2558)

  27. Statistics • Common statistics sets, different types of representation • Online Network Status • Other Services

  28. Long Term Analysis • Traffic, input errors & output drops • CPU load & temperature • Router aggregate traffic & peaks

  29. Example of temperature statistics • In such cases I’d like to be alerted by email, SMS, phone and voice!!!

  30. The backbone weathermap

  31. [weathermap detail, callouts: Ticket info • OSPF cost • Router CPU temperature • Traffic load; values shown: 25, 20, 615M]

  32. [weathermap detail, callouts: Ticket info • Traffic load]

  33. How it works [diagram: Weathermap • Merge • HTML dynamic map • SVG image • Generate • Convert • PNG image • Network Measurements Storage • Network Database]

  34. Fault & Performance Reports • Who is the target user for network reports? • What kind of reports are provided? • 1- Network users, end sites • fault and availability reports of the services • historical traffic data

  35. Fault & Performance Reports: UI [screenshot callouts: monthly report • 95th percentile • Uncompressed statistics • GARR User]

  36. User monthly and yearly PDF Reports • Introduction • Faults and availability • Monthly and yearly traffic statistics • ~1,000 report pages per month • ~50 MB disk space per month

  37. Uncompressed Traffic Statistics, monthly view [graph callouts: 95th percentile • 5 minutes]

  38. Uncompressed Traffic Statistics, yearly view [graph callout: Monthly values]

  39. Historical data [graph callout: 2005!!]

  40. Fault & Performance Reports • Who is the target user for network reports? • What kind of reports are provided? • 1- Network users, end sites • fault and availability reports of the services • historical traffic data • 2- Network planning staff • to extrapolate traffic trends for future network planning

  41. GARR Traffic Trends [graph callouts: 30.67 Gbps • 3.84 Gbps]

  42. Traffic Evolution • GLOBAL INTERNET: r ~ 1.4/y • NATIONAL INTERNET: r ~ 1.6/y • E2E RESEARCH TRAFFIC: r ~ 2.0/y
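
Reading r as a yearly multiplication factor (an interpretation of the slide, not something it states explicitly), traffic after t years is T0 * r^t. The sketch below projects traffic and computes the doubling time for each rate; the function names and the 10 Gbps starting value are assumptions for illustration.

```python
import math


def project_traffic(traffic_now, rate_per_year, years):
    """Compound growth: traffic_now * rate_per_year ** years."""
    return traffic_now * rate_per_year ** years


def doubling_time_years(rate_per_year):
    """Years needed for traffic to double at the given yearly factor."""
    if rate_per_year <= 1.0:
        raise ValueError("rate must be greater than 1 for traffic to grow")
    return math.log(2) / math.log(rate_per_year)


# Yearly factors taken from the slide.
rates = {"global": 1.4, "national": 1.6, "e2e research": 2.0}
for name, r in rates.items():
    # Hypothetical starting point of 10 Gbps, projected 3 years ahead.
    print(name, round(project_traffic(10.0, r, 3), 2), "Gbps in 3 years,",
          "doubling every", round(doubling_time_years(r), 2), "years")
```

Under this reading, a factor of 1.4 per year doubles traffic roughly every two years, while a factor of 2.0 doubles it every year.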

  43. Latency Measurements • SmokePing, by Tobias Oetiker: http://oss.oetiker.ch/smokeping/

  44. Latency Measurements • Round Trip Time fluctuations • Packet loss percentage [diagram: Server • fping probe • End Site]
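
The two quantities named on this slide can be derived from one round of probes as sketched below. This is a simplified illustration, not SmokePing's own algorithm: the `summarize_probes` helper, its output keys and the sample round-trip times are assumptions, and lost probes are represented as `None`.

```python
from statistics import median


def summarize_probes(rtts_ms):
    """Summarize one probe round.

    rtts_ms holds one entry per probe sent: the round-trip time in
    milliseconds, or None if no reply was received.
    """
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("at least one probe is required")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (sent - len(received)) / sent
    if not received:
        return {"loss_pct": loss_pct, "median_ms": None, "spread_ms": None}
    return {
        "loss_pct": loss_pct,
        "median_ms": median(received),
        # Spread between slowest and fastest reply, a simple fluctuation measure.
        "spread_ms": max(received) - min(received),
    }


# Hypothetical round of five probes, two of which were lost.
print(summarize_probes([10.0, None, 12.0, 14.0, None]))
```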

  45. Slices • GARR-DB: Network Database • Description of the infrastructure • Temporary infrastructures • Homer’s dream is just: • Network Labs • Temporary research projects • Infrastructures requiring monitoring only • Dedicated monitoring systems (users or projects)

  46. Slices: dedicated monitoring systems • Administrator requirements: • Easy to manage • Replicable • User requirements: • Quick and easy setup • Traffic statistics • Weathermaps • Alarms

  47. Slices • Slice link, description and status • MRTG log status • Access policy • URL • Slice status (on, off) • Status of MRTG CFG generation (red if disabled) • Cronjob status (red if disabled)

  48. Slices

  49. Traffic Flows Analysis Suite • NfSen/nfdump by Peter Haag • Based on the NetFlow protocol

  50. Traffic Flows Analysis, architecture overview [diagram: www • NfSen • nfdump • RRDs • User • nfdump (CLI) • NetFlow data export, sampling • nfcapd • Network • Raw data] • Daily numbers: • ~2000 flows/s export • sampling 1:1000 • ~40 MB-1.6 GB each router (raw data)
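
The figures on this slide imply some simple arithmetic, sketched below. It assumes that the ~2000 flows/s figure is a sustained export rate and that counters from 1:1000 sampled data are scaled back by multiplying by the sampling ratio; both are assumptions, and the helper names are hypothetical.

```python
SAMPLING_RATIO = 1000      # 1 packet inspected out of every 1000 (from the slide)
FLOWS_PER_SECOND = 2000    # approximate flow export rate (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_actual(sampled_count, sampling_ratio=SAMPLING_RATIO):
    """Scale a counter measured on sampled traffic back to an estimate
    of the real total by multiplying by the sampling ratio."""
    return sampled_count * sampling_ratio


def flows_per_day(flows_per_second=FLOWS_PER_SECOND):
    """Number of flow records exported in one day at a constant rate."""
    return flows_per_second * SECONDS_PER_DAY


print(flows_per_day())            # flow records per day
print(estimate_actual(1500))      # packets represented by 1500 sampled packets
```

The scaled values are statistical estimates: with 1:1000 sampling, small or short-lived flows may not be seen at all, so the estimate is more reliable for large aggregates than for individual flows.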
