Brad Nicholes, Sr. Software Engineer/Consultant, Novell; Member, Apache Software Foundation; bnicholes@apache.org. Monitoring Your Data Center Using Apache and Ganglia
Agenda Ganglia Monitoring • Introduction and Overview • Ganglia Architecture • Apache Web Front End • Gmond & Gmetad • Extending Ganglia • GMetrics • Gmond Module Development • What’s New and What’s Coming
Introduction and Overview • Scalable Distributed Monitoring System • Targeted at monitoring clusters and grids • Multicast-based listen/announce protocol • Depends on open standards • XML • XDR compact portable data transport • RRDTool – Round Robin Database • APR – Apache Portable Runtime • Apache HTTPD Server • PHP based web interface • Ganglia version 3.1.0 released July 2008 • http://ganglia.sourceforge.net or http://www.ganglia.info
Ganglia Architecture • Gmond – Metric gathering agent installed on individual servers • Gmetad – Metric aggregation agent installed on one or more specific task oriented servers • Apache Web Front End – Metric presentation and analysis server • Characteristics • Multicast – All gmond nodes are capable of listening to and reporting on the status of the entire cluster • Failover – Gmetad has the ability to switch which cluster node it polls for metric data • Lightweight and low overhead metric gathering and transport • Ported to multiple platforms (Linux, FreeBSD, Solaris, others)
Ganglia Web Front End • Built around Apache HTTPD server using mod_php • Uses presentation templates so that the web site “look and feel” can be easily customized • Presents an overview of all nodes within a grid vs all nodes in a cluster • Ability to drill down into individual nodes • Presents both textual and graphical views
Deploying Ganglia Monitoring • See http://ganglia.wiki.sourceforge.net/ganglia_gmond_configuration • Install Gmond on all monitored nodes • Edit the configuration file • Add cluster and host information • Configure network udp_send_channel, udp_recv_channel, tcp_accept_channel • Start gmond • Install Gmetad on an aggregation node • Edit the configuration file • Add data and failover sources • Add grid name • Start gmetad • Install the web front end • Install Apache httpd server with mod_php • Copy Ganglia web pages and PHP code to appropriate location • Add appropriate authentication configuration for access control
Gmond – Metric Gathering Agent • Standard Metric Modules • CPU, Network I/O, Disk I/O, Memory and System • Extensible • Loadable modules capable of gathering multiple metrics or using advanced metric gathering APIs • Gmetric – Out-of-process utility capable of invoking command line based metric gathering scripts • Built on the Apache Portable Runtime • Supports Linux, FreeBSD, Solaris and more…
Gmond – Metric Gathering Agent • Automatic discovery of nodes • Adding a node does not require configuration file changes • Each node is configured independently • Each node has the ability to listen to and/or talk on the multicast channel • Can be configured for unicast connections if desired • Heartbeat metric determines the up/down status • Interfaces • Collection functions – Capable of running specialized functions for gathering metric data • Multicast I/O – Listen/Send metric data from/to other nodes in the same cluster • Data export listeners – Listen for client requests for cluster metric data
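The data export listener can be tried out from a short script. The sketch below assumes that gmond answers a plain TCP connection on its tcp_accept_channel port with an XML dump of the cluster state, and that HOST and METRIC elements carry NAME and VAL attributes. The function names and the sample document are illustrative, not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the whole XML dump that gmond writes on its TCP accept channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(8192)
            if not data:  # assumed: gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_cluster_state(xml_data):
    """Return {host_name: {metric_name: value_string}} from an XML dump."""
    root = ET.fromstring(xml_data)
    state = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        state[host.get("NAME")] = metrics
    return state


# Hand-written sample in the assumed shape of a dump (not captured output).
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.0" SOURCE="gmond">
  <CLUSTER NAME="My Cluster" OWNER="Administrator">
    <HOST NAME="node1" IP="192.168.0.4" REPORTED="0">
      <METRIC NAME="load_one" VAL="0.25" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(parse_cluster_state(SAMPLE_XML))
```

Against a running agent, parse_cluster_state(fetch_gmond_xml("some-node")) would be the equivalent call. Values are kept as strings here because the TYPE attribute decides how each one should be interpreted.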
Gmond – Global Configuration • Daemonize - When “yes”, gmond will daemonize • Setuid - When “yes”, gmond will set its effective UID to the uid of the user specified by the user attribute • Debug_level - When set to zero (0), gmond will run normally. When greater than zero, gmond runs in the foreground and outputs debugging information • Mute - When “yes”, gmond will not send data • Deaf - When “yes”, gmond will not receive data • Host_dmax - When set to zero (0), gmond will never delete a host from its list. When set to a positive number N, gmond will flush a host after it has not heard from it for N seconds • Cleanup_threshold - Minimum amount of time before gmond will clean up expired data • Send_metadata_interval - Interval at which gmond will send or resend the metadata packets that describe each enabled metric • Module_dir - Optional parameter indicating the default directory where the DSO modules are located
Gmond – Cluster Configuration • Name - Specifies the name of the cluster of machines • Owner - Specifies the administrators of the cluster • Latlong - Latitude and longitude GPS coordinates of this cluster on earth • Url - Additional information about the cluster
Gmond – Network Configuration • Udp_send_channel • mcast_join, mcast_if – Multicast address and interface • Host – Unicast host • Port – Multicast or Unicast port • Udp_recv_channel • mcast_join, mcast_if, Port – Multicast address, interface and port • Bind – Bind a particular local address • Family – Protocol family • Tcp_accept_channel • Bind, Port, Interface – Bind a particular local address, listen port and interface • Family – Protocol family • Timeout – Request timeout
Gmond – Configuration Example
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 0
  module_dir = /usr/lib/ganglia
}
cluster {
  name = "My Cluster"
  owner = "Administrator"
  latlong = "N37.37 W122.23"
  url = "http://www.moreinfo.org"
}
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
tcp_accept_channel {
  port = 8649
}
Gmond – Access Control
• Configured in udp_recv_channel or tcp_accept_channel sections
• Examples:
• “Deny all” with exceptions:
acl {
  default = "deny"
  access {
    ip = 192.168.0.4
    mask = 32
    action = "allow"
  }
}
• “Allow all” with IPv4 & IPv6 exceptions:
acl {
  default = "allow"
  access {
    ip = 192.168.0.0
    mask = 24
    action = "deny"
  }
  access {
    ip = ::ff:1.2.3.0
    mask = 120
    action = "deny"
  }
}
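How the two acl examples behave for a given client can be sketched in a few lines. The code below is an illustration rather than gmond's implementation: it assumes that the first access entry matching the client address decides the outcome and that the default applies otherwise. The function name and the tuple layout are hypothetical.

```python
import ipaddress


def acl_allows(client_ip, default, rules):
    """Return True if the client may connect.

    rules is a list of (ip, mask, action) tuples mirroring the access blocks.
    First-match-wins is an assumption of this sketch.
    """
    addr = ipaddress.ip_address(client_ip)
    for ip, mask, action in rules:
        network = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
        # An IPv4 client never matches an IPv6 rule and vice versa.
        if addr.version == network.version and addr in network:
            return action == "allow"
    return default == "allow"


# The two acl examples from the slide, expressed as data.
deny_all = ("deny", [("192.168.0.4", 32, "allow")])
allow_all = ("allow", [("192.168.0.0", 24, "deny"),
                       ("::ff:1.2.3.0", 120, "deny")])

if __name__ == "__main__":
    for client in ("192.168.0.4", "192.168.0.5"):
        print(client, acl_allows(client, *deny_all))
```

With the “deny all” example only 192.168.0.4 gets through, while the “allow all” example turns away the 192.168.0.0/24 subnet and the listed IPv6 range.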
Gmond – Metric Collection Groups • Specify as many collection groups as you like • Each collection group must contain at least one metric section • List available metrics by invoking “gmond -m” • Collection_group section: • Collect_once – Specifies that the group contains static metrics, which are collected only once at startup • Collect_every – Collection interval in seconds (only valid for non-static metrics) • Time_threshold – Maximum interval in seconds between sends • Metric section: • Name – Metric name (see “gmond -m”) • Value_threshold – Metric variance threshold (send if exceeded) • Title – Optional user friendly title displayed in the web interface
Gmond – Configuration Example
collection_group {
  collect_every = 20
  time_threshold = 90
  metric {
    name = "load_one"
    value_threshold = "1.0"
    title = "One Minute Load Average"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
    title = "Five Minute Load Average"
  }
  …
}
collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
    title = "Running Processes"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
    title = "Total Processes"
  }
}
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
  metric {
    name = "cpu_speed"
    title = "CPU Speed"
  }
  metric {
    name = "mem_total"
    title = "Memory Total"
  }
  metric {
    name = "swap_total"
    title = "Swap Total"
  }
  …
}
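The interplay of time_threshold and value_threshold is easier to see as a decision function. This is one plausible reading of the settings, simplified and not taken from the gmond source: a value collected every collect_every seconds is announced when it has moved by more than value_threshold since the last send, or when time_threshold seconds have passed without a send. The function and parameter names are hypothetical.

```python
def should_send(now, last_sent_time, last_sent_value, new_value,
                time_threshold, value_threshold=None):
    """Decide whether a freshly collected value should be announced."""
    if last_sent_time is None:
        return True  # nothing has been sent yet
    if now - last_sent_time >= time_threshold:
        return True  # maximum silence interval reached
    if value_threshold is not None:
        # A large enough change is sent immediately.
        return abs(new_value - last_sent_value) > value_threshold
    return False


if __name__ == "__main__":
    # load_one: collect_every = 20, time_threshold = 90, value_threshold = 1.0
    print(should_send(20, 0, 0.5, 0.7, 90, 1.0))  # small change, recent send
    print(should_send(20, 0, 0.5, 1.8, 90, 1.0))  # change above the threshold
    print(should_send(90, 0, 0.5, 0.5, 90, 1.0))  # time threshold reached
```

Under this reading, a quiet metric still produces at least one packet per time_threshold, while a rapidly changing one is limited only by collect_every.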
Gmetad – Metric Aggregation Agent • Polls a designated cluster node for the status of the entire cluster • Data collection thread per cluster • Ability to poll gmond or another gmetad for metric data • Failover capability • RRDTool – Storage and trend graphing tool • Defines fixed size databases that hold data of various granularity • Capable of rendering trending graphs from the smallest granularity to the largest (e.g. last hour vs. last year) • Never grows larger than the predetermined fixed size • Database granularity is configurable through gmetad.conf
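The failover behaviour can be illustrated with a small polling helper. This is a sketch of the idea only, assuming that the nodes of a cluster are tried in their listed order until one answers. The function names are hypothetical, and the fetch callable stands in for whatever actually reads the data from a node.

```python
def poll_with_failover(sources, fetch):
    """Try each (host, port) in order and return the first successful result.

    fetch is any callable taking (host, port) that returns the cluster data
    and raises OSError when the node cannot be reached.
    """
    errors = []
    for host, port in sources:
        try:
            return (host, port), fetch(host, port)
        except OSError as exc:
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("all sources failed: " + "; ".join(errors))


def fake_fetch(host, port):
    """Stand-in for a real network read; only 'node2' is reachable here."""
    if host != "node2":
        raise OSError("connection refused")
    return "<GANGLIA_XML/>"


if __name__ == "__main__":
    used, data = poll_with_failover([("node1", 8649), ("node2", 8649)], fake_fetch)
    print("polled", used, "->", data)
```

Because every gmond node can report on the whole cluster, any node that answers is an equally good source, which is what makes this simple ordered retry sufficient.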
Gmetad – Configuration • Data source and failover designations • data_source "my cluster" [polling interval] address1:port address2:port ... • RRD database storage definition • RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374" • Access control • trusted_hosts address1 address2 … DN1 DN2 … • all_trusted OFF/on • RRD files location • rrd_rootdir "/var/lib/ganglia/rrds" • Network • xml_port 8651 • interactive_port 8652
Gmetad – Configuration Example
data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655
data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
data_source "another source" 1.3.4.7:8655 1.3.4.8
trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
xml_port 8651
interactive_port 8652
rrd_rootdir "/var/lib/ganglia/rrds"
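The structure of a data_source line (quoted name, optional polling interval, then one or more failover addresses) can be made explicit with a small parser. This is an illustrative sketch, not gmetad's own parser. The defaults of 15 seconds and port 8649 are assumptions taken from the stock gmetad.conf comments, and IPv6 literals are not handled.

```python
import shlex

DEFAULT_INTERVAL = 15  # seconds; assumed gmetad default
DEFAULT_PORT = 8649    # assumed default gmond port


def parse_data_source(line):
    """Split a data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)  # shlex keeps the quoted name as one token
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: " + line)
    name, rest = tokens[1], tokens[2:]
    interval = DEFAULT_INTERVAL
    if rest[0].isdigit():  # a bare number before the hosts is the interval
        interval = int(rest[0])
        rest = rest[1:]
    if not rest:
        raise ValueError("data_source needs at least one address")
    hosts = []
    for item in rest:
        host, sep, port = item.rpartition(":")
        hosts.append((host, int(port)) if sep else (item, DEFAULT_PORT))
    return name, interval, hosts


if __name__ == "__main__":
    print(parse_data_source('data_source "another source" 1.3.4.7:8655 1.3.4.8'))
```

The returned host list is in failover order, so the first entry is the node that would be polled while it stays reachable.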
Round-Robin Database (RRD) • High performance data logging and graphing system for time series data • Automatic data consolidation over time • Define various Round-Robin Archives (RRA) which hold data points at decreasing levels of granularity • Multiple data points from a more granular RRA are automatically consolidated and added to a coarser RRA • Constant and predictable data storage size • Old data is eliminated as new data is added to the RRD file • Amount of storage required is defined at the time the RRD file is created • RRDTool Web site: http://oss.oetiker.ch/rrdtool/
Ganglia Default RRD Definition • Definition of the Round-Robin Database format is determined at database creation time • Default Ganglia RRA definitions: • RRA #1 – 15 second average for 61 minutes • RRA #2 – 6 minute average for 24.4 hours • RRA #3 – 42 minute average for 7.1 days • RRA #4 – 2.8 hour average for 28.5 days • RRA #5 – 24 hour average for 374 days • Default largest retrievable time series, ~1 year • Configurable to whatever you want
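The figures in this list follow directly from the default RRA strings in gmetad.conf. As a worked check, the sketch below assumes a base step of 15 seconds (implied by the "15 second average" of RRA #1) and reads each definition as RRA:CF:xff:steps:rows, so that one row averages steps base intervals and the archive covers steps × rows × 15 seconds. The helper name is hypothetical.

```python
STEP_SECONDS = 15  # base step, implied by the "15 second average" of RRA #1

RRAS = [
    "RRA:AVERAGE:0.5:1:244",
    "RRA:AVERAGE:0.5:24:244",
    "RRA:AVERAGE:0.5:168:244",
    "RRA:AVERAGE:0.5:672:244",
    "RRA:AVERAGE:0.5:5760:374",
]


def describe_rra(definition, step=STEP_SECONDS):
    """Return (seconds per row, total seconds covered) for one RRA string."""
    _tag, _cf, _xff, steps, rows = definition.split(":")
    resolution = int(steps) * step
    return resolution, resolution * int(rows)


if __name__ == "__main__":
    for number, rra in enumerate(RRAS, start=1):
        resolution, span = describe_rra(rra)
        print(f"RRA #{number}: {resolution / 60:g} minute average "
              f"covering {span / 86400:.2f} days")
```

For example, RRA #3 averages 168 × 15 s = 2520 s = 42 minutes per row, and 244 rows cover about 7.1 days.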
Retrieving Data, Generating Graphs and Interacting with RRD Files • RRDFetch – Retrieve time series data from an RRD file for a specific time period • RRDInfo – Print header data from an RRD file in a parsing friendly format • RRDGraph – Creates a graphical representation of the specified time series data • RRDUpdate – Feed new data values into an RRD file • Other APIs – RRDCreate, RRDDump, RRDFirst, RRDLast, RRDLastupdate, RRDResize, …
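As a rough illustration of RRDFetch, the following sketch builds and runs an rrdtool fetch command for one metric. It assumes the rrdtool command line tool is installed, and it assumes that gmetad stores files as cluster/host/metric.rrd beneath rrd_rootdir. The cluster, host and helper names are placeholders.

```python
import posixpath
import subprocess


def rrd_path(rootdir, cluster, host, metric):
    """Build the assumed location of a metric's RRD file under rrd_rootdir."""
    return posixpath.join(rootdir, cluster, host, metric + ".rrd")


def fetch_command(path, cf="AVERAGE", start="-1h", end="now"):
    """Return the argv for 'rrdtool fetch' over the given time window."""
    return ["rrdtool", "fetch", path, cf, "--start", start, "--end", end]


def run_fetch(path, **window):
    """Run the fetch and return rrdtool's text output (requires rrdtool)."""
    result = subprocess.run(fetch_command(path, **window),
                            check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    path = rrd_path("/var/lib/ganglia/rrds", "My Cluster", "node1", "load_one")
    print(" ".join(fetch_command(path)))
```

Which archive answers the request depends on the window: a one hour window can be served from the finest RRA, while a one year window falls back to the coarsest one.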
Gmetric Service Level Metrics Utility • Extends the available metrics that can be produced through Gmond • Ability to run specialized metric gathering scripts • Pushes metric data back through Gmond • Must be scheduled through cron rather than Gmond • Gmetric repository on Ganglia project site • http://ganglia.sourceforge.net/gmetric/
Gmetric Command Line
gmetric --conf=./custom.conf -n "wow" -v "it works" -t "string"
Usage: gmetric [OPTIONS]...
  -h, --help          Print help and exit
  -V, --version       Print version and exit
  -c, --conf=STRING   The configuration file to use for finding send channels (default=`/etc/gmond.conf')
  -n, --name=STRING   Name of the metric
  -v, --value=STRING  Value of the metric
  -t, --type=STRING   Either string|int8|uint8|int16|uint16|int32|uint32|float|double
  -u, --units=STRING  Unit of measure for the value e.g. Kilobytes, Celcius (default=`')
  -s, --slope=STRING  Either zero|positive|negative|both (default=`both')
  -x, --tmax=INT      The maximum time in seconds between gmetric calls (default=`60')
  -d, --dmax=INT      The lifetime in seconds of this metric (default=`0')
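A metric gathering script run from cron usually ends by handing its value to gmetric. The sketch below shows one way to do that, using only the -n, -v, -t, -u and --conf options listed in the usage text. The metric itself (a count of entries in a directory) and all function names are made up for illustration, and gmetric must be on the PATH for publish() to work.

```python
import os
import subprocess


def gmetric_command(name, value, value_type, units="", conf=None):
    """Build the gmetric argv for a single metric value."""
    cmd = ["gmetric"]
    if conf:
        cmd.append(f"--conf={conf}")
    cmd += ["-n", name, "-v", str(value), "-t", value_type]
    if units:
        cmd += ["-u", units]
    return cmd


def publish(name, value, value_type, **options):
    """Send the metric through gmetric; raises if the command fails."""
    subprocess.run(gmetric_command(name, value, value_type, **options), check=True)


def count_entries(directory):
    """Hypothetical metric: number of entries in a directory."""
    return len(os.listdir(directory))


if __name__ == "__main__":
    # Print the command instead of running it so the sketch works anywhere.
    print(gmetric_command("spool_entries", count_entries("."), "uint32",
                          units="files"))
```

Passing the arguments as a list avoids shell quoting problems with values such as "it works".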
Gmond Pluggable Metric Modules • Extends the available metrics that can be gathered by Gmond • Implemented as dynamically loadable modules • Configured through gmond.conf • Scheduled through Gmond rather than an external scheduler • Module structure is similar to an Apache module • Able to produce multiple metrics from a single module
Gmond Module Development
• Three callback interfaces
• Init: int (*ex_metric_init)(apr_pool_t *p);
• Clean up: void (*ex_metric_cleanup)(void);
• Handler: g_val_t (*ex_metric_handler)(int metric_index);
• Metric definition structure
mmodule example_module = {
    STD_MMODULE_STUFF,   // Internal module definition
    ex_metric_init,      // Metric init callback function
    ex_metric_cleanup,   // Metric cleanup callback function
    ex_metric_info,      // Metric info data structure
    ex_metric_handler,   // Metric handler
};
Gmond Example Module
#include <stdlib.h>
#include <time.h>
#include <gm_metric.h>

/* Forward declaration so the callbacks can refer to the module. */
mmodule example_module;

static int ex_metric_init(apr_pool_t *p)
{
    /* Parameters from the module's section in gmond.conf (unused here). */
    apr_array_header_t *list_params = example_module.module_params_list;
    (void)list_params;

    srand(time(NULL) % 99);
    return 0;
}

static void ex_metric_cleanup(void)
{
}

/* metric_index is the position of the metric in ex_metric_info[]. */
static g_val_t ex_metric_handler(int metric_index)
{
    g_val_t val;

    switch (metric_index) {
    case 0:
        val.uint32 = rand() % 99;
        return val;
    case 1:
        val.uint32 = 50;
        return val;
    }

    /* default case */
    val.uint32 = 0;
    return val;
}

static const Ganglia_25metric ex_metric_info[] = {
    {0, "Random_Numbers", 90, GANGLIA_VALUE_UNSIGNED_INT, "s", "both", "%u",
     UDP_HEADER_SIZE+8, "Example module metric (random numbers)"},
    {0, "Constant_Number", 90, GANGLIA_VALUE_UNSIGNED_INT, "Num", "zero", "%u",
     UDP_HEADER_SIZE+8, "Example module metric (constant number)"},
    {0, NULL}
};

mmodule example_module = {
    STD_MMODULE_STUFF,
    ex_metric_init,
    ex_metric_cleanup,
    ex_metric_info,
    ex_metric_handler,
};
Gmond Example Module Configuration
modules {
  module {
    name = "example_module"
    path = "/usr/lib/ganglia/modexample.so"
    Param RandomMax {
      Value = 75
    }
    Param ConstantValue {
      Value = 25
    }
  }
}
/* Define Collection Groups */
collection_group {
  collect_every = 10
  time_threshold = 50
  metric {
    name = "Random_Numbers"
    title = "Random Number Metric"
    value_threshold = 30.0
  }
}
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "Constant_Number"
    title = "Constant Number Metric"
  }
}
Gmond Python Module Development • Extends the available metrics that can be gathered by Gmond • Configured through the Gmond configuration file • Python module interface is similar to the C module interface • Ability to save state within the script vs. a persistent data store • Larger footprint but easier to implement new metrics
Gmond Python Module Development • Three mandatory functions • metric_init(params) • Called once at module initialization time • Must return a metric description dictionary or list of dictionaries • Any other module initialization can also take place here • metric_handler(name) – may have multiple handlers • Metric gathering handler • Must return a single data value of the same type as specified in the metric description dictionary returned by metric_init() function • metric_cleanup() • Called once at module termination time • Does not return a value
Gmond Python Module Development
• Metric definition data dictionary
d = {'name': '<your_metric_name>',
     'call_back': <call_back function>,
     'time_max': int(<your_time_max>),
     'value_type': '<string | uint | float | double>',
     'units': '<your_units>',
     'slope': '<zero | positive | negative | both>',
     'format': '<your_format>',
     'description': '<your_description>',
     'groups': '<group names>'}
• Can be a single dictionary or a list of dictionaries
• Must be returned from the metric_init() function
Gmond Python Module Development
Curve_Max = 15
v = int(1)
inc = int(1)
count = 0

def metric_init(params):
    # Called once at start-up; params holds the module's configured values.
    global Curve_Max
    if 'CurveMax' in params:
        Curve_Max = int(params['CurveMax'])
    d = {'name': 'Curve_Metric',
         'call_back': curve_handler,
         'time_max': int(60),
         'value_type': 'uint',
         'units': 'Seconds',
         'slope': 'both',
         'format': '%u',
         'description': 'Shows a uniform curve',
         'groups': 'Examples'}
    return d

def curve_handler(name):
    # Step v up or down and reverse direction after Curve_Max steps.
    global v, count, inc, Curve_Max
    v += inc
    count += 1
    if count > Curve_Max:
        count = 0
        inc = -inc
    return int(v)

def metric_cleanup():
    pass
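Before deploying a module it can help to exercise it outside gmond. The harness below imitates, in a very rough way, the calling sequence described on the earlier slides: metric_init(params) once, each metric's call_back when a value is needed, and metric_cleanup() at the end. The harness and the small tick-counter module it drives are hypothetical; gmond's real scheduling is more involved.

```python
_state = {"ticks": 0, "step": 1}


def metric_init(params):
    """Toy module: one metric that grows by 'Step' on every collection."""
    _state["ticks"] = 0
    _state["step"] = int(params.get("Step", 1))  # params arrive as strings
    return [{"name": "Tick_Count",
             "call_back": tick_handler,
             "time_max": 60,
             "value_type": "uint",
             "units": "ticks",
             "slope": "positive",
             "format": "%u",
             "description": "Counts collections",
             "groups": "Examples"}]


def tick_handler(name):
    _state["ticks"] += _state["step"]
    return _state["ticks"]


def metric_cleanup():
    pass


def exercise_module(init, cleanup, params, samples=3):
    """Call a module the way the slides describe and collect its values."""
    descriptors = init(params)
    if isinstance(descriptors, dict):  # a single dictionary is also allowed
        descriptors = [descriptors]
    results = {}
    for d in descriptors:
        handler = d["call_back"]
        results[d["name"]] = [handler(d["name"]) for _ in range(samples)]
    cleanup()
    return results


if __name__ == "__main__":
    print(exercise_module(metric_init, metric_cleanup, {"Step": "2"}))
```

Running the same harness against the Curve_Metric functions only requires passing that module's metric_init and metric_cleanup instead.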
Gmond Python Module Deployment • Copy the .py file to the specified directory • The python modules directory is defined in the gmond.conf file • Start Gmond using the -m parameter • Shows a list of all available metrics known to Gmond • The python based metric should be in the list • Add the new python metric to a collection group just like any other metric • Restart Gmond
Configuring Gmond for Python
• Must load the modpython.so pluggable module
• Must specify a python module path
• The 'params' directive specifies the python module path
• modpython will automatically load any .py module found in the specified path
• Recommend including the python metric module .pyconf files from within the same .conf file that loads the python support module
• include ('/etc/ganglia/conf.d/*.pyconf')
modules {
  module {
    name = "python_module"
    path = "/usr/lib/ganglia/modpython.so"
    params = "/usr/lib/ganglia/python_modules"
  }
}
What’s New and What’s Coming • New metric modules • Track individual CPUs • Track individual logical disks • Track TCP connections and status • Python version of Gmetad • Provides a pluggable module interface • Modules can modify how metrics are stored • Modules can be written to analyze metrics and produce events • Ability to enable/disable modules using a configuration directive • Pluggable web views • Spoofing modules – modules that can report metrics on behalf of another host
General Disclaimer This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Novell, Inc. makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for Novell products remains at the sole discretion of Novell. Further, Novell, Inc. reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All Novell marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.