
Incremental Detection and Visualization of Problem Patterns – a “Simplified” Symptomatic Event Visualizer –

Marcelo Perazolo, Autonomic Computing Architecture (mperazolo@us.ibm.com); Abdi Salahshour, Autonomic Computing Technology & Development (abdis@us.ibm.com)


Presentation Transcript


  1. Incremental Detection and Visualization of Problem Patterns – a “Simplified” Symptomatic Event Visualizer – • Marcelo Perazolo, Autonomic Computing Architecture, mperazolo@us.ibm.com • Abdi Salahshour, Autonomic Computing Technology & Development, abdis@us.ibm.com • April 25-26, 2006

  2. Agenda • Statement of Problem • What is the Common Event Format • What is the Symptoms Reference Format • A Solution • Conclusion • Helpful Links

  3. Problems Facing Today's Data Collection • Complexity of e-Business • Collection of distributed and heterogeneous software and hardware components • Variety of data and collectors/adapters • Consume and publish proprietary data formats • Require ad hoc, product-specific code • Data formats and APIs • Design and standards considerations • Different skill sets to configure, maintain, and tune • Difficult to correlate for end-to-end problem diagnostics • Instrumentation • Many-to-many • Standards compliance • Customer pain and cost of ownership

  4. [Slide: excerpts of proprietary log formats. One is a DB2 JDBC driver trace with [ibm][db2][jcc][t4] hex dumps and ResultSetMetaData output. The other is a WebSphere Application Server trace with com.ibm.ws.rsadapter entries timestamped [11/25/03 14:14:33:695 EST]. Diagram layers: Applications, Database, Application Servers, Servers, Storage devices, Networks, each with a proprietary format]

  5. Problem determination may take days or weeks: the “Blame Storming” Syndrome • Proprietary log formats • Domain-specific sets of tools • No interfaces between tools • Siloed problem determination • Finger-pointing resolution • [Diagram: Applications, Database, Application Servers, Servers, Storage devices, Networks, each with a proprietary format and specialized skills and tools]

  6. Common Base Event (CBE) / WSDM Event Format (WEF) • Richer, normalized data enables cross-product analysis and correlation, and is a prerequisite to effective root cause analysis and automation • Without standards, event data are of little value to autonomic management for problem determination and response actions • To address this, event data are structured in 4 categories: • The identification of the component that is affected by or experienced the situation • This is also known as the source of a situation • The identification of the component that is reporting the situation • This is also known as the reporter of a situation • It may be the same as the source component of the situation • The situation data • Properties or attributes that describe the situation • The context/correlation data • Properties or attributes used to correlate the situation with others • CBE / WEF • A consistent specification for the definition of normalized event and log information for various domains (business, security, network, system, etc.) • An exchange format for events and logs • Describes situations concerning the external operational capabilities of a component • Not positioned for data that captures execution information within a component (i.e. trace)
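
The four categories can be illustrated with a minimal Python sketch. Python is used only for brevity and is not part of the tooling described here. The dataclasses and the `to_xml` helper are hypothetical. The element and attribute names (`sourceComponentId`, `reporterComponentId`, `situation`, `contextDataElements`) are loosely modeled on CBE 1.0.1 but heavily simplified, so the specification under Helpful Links remains the authority on the real schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ComponentId:
    component: str  # name of the component, e.g. an application server
    location: str   # where it runs, e.g. a host name


@dataclass
class SimpleEvent:
    # Hypothetical, simplified event record; NOT the full CBE 1.0.1 schema.
    source: ComponentId              # 1. component affected by the situation
    reporter: Optional[ComponentId]  # 2. reporting component (None = same as source)
    situation: str                   # 3. situation data: category of what happened
    msg: str                         #    ...plus descriptive attributes
    severity: int
    context: Dict[str, str] = field(default_factory=dict)  # 4. context/correlation data


def to_xml(event: SimpleEvent) -> ET.Element:
    """Serialize the event into a CBE-like XML element (illustrative only)."""
    root = ET.Element("CommonBaseEvent", msg=event.msg, severity=str(event.severity))
    ET.SubElement(root, "sourceComponentId",
                  component=event.source.component, location=event.source.location)
    reporter = event.reporter or event.source  # reporter defaults to the source
    ET.SubElement(root, "reporterComponentId",
                  component=reporter.component, location=reporter.location)
    ET.SubElement(root, "situation", categoryName=event.situation)
    for name, value in event.context.items():
        ctx = ET.SubElement(root, "contextDataElements", name=name)
        ET.SubElement(ctx, "contextValue").text = value
    return root
```

Keeping the context data as separate named elements is the design choice that matters here. It lets events from different products be correlated on a shared key, such as a transaction ID, without either product knowing the other's message format.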

  7. What is a Symptom? • Dictionary definition: “A characteristic sign or indication of the existence of something else.” • AC definition: “A characteristic sign or indication of a possible problem or situation happening in the context of one or more manageable resources.” • A form of knowledge used to solve problems and situations automatically in an autonomic system • Symptoms are composite records of information, formed by combining raw or composite information into patterns • Symptoms may also be composed of other symptoms

  8. From Events to Symptoms • Event: an indication of something being monitored • For example, memory usage has exceeded a set limit • Symptom: a characteristic sign or indication of a possible problem or situation happening in the context of one or more manageable resources • Symptom: If event x (and y (and…) ) occur (under certain conditions), then report the occurrence and possible resolution actions • For example, memory usage has exceeded a set limit three times in a 10-minute stretch: suggest increasing your buffer sizes
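
The memory example above can be expressed as a sliding-window rule. This sketch rests on stated assumptions. Each monitoring event is reduced to a timestamp in seconds plus a memory-usage percentage, and the class name, parameters, and advice text are all hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class RepeatedThresholdSymptom:
    """Hypothetical rule: report a symptom when `count` limit-exceeded events
    occur within `window_s` seconds of each other."""

    def __init__(self, count: int = 3, window_s: float = 600.0,
                 advice: str = "Suggest increasing your buffer sizes"):
        self.count = count
        self.window_s = window_s
        self.advice = advice
        self._times: Deque[float] = deque()  # timestamps of recent exceedances

    def on_event(self, timestamp: float, memory_pct: float,
                 limit_pct: float) -> Optional[str]:
        """Feed one monitoring event; return the advice if the symptom is recognized."""
        if memory_pct <= limit_pct:
            return None  # limit not exceeded: event does not participate
        self._times.append(timestamp)
        # Forget exceedances that fell outside the 10-minute window
        while timestamp - self._times[0] > self.window_s:
            self._times.popleft()
        if len(self._times) >= self.count:
            self._times.clear()  # reset so one burst is reported only once
            return self.advice
        return None
```

For example, exceedances at 0 s, 120 s, and 500 s trigger the advice on the third event. Exceedances at 0 s, 400 s, and 700 s do not, because the first falls out of the window before the third arrives.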

  9. Symptoms Reference Architecture • [Diagram: the autonomic Monitor, Analyze, Plan, Execute loop over shared knowledge. Events enter Monitor, symptoms flow to Analyze, change requests to Plan, and change plans to Execute, under policy. SymptomDefinitions are kept in a SymptomCatalog and deployed to engines] • schema: the schema used to create a new instance of the symptom • metadata: the schema used to index and categorize all forms of knowledge • rule: the schema used to recognize a symptom instance • effect: the schema that describes how to react to instances of the symptom • engine: a runtime artifact used to produce symptom instances • instance: an instance of this symptom that conforms to the symptom schema
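
The roles in this architecture can be sketched as follows: a definition carrying a rule and an effect, a catalog, and an engine that produces instances. The sketch makes three assumptions. A rule is a plain Python predicate rather than a schema, the effect is advice text, and the catalog is an in-memory list. None of these names come from the Symptoms Reference Format itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]  # a parsed event, e.g. CBE attributes by name


@dataclass
class SymptomDefinition:
    symptom_id: str
    description: str               # metadata used to index/categorize the knowledge
    rule: Callable[[Event], bool]  # recognizes an instance of the symptom
    effect: str                    # how to react to an instance (here: advice text)


@dataclass
class SymptomInstance:
    symptom_id: str
    event: Event  # the event that matched the rule
    effect: str


class SymptomEngine:
    """Hypothetical engine: evaluates every deployed definition against each event."""

    def __init__(self) -> None:
        self.catalog: List[SymptomDefinition] = []  # stands in for a SymptomCatalog

    def deploy(self, definition: SymptomDefinition) -> None:
        self.catalog.append(definition)

    def process(self, event: Event) -> List[SymptomInstance]:
        # One instance per definition whose rule recognizes the event
        return [SymptomInstance(d.symptom_id, event, d.effect)
                for d in self.catalog if d.rule(event)]
```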

  10. The Value Proposition • Management data more consumable by end users • Visualization of product symptoms within problem determination tooling • Symptoms are more deterministic than individual events • Increased customer satisfaction • Reduced problem determination costs • Administrators use automated event correlation to recognize symptoms (and, potentially, corrective actions) • Support personnel access symptoms directly from the problem determination tools • Cross-product symptom catalogs allow quick diagnosis of known errors • Reduced maintenance costs • Incremental improvements to symptom databases will reduce requests to L2 and L3 support • Reduced support requests from other IBM organizations • A standard symptom format allows products to leverage problem resolution cost from other IBM organizations (e.g. Collaboration Center)

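  11. One Tool Does Not Fit All! • [Diagram: user roles and tools plotted against user skills, from basic (e.g. operators) to advanced (e.g. developers). Roles: operators, support engineers, system analysts, change team, advanced developers. Tasks: triage, analysis, correlation. Tools: LTA-JD at the simple end; LTA-Eclipse and LTA-Portal at the advanced end]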

  12. “Simple” Log and Trace Analyzer for Java Desktop • Standalone, simple Java event viewer to merge, filter, sort, and display the contents of event sources in a common event format (i.e., CBE), for problem isolation and triage leading to problem analysis • Enables end-to-end viewing of event sources across the heterogeneous environment • Customizable summary view • Ability to select and expand any row from the summary view to display the full CBE attributes • Correlation on timestamp and/or sorting on any Common Base Event property • Filtering and multi-level sorting on any event property • Custom highlighting of triage events (simple symptom definitions) • Save and share configuration settings (import/export) • Starting point for support personnel and operations staff • Springboard to more advanced analyzer tools
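
The merge, timestamp correlation, and multi-level sort features can be sketched in a few lines. Each event is assumed to be a dictionary of CBE property names. Each source is assumed to be already ordered by `creationTime` in a single ISO 8601 format, so string comparison matches time order. This is an illustration, not LTA-JD's code.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Event = Dict[str, object]


def merge_sources(*sources: Iterable[Event]) -> List[Event]:
    """Merge event sources that are each already sorted by creationTime."""
    return list(heapq.merge(*sources, key=lambda e: e["creationTime"]))


def multi_sort(events: List[Event], keys: List[Tuple[str, bool]]) -> List[Event]:
    """Sort on several properties; keys = [(property, descending), ...],
    with the first key being the most significant."""
    result = list(events)
    # Python's sort is stable, so sorting from the least to the most
    # significant key yields a correct multi-level ordering.
    for prop, descending in reversed(keys):
        result.sort(key=lambda e: e[prop], reverse=descending)
    return result
```

Using `heapq.merge` keeps the merge linear in the number of events when each source is already in time order. This matters when several large logs are combined into one view.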

  13. Overall Architecture • [Diagram components: CBE event sources, CBE processing with FastXPath, visual filters] • FastXPath • Integrates the solution with existing code generation tools • Extracts XML schema-specific metadata from the objects it queries • Uses metadata available in auto-generated classes to build optimized XSL engines
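
The general idea of querying generated classes instead of raw XML can be sketched as follows. A restricted XPath predicate is compiled once into a Python function, and the class's own field types are used to convert the literal. This is a hypothetical illustration, not FastXPath's implementation. It accepts only the single-comparison form `/Element[@attr op 'literal']`, and `CommonBaseEvent` here is a stand-in dataclass for an auto-generated class.

```python
import operator
import re
from dataclasses import dataclass, fields
from typing import Callable


@dataclass
class CommonBaseEvent:  # hypothetical stand-in for an auto-generated CBE class
    severity: int
    msg: str
    creationTime: str


_OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}
# Matches only /Element[@attr op 'literal']; two-character operators are tried first
_PATTERN = re.compile(r"^/(\w+)\[@(\w+)\s*(<=|>=|!=|=|<|>)\s*'([^']*)'\]$")


def compile_filter(expr: str, cls: type) -> Callable[[object], bool]:
    m = _PATTERN.match(expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr}")
    element, attr, op, literal = m.groups()
    if element != cls.__name__:
        raise ValueError(f"expression targets {element}, not {cls.__name__}")
    # Use the class's own field metadata to convert the literal once, up front
    field_types = {f.name: f.type for f in fields(cls)}
    if attr not in field_types:
        raise ValueError(f"{cls.__name__} has no attribute {attr}")
    value = field_types[attr](literal)  # e.g. int('10') for severity
    compare = _OPS[op]
    # The returned predicate reads the attribute directly; no XML is parsed per event
    return lambda obj: compare(getattr(obj, attr), value)
```

The literal is converted with the field's declared type, so `severity` values compare numerically. A severity of 5 is therefore correctly excluded by `>= '10'`, whereas a plain string comparison would place "5" after "10".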

  14. [Screenshot callouts: event sources collection; customizable results/summary area; events detail area]

  15. Filters are equivalent to symptom rules. This filter selects events by creation time, using XPath that can be generated by the Filter Builder.
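
Python's `xml.etree.ElementTree` supports only a limited XPath subset with no relational operators. This sketch therefore selects `CommonBaseEvent` elements with a path expression and applies the creation-time range check in Python. The sample log and the `filter_by_creation_time` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List

# Hypothetical sample log with explicit UTC offsets
LOG = """<events>
  <CommonBaseEvent creationTime="2006-04-25T14:14:30+00:00" msg="first"/>
  <CommonBaseEvent creationTime="2006-04-25T14:14:33+00:00" msg="second"/>
  <CommonBaseEvent creationTime="2006-04-25T14:20:00+00:00" msg="third"/>
</events>"""


def filter_by_creation_time(xml_text: str, start: str, end: str) -> List[ET.Element]:
    """Return CommonBaseEvent elements whose creationTime lies in [start, end]."""
    root = ET.fromstring(xml_text)
    lo, hi = datetime.fromisoformat(start), datetime.fromisoformat(end)
    # ElementTree's XPath subset can select elements that have an attribute,
    # but cannot compare values, so the time-range test is done in Python.
    return [e for e in root.findall("./CommonBaseEvent[@creationTime]")
            if lo <= datetime.fromisoformat(e.get("creationTime")) <= hi]
```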

  16. Filter Builder (novice users): powerful composition dialogs… while still showing the full XPath syntax for power users
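
The dialog-to-XPath translation can be sketched as a small string builder. The condition tuples and `build_xpath` are hypothetical. The output uses standard XPath 1.0 syntax (`contains()`, `and`/`or`, attribute comparisons), but the slides do not say whether the real Filter Builder emits exactly this form.

```python
from typing import List, Tuple

Condition = Tuple[str, str, str]  # (attribute, operator, value) picked in a dialog


def build_xpath(conditions: List[Condition], combine: str = "and") -> str:
    """Compose simple dialog-style conditions into a CBE XPath filter string."""
    if combine not in ("and", "or"):
        raise ValueError("combine must be 'and' or 'or'")
    if not conditions:
        return "/CommonBaseEvent"  # no conditions: select every event
    parts = []
    for attr, op, value in conditions:
        if "'" in value:
            raise ValueError("values containing single quotes are not supported")
        if op == "contains":
            parts.append(f"contains(@{attr}, '{value}')")
        elif op in ("=", "!=", "<", "<=", ">", ">="):
            parts.append(f"@{attr} {op} '{value}'")
        else:
            raise ValueError(f"unsupported operator: {op}")
    predicate = f" {combine} ".join(parts)
    return f"/CommonBaseEvent[{predicate}]"
```

Generating the XPath text, rather than hiding it, is what lets novice users rely on the dialogs while power users still see and edit the full expression.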

  17. We associate visualization attributes with symptom rules

  18. [Screenshot with numbered callouts 1 to 5]

  19. Flexibility to show only what the user wants to see: non-participating events are filtered out

  20. Symptom details (description of the problem) show up when hovering over the highlighted events
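
Slides 17 to 20 describe three behaviors: a visualization attribute per symptom rule, hiding non-participating events, and symptom details on hover. They can be modeled as follows. `HighlightRule`, `highlight`, and `tooltip` are hypothetical names. When several rules match one event, the sketch assumes the first match wins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, object]


@dataclass
class HighlightRule:
    name: str
    matches: Callable[[Event], bool]  # the symptom rule
    color: str                        # visualization attribute, e.g. a background color
    description: str                  # symptom details shown on hover


Row = Tuple[Event, Optional[HighlightRule]]


def highlight(events: List[Event], rules: List[HighlightRule],
              participating_only: bool = False) -> List[Row]:
    rows: List[Row] = []
    for event in events:
        # First matching rule wins (an assumption of this sketch)
        rule = next((r for r in rules if r.matches(event)), None)
        if participating_only and rule is None:
            continue  # hide events that take part in no symptom
        rows.append((event, rule))
    return rows


def tooltip(row: Row) -> str:
    """Text to show when hovering over a row: the symptom description, if any."""
    _, rule = row
    return rule.description if rule else ""
```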

  21. Helpful Links • Autonomic Computing Enablement Site • http://acenablement.raleigh.ibm.com/ • http://acenablement.raleigh.ibm.com/html/technology/pd/pddwnlds.html • Autonomic Computing • http://www.ibm.com/autonomic • Autonomic Computing Toolkit • http://www.ibm.com/developerworks/autonomic • Autonomic Computing Toolkit Download • http://www-106.ibm.com/developerworks/autonomic/probdet1.html • Common Base Event V1.0.1 (CBE) • http://dev.eclipse.org/viewcvs/indextools.cgi/~checkout~/hyades-home/docs/components/common_base_event/cbe101spec/CommonBaseEvent_SituationData_V1.0.1.pdf • WSDM Event Format V1.0 (WEF) • Part 1: http://docs.oasis-open.org/wsdm/2004/12/muws/cd-wsdm-muws-part1-1.0.pdf • Part 2: http://docs.oasis-open.org/wsdm/2004/12/muws/cd-wsdm-muws-part2-1.0.pdf • Common Event Infrastructure (CEI) • http://www.ibm.com/software/tivoli/features/cei/ • http://www-106.ibm.com/developerworks/library-combined/ac-cei

  22. Backups

  23. [Diagram: Log and Trace Analyzer tools that retrieve and analyze CBE log data. Components: Applications, Generic Log Adapters (GLA), CEI/ESB, CBE events and triaged CBE events, CBE XML formatted logs, CBE objects, ACT/XPath, RAC (API), SymptomDB. Use cases: solution problem isolation; solution problem analysis; solution problem isolation & analysis; product problem isolation & analysis] • LTA-JD (Triage, Analyze) • Event viewing • Merge/sort/filter • Single-event analysis (highlighting/simple symptom rules) • Local data collection • Remote data collection from CEI server • LTA-Eclipse and LTA-Portal (Correlate/Analyze) • Event viewing • Merge/sort/filter • Event correlation • Cross-event analysis (symptoms) • Remote/local data collection • Event conversion

  24. LTA-JD Performance • Evaluation of LTA-JD end-to-end (XML input – convert & process object – filter – display) • Evaluation of a simple FastXPath expression • /CommonBaseEvent[@severity >= '10'] on 100,000 CBEs • FastXPath (157 millisecs), JXPath (468 millisecs), Xalan (1328 secs) • Better results with • smarter filters • a bigger JVM heap • IBM JDK 1.5 (~60% improvement)
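
A rough Python analogue of the end-to-end measurement is sketched below with synthetic data. It covers XML input, conversion to objects, and filtering by `severity >= 10`; the display step is omitted. Timings from this sketch are not comparable to the Java figures above, and `make_log`, `Cbe`, and `run_pipeline` are hypothetical.

```python
import time
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cbe:  # minimal object form of an event
    severity: int
    msg: str


def make_log(n: int) -> str:
    """Synthetic input: severities cycle through 0, 10, ..., 60."""
    body = "".join(f'<CommonBaseEvent severity="{(i % 7) * 10}" msg="event {i}"/>'
                   for i in range(n))
    return f"<events>{body}</events>"


def run_pipeline(xml_text: str) -> Tuple[int, float]:
    """Return (number of matching events, elapsed milliseconds)."""
    start = time.perf_counter()
    root = ET.fromstring(xml_text)                                # XML input
    objs: List[Cbe] = [Cbe(int(e.get("severity")), e.get("msg"))  # convert to objects
                       for e in root.iter("CommonBaseEvent")]
    hits = [o for o in objs if o.severity >= 10]                  # filter
    elapsed_ms = (time.perf_counter() - start) * 1000
    return len(hits), elapsed_ms
```

For example, `count, ms = run_pipeline(make_log(100000))` processes 100,000 synthetic events. Separating parsing from filtering in the timing would show how much of the cost lies in XML handling rather than in the filter itself.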
