On Demand and Autonomic Computing

Steve R. White, Senior Manager, Autonomic Computing, Thomas J. Watson Research Laboratory

Outline • Background and motivation • Research in autonomic components and systems • Autonomic computing architecture • Research in structured autonomic systems

On Demand Era

Complex heterogeneous infrastructures are a reality!

Motivation • Administration of individual systems is increasingly difficult • 100s of configuration and tuning parameters for databases, Web application servers, storage, … • Heterogeneous systems are becoming increasingly connected • Integration is becoming ever more difficult • Architects can't intricately plan interactions among components • Systems are increasingly dynamic, and more frequently include unanticipated components • More of the burden must be assumed at run time • But human system administrators can't assume the burden • 6:1 cost ratio between storage administration and storage • 40% of outages are due to operator error • We need self-managing computing systems • Behavior specified by system administrators via high-level policies • System and its components figure out how to carry out policies

Autonomic Self-Management • Increase Responsiveness: adapt to dynamically changing environments • Business Resiliency: discover, diagnose, act to prevent disruptions • Operational Efficiency: tune resources, balance workloads to best use IT resources • Secure Information & Resources: anticipate, detect, identify, deter attacks

Evolving to Autonomic Computing (from manual to autonomic)
• Basic (Level 1). Characteristics: multiple sources of system-generated data; Skills: extensive, highly skilled IT staff; Benefits: basic requirements met
• Managed (Level 2). Characteristics: data & actions consolidated through mgt tools; Skills: IT staff analyzes & takes actions; Benefits: greater system awareness, improved productivity
• Predictive (Level 3). Characteristics: system monitors, correlates & recommends actions; Skills: IT staff approves & initiates actions; Benefits: less need for deep skills, faster/better decision making
• Adaptive (Level 4). Characteristics: system monitors, correlates & takes action; Skills: IT staff manages performance against SLAs; Benefits: human/system interaction, IT agility & resiliency
• Autonomic (Level 5). Characteristics: components dynamically respond to business policies; Skills: IT staff focuses on enabling business needs; Benefits: business policy drives IT mgt, business agility and resiliency

Human Interaction with Autonomic Systems (P. Maglio, Almaden) • Basic questions • What do middleware administrators do? • How can we better support the problems and practices they have? • Learn answers to these questions via ethnographic studies • Use insights to design new ways to interact with complex computing systems • Quotes shown on the slide: "We start with looking at the proxy server log files, then the web server log files, then the application server admin log files, then the application log files." "We had it wrong. Our assumption of how it worked was incorrect." "… but we thought that was the return port!"

Dynamic Surge Protection (J. Hellerstein, Watson) • Systems can go from steady state … to overloaded without warning • [Figure labels: Internet; "Few minutes later…"]

Surge Protection Demo • Monitor & remove servers • [Chart series: # active servers, # requested servers, actual BOPS, predicted BOPS, response time]
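The chart series in the demo (predicted versus actual BOPS, requested versus active servers) suggest a predict-then-provision loop. The following is a minimal Python sketch of that idea only, not the algorithm used in the demo: the linear-extrapolation forecast, the `bops_per_server` capacity figure, the `headroom` margin, and the sample load trace are all illustrative assumptions.

```python
import math

def predict_next(history):
    """Forecast the next load sample by linear extrapolation (an assumed, very simple predictor)."""
    if len(history) < 2:
        return history[-1]
    return max(0.0, history[-1] + (history[-1] - history[-2]))

def servers_needed(predicted_bops, bops_per_server, headroom=0.2, min_servers=1):
    """Servers to request so predicted load plus a safety margin fits (hypothetical sizing rule)."""
    return max(min_servers, math.ceil(predicted_bops * (1 + headroom) / bops_per_server))

history = []
for actual in [100, 110, 120, 300, 600, 650, 400, 150]:  # made-up trace with a surge
    history.append(actual)
    predicted = predict_next(history)
    requested = servers_needed(predicted, bops_per_server=100)
    print(f"actual={actual:4d} predicted={predicted:6.1f} requested_servers={requested}")
```

Because the same rule lowers the request when the forecast falls, servers are released again after the surge, which corresponds to the "monitor & remove servers" step on the slide.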

Enterprise Workload Management (D. Dillenberger, Watson) • Large, distributed, heterogeneous system • Achieves end-to-end performance via adaptive algorithms • Administrator defines policy • Desired response times for various classes of users, apps • eWLM managers on each resource cooperate to adaptively tune parameters • OS, network, storage, virtual server knobs • JVM heap size, # garbage collection threads • Workload balancing, routing parameters

Policies and Autonomic Computing (D. Verma and D. Kandlur, Watson) • Policy: set of guidelines or directives provided to an autonomic element to influence its behavior • Key challenge: • Move away from low-level controls • Move towards high-level directives (policies) over autonomic decisions • Developing scenarios, standards and technologies to support policies for autonomic computing

Utility Functions and Autonomic Computing (W. Walsh, Watson) • Utility functions can guide autonomic decision making • Self-optimization: natural way to express optimization criteria • Declarative: preferable to implicitly hard-coded in special-purpose algorithms • Derivable from business objectives (e.g. optimize total profits) • Can translate to computing metrics at different levels • Exploring applications in eWLM, eUtility, SLEDS • [Figure: utility function V(RT) plotted against response time RT]
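To make the idea of a declarative utility function concrete, here is a minimal Python sketch of utility-guided resource allocation. Everything in it is an illustrative assumption rather than the eWLM or eUtility design: the sigmoid shape chosen for `utility` (standing in for V(RT)), the toy `response_time` model, the class weights, and the exhaustive search.

```python
import math
from itertools import product

def utility(rt, target, steepness=10.0):
    """Hypothetical V(RT): close to 1 well under the target, 0.5 at the target, close to 0 well over it."""
    return 1.0 / (1.0 + math.exp(steepness * (rt - target) / target))

def response_time(load, servers):
    """Toy performance model (an assumption): response time scales with load per server."""
    return load / servers

def best_allocation(classes, total_servers):
    """Try every split of the servers and keep the one with the highest weighted total utility."""
    names = list(classes)
    best, best_value = None, float("-inf")
    for split in product(range(1, total_servers + 1), repeat=len(names)):
        if sum(split) != total_servers:
            continue
        value = sum(
            classes[n]["weight"]
            * utility(response_time(classes[n]["load"], s), classes[n]["target"])
            for n, s in zip(names, split)
        )
        if value > best_value:
            best, best_value = dict(zip(names, split)), value
    return best, best_value

# Two made-up service classes; the weight could be derived from a business objective such as revenue.
classes = {
    "gold": {"load": 8.0, "target": 1.0, "weight": 3.0},
    "silver": {"load": 4.0, "target": 2.0, "weight": 1.0},
}
allocation, value = best_allocation(classes, total_servers=10)
print(allocation, round(value, 3))
```

The optimizer never hard-codes which class to favor. Changing the weights or targets changes the allocation, which is the sense in which the criteria are declarative.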

Autonomic Computing Architecture: The Autonomic Element • AE is the fundamental abstraction • Defines an important boundary • An AE contains • Exactly one autonomic manager • Zero or more managed element(s) • Could be basic resource like database, storage system, server, software app • Higher level elements may have no managed element; they manage other autonomic elements via messages • AE is responsible for • Providing/consuming computational services • Interacting with other autonomic elements • Managing own behavior in accordance with policies • [Figure: An Autonomic Element. An Autonomic Manager (Monitor, Analyze, Plan, Execute, Knowledge) linked via S and E to a Managed Element, e.g. database, storage, server, software app, workload mgr, sentinel, arbiter, OGSA infrastructure elements]
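The Monitor, Analyze, Plan, Execute and Knowledge labels in the figure can be read as a control loop run by the autonomic manager. The sketch below is a minimal Python illustration under stated assumptions: S and E are read as a sensor and an effector on the managed element, and the thread-pool resource, its toy queue model, and the policy keys `max_queue_length` and `max_threads` are hypothetical, not part of the architecture.

```python
from dataclasses import dataclass, field

@dataclass
class ManagedElement:
    """Stand-in for a managed resource; sensor/effector are our reading of S and E."""
    threads: int = 4
    queue_length: int = 40

    def sensor(self):
        return {"queue_length": self.queue_length, "threads": self.threads}

    def effector(self, threads):
        self.threads = threads
        # Toy behavior (an assumption): the queue shrinks in proportion to added threads.
        self.queue_length = max(0, 160 // threads)

@dataclass
class AutonomicManager:
    element: ManagedElement
    policy: dict  # high-level goal supplied by an administrator
    knowledge: list = field(default_factory=list)

    def monitor(self):
        reading = self.element.sensor()
        self.knowledge.append(reading)  # keep a history of observations
        return reading

    def analyze(self, reading):
        return reading["queue_length"] > self.policy["max_queue_length"]

    def plan(self, reading):
        return min(reading["threads"] * 2, self.policy["max_threads"])

    def execute(self, threads):
        self.element.effector(threads)

    def step(self):
        """Run one pass of the loop; return True if an action was taken."""
        reading = self.monitor()
        if self.analyze(reading):
            self.execute(self.plan(reading))
            return True
        return False

manager = AutonomicManager(ManagedElement(), {"max_queue_length": 20, "max_threads": 32})
for _ in range(10):  # bounded so the sketch always terminates
    if not manager.step():
        break
print(manager.element.threads, manager.element.queue_length)
```

The administrator states only the goal in `policy`; the manager decides how to reach it, which mirrors the "managing own behavior in accordance with policies" responsibility listed above.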

Autonomic Computing Architecture: Element interactions • Based on OGSA; extensions as necessary • Service-oriented architecture • Messages defined by WSDL: portTypes, operations • Services defined by constellations of portTypes • AC architecture defines: • Required messages • Optional but standard messages • For advanced interactions: conversation support • “Choreography” defines structure of multi-step interactions • Runtime enforces conversational protocols for app logic • Underlies robust interactions

Autonomic Manager Toolset (W. Arnold et al., Watson) • Facilitates autonomic manager construction • In accordance with AC architecture • Catcher for generic AM technologies • OGSA messaging • Policy tools • Monitoring technologies • AI tools for knowledge representation, reasoning • Math libraries for modeling, analysis, planning • Feedback control • V1.0 now available on alphaWorks • Part of the Exploratory Technology Toolkit • www.alphaworks.ibm.com • [Figure: An Autonomic Element. An Autonomic Manager (Monitor, Analyze, Plan, Execute, Knowledge) linked via S and E to a Managed Element]

Autonomic Computing Systems: A small-scale system prototype (1) • Elements: User Interface, OGSA Registry, Policy Repository, Database, Storage • Messages shown: Register, Register

Autonomic Computing Systems: A small-scale system prototype (2) • Elements: User Interface, OGSA Registry, Policy Repository, Database, Storage • Messages shown: FindServiceData(Policy Repository); FetchPolicy, Subscribe(Policy); ReportPolicy

Autonomic Computing Systems: A small-scale system prototype (3) • Elements: User Interface, OGSA Registry, Policy Repository, Database, Storage • Messages shown: SetPolicy; Publish(Policy); ReportPolicy • Policy content: Service Class Definition, Alert Policy

Autonomic Computing Systems: A small-scale system prototype (4) • Elements: User Interface, OGSA Registry, Policy Repository, Database, Storage • Messages shown: FindServiceData(Storage); QueryResponse(List(Storage)); AddResource(LV, Parms); DeliverResource(LV Name); Create TableSpace • Policies held: Alert Policies, Svc Class Defs
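The message labels on the prototype slides suggest a discover-then-request flow: elements register with the registry, the database looks up storage, asks for a logical volume, and creates a tablespace on what it receives. The Python sketch below is only one reading of those labels. The direction and payload of each message, the in-process method calls standing in for OGSA messages, and the class and parameter names are assumptions.

```python
class Registry:
    """Stand-in for the OGSA registry: maps a service kind to registered elements."""
    def __init__(self):
        self._services = {}

    def register(self, kind, element):                 # "Register"
        self._services.setdefault(kind, []).append(element)

    def find_service_data(self, kind):                 # "FindServiceData" / "QueryResponse"
        return list(self._services.get(kind, []))

class Storage:
    def __init__(self, name, free_gb):
        self.name, self.free_gb = name, free_gb
        self._volumes = 0

    def add_resource(self, size_gb):                   # "AddResource(LV, Parms)"
        if size_gb > self.free_gb:
            return None                                # cannot satisfy the request
        self.free_gb -= size_gb
        self._volumes += 1
        return f"{self.name}-lv{self._volumes}"        # "DeliverResource(LV Name)"

class Database:
    def __init__(self, registry):
        self.registry = registry
        self.tablespaces = []

    def grow(self, size_gb):
        """Find a storage element that can supply a volume, then create a tablespace on it."""
        for storage in self.registry.find_service_data("Storage"):
            lv_name = storage.add_resource(size_gb)
            if lv_name is not None:
                self.tablespaces.append(lv_name)       # "Create TableSpace"
                return lv_name
        return None

registry = Registry()
registry.register("Storage", Storage("array1", free_gb=50))
registry.register("Storage", Storage("array2", free_gb=200))
db = Database(registry)
print(db.grow(100))
```

The database never holds a hard-wired reference to a storage system. It discovers candidates at run time, which is what lets elements be added or replaced without re-planning the interaction.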

Autonomic Computing Systems: Flexibly composed from autonomic elements • [Figure: Large Autonomic System containing Application Environment 1 and Application Environment 2. Elements shown: eUtility Manager, Resource Arbiter, Resource Managers (e.g. Storage, DB, Servers), Application Managers, Workload Managers, Predictor, Policy Repository, Sentinel, Registry, and Network, Database, Storage and Server resources]

Workshops • First Workshop on Algorithms and Architectures for Self-Managing Systems (at FCRC ’03) • June 11, 2003 in San Diego, CA • 5th Annual International Conference on Active Middleware Services: Autonomic Computing Workshop • June 25, 2003 in Seattle, WA • IJCAI-03 AI and Autonomic Computing: Developing a Research Agenda for Self-Managing Computer Systems • August 10, 2003 in Acapulco, Mexico • First International Workshop on Autonomic Computing Systems at 14th International Conference on Database and Expert Systems Applications (DEXA'2003) • September 1-5, 2003 in Prague, Czech Republic • 14th IFIP/IEEE International Workshop on Distributed Systems: Operations & Management (DSOM-03) • October 20-22, 2003 in Heidelberg, Germany

References • The Vision of Autonomic Computing • IEEE Computer, January 2003 • http://computer.org/computer/homepage/0103/Kephart/ • IBM Systems Journal special issue on Autonomic Computing • http://www.research.ibm.com/journal/sj42-1.html

Interesting Research Problems • Architecture • What is the right architecture? • Should we be working on architecture at all? • Policies • Can we really run large IT systems by specifying high-level policies? • Centralized vs. Decentralized Control • Will decentralized control play an important role? • Human Interaction • How will humans interact with large autonomic systems? • How can we express the behavior of a large, dynamic system to humans? • Systems With a Billion Components • Are they even possible?
