Distributed Services Part One

Lecture on Distributed ServicesPart One Prof. Walter Kriha Computer Science and Media Faculty HdM Stuttgart

Overview • Distributed Services at Amazon and Google • Distributed Services (Naming, Finding, Live Cycle, Relationships, Notification, Concurrency) • Functional and Non-Functional Aspects of DS • Services, Lose Coupling and Aspects • Remote Locking Strategies

What is a Distributed Service? A distributed middleware that provides a certain function with: - high scalability - high availability through redundant parts - high throughput und performance To Distributed Applications

Distributed Services at Amazon “The big architectural change that Amazon went through in the past five years was to move from a two-tier monolith to a fully-distributed, decentralized, services platform serving many different applications. “ Werner Vogels, Amazon CTO, At the core of the Amazon strategy are the Web Services. The Amazon team takes the concepts of search, storage, lookup and management the data and turns them into pay-per-fetch and pay-per-space web services. This is a brilliant strategy and Amazon is certainly a visionary company. But what impresses me the most as an engineer is their ability to take very complex problems, solve them and then shrink wrap their solutions with a simple and elegant API. Alex Iskold, SOAWorld Mag.

Amazon Service Arc. A.Iskold, SOAWorld

Amazon Service Arc. http://www.infoq.com/news/2009/02/Cloud-Architectures

Amazon Client Facing Services • Storage Services (S3, huge storage capacity) • Computation Services (EC2, Virtual Machines) • Queuing Services • Load-balancing services • Elastic map reduce • Cloudfront (memcache like fast cache) • And so on....

Services: Best Practice • Keep services independent (see Playfish: a game is a service) • Measure services • Define SLAs for services • Allow agile development of services • Allow hundreds of services and aggregate them on special servers • Stay away from middleware and frameworks which force their patterns on you • Keep teams small and organized around services • Manage dependencies carefully • Create APIs to offer your services to customers more: www.highscalability.com, Inverview with W.Vogels

Distributed Services at Google • - Scheduling Service • - Map/Reduce Execution Environment • - F1 NewSQL DB • - Spanner Replicated DB • - BigTable NoSQL Storage • - Distributed File System • - Chubby Lock Service

Example: GFS What are the non-functionals for a distributed file system?

Core Distributed Services - Finding Things (Name Service, Registry, Search) - Storing Things (all sorts of DBs, data grids, block storage etc.) - Events Handling and asynchronous processing (Queues) - Locking Things and preventing concurrent access (Lock service) - Request scheduling and control (request (de)-multiplexing - Time handling - Providing atomic transactions - Replicating Things - Object handling (Lifecycle services for creation, destruction, relationship service) Etc.

Frequently Used Implementation Pattern Client Master Consensus Protocol with Failure Detection Replicating nodes N+1 or N+2 redundant cluster offering a specific service in a highly available way. Alternatives are totally distributed services with no master.

Distributed Services Hierarchy Security Scheduler Applications (Search etc.) Query Global Replication Services Distributed Table Storage Services Cycles? Distributed File Services Lock Service, Time Service Consensus/Leadership Algorithms Time and Causality Handling, reliable Communication Failure Models

Finding distributed objects : name space design company Zone=name server production test development Hard link alias Soft link alias Factory finder factory Factory finder Factory: /Development/factory Factory finder factory channels channels channels topics queues topics queues topics queues The organization of a company name space is a design and architecture issue. System Management will typically enforce rules and policies and control changes. Different machines will host parts of the name space (zones)

(Non)-functional requirements Definition: the functions of a system that serve e.g. the business DIRECTLY are called “functional requirements” “Non-Functional Requirements”: the functions of a system that are necessary to achieve speed, reliability, availability, security etc. Many programs fulfill the functional requirements but die on the non-functional requirements. The different views of a system are also called „aspects“ today. Aspect-oriented programming (AOP) tries to keep them conceptually separate but „weaves“ them together at or before runtime execution.

functional requirements of a naming service • re-bind/resolve methods to store name/value pairs. • Provide query interface • Support for name aliases to allow several logical hierarchies (name space forms a DAG) • Support for composite names (“path names”) • Support location – independence of resources by separating address and location of objects. A naming service offers access to many objects based on names. This requires a sound security system to be in place. Is it a clever system design to offer lots of services/objects only to require a grand access control scheme?

Non-functional requirements of a name service • Persistent name/value combinations (don’t lose mapping in a crash) • Transacted: Manipulation of a naming service often involves several entries (e.g. during application installation) This needs to happen in an all-or-nothing way. • Federated Naming Services: transparently combine different naming zones into one logical name service. • Fault-Tolerance: Use replication to guarantee availability on different levels of the name space. Avoid inconsistencies caused by caching and replication. • Speed: Guarantee fast look-up through clustering etc. Writes can be slower. Client side caching support. Reduce communication costs. A naming service is a core component of every distributed system. It is easily a Single-point-of-failure able to stop the whole system.

A fault-tolerant JNDI name service I From: Wang Yu, uncover the hood of J2EE Clustering, http://www.theserverside.com/tt/articles/article.tss?l=J2EEClustering

A fault-tolerant JNDI name service II From: Wang Yu, uncover the hood of J2EE Clustering, http://www.theserverside.com/tt/articles/article.tss?l=J2EEClustering

Name Space Distribution (Federation) An example partitioning of the DNS name space, including Internet-accessible files, into three layers.From: van Steen/Tanenbaum, Chapter on Naming. Federation is when systems communicate ABOUT their data instead of just replicating/copying everything. Availability! Iterative lookup, Caching policy A Company level Speed! Recursive lookup Caching policy B

Examples of Naming Services • Domain Name System (DNS) • X.500 Directory • Lightweight Directory Access Protocol (LDAP) • CORBA Naming Service • Java Registry • J2EE JNDI (mapped to CORBA Naming Service)

Finding distributed objects : no service case Find, query offers and make deal Seller A: Beans, green buyer Seller B Beans, yellow Seller C Apples, green Without intermediate helpers a client needs to FIND the proper server, QUERY the offer and DEAL with a seller. This is tedious!

Finding distributed objects : directory service Register products query offers and find dealer Seller A: Beans, green Directory: Beans,green, Seller A Beans, yellow, Seller B Apples,green, Seller C buyer Seller B Bohnen, gelb Seller C Apples, green Make deal with dealer buyer Seller B Bohnen, gelb A directory allows a buyer to find products and dealers quickly. Still, the buyer needs to understand the offer(s) – possibly all different with respect to language and quality-of-service – and deal with a selected dealer

Finding distributed objects : trading service Register, withdraw or change offers Send query to trader Seller A: Beans, green Trader: match query Against offers. Return Matching offer. Offer A Offer B Offer C Bohnen, gelb Seller B Bohnen, gelb buyer Seller C Apples, green Federated traders More Traders A trader takes a service specification from an “importer” and matches it against service offers that “exporters” have registered. The trader offers a service definition language to formulate queries and offers. An offer can be changeable or not. Trader can also translate between different languages and offer a constraint definition language. But a trader will NOT perform the deal itself (like a broker does). Federation problems: duplicated offers and looping requests.

Finding distributed objects: query service Find all accounts with name “maier” and balance <200) Query definition language (e.g. SQL, OQL, XQL) database Query processor query client 1 Kriha, 10.000 2 Maier, 15000 result A characteristic of a query is that we expect to get ALL the information back that matches our constraints. And we can use expressions that will be evaluated by the storage component. Compare this with “search” on the next slide.

Problems of object query services Find all person objects with person.name()==“maier” and account.balance()>200) Object query language Query processor client query 1 Kriha, 10.000 2 Maier, 15000 Result objects • will the query processor INSTANTIATE all objects to perform the expressions person.name() and account.balance? This is VERY SLOW. Fix: “query push-down” (translate OQL into SQL and let the DB do its job). This requires the query processor to know which methods are side-effect free! • Can the results be also in DATA record form or must it be objects? Again, performance is better with data. But then: How is security preserved in the data case? (assuming an J2EE method level security in place?). Be sure to define your security on a meta-level and derive both data and object level security from it. • Can lazy evaluation used with result object creation (e.g. only when really used by the client) • How long are results VALID (cursor stability) after transactions?

Discovery Services and Semantics The dream of discovery is that clients and providers are “loosely coupled” – clients can evaluate services and decide which ones they will use. This core paradigm of WebServices and Service-Oriented Architectures (SOA) requires much more than interface specifications. It requires semantic descriptions and ontologies so that clients and services can really understand each other. Do we need translator services which map different vocabularies? See: Lecture on WebServices and SOA later...

Functional requirements of a lifecycle service • Help in the creation of objects • Avoid that clients know object creation details • Help copying objects (across machines? Different Hardware?) • Help migrate objects without breaking clients • Help delete objects if clients do not use them anymore.

Creating distributed objects : lifecycle service Resolve(/test/finders/AccountFinder) client Naming Service Using interfaces (naming service, factoryfinder, factory, account) allows system administrators to hide different versions or implementations from clients. In a remote environment finders also support migration and copy of objects. getAccountFactory() client Account Finder createAccount(4711, Kriha, 0) client Account Factory Credit(100.000) client Account

Factory finders and factories company production test development Factory finder Factory finder factory factory Factory finder factory MyAccount Factory channels MyAcctFinder channels topics queues topics queues Proper configuration of finders and factories allows the creation of individual object nets.

Protect distributed objects : relationships Referential integrity rules person “has” relationship preferences “Person” and “Preferences”. If person is deleted, the preferences should go too. Within the database referential integrity rules protect e.g. containment relationships. We’ve got nothing like this in object space.

Functional Requirements for a Relationship Service • Allow definition of relations between objects without modifying those objects • Allow different types of relations • Allow graphs of relations • Allow the traversal of relationship graphs • Support reference and containment relations More on relationships: W.Emmerich, Engineering Distributed Objects

Relationships: employee leaves company Mail system: Mails Host: Passwords employee File Server: Disk Space Appliacations: passwords Host: Authentications Host: Passwords Human Resource DB Host: Authorizations (Internet Access etc.) House Security: Door access rights Can you make sure that once an employee leaves, ALL her rights are cancelled, her disc-space archived and erased, the databases for authentication etc. are updated, application specific DBs as well? And last but not least that the badge does no longer work? That all equipment has been returned? This are RELATIONS between an employee and resources which need to be expressed in a machine readable form.

Relationship creation Object A Reference to B Object B Programmed relationship Node Object A Role A Relation type Dynamic relationship Role B Node Object B The objects A and B are not aware of any relations defined on them.

Relationships: the good and the bad The good: • Powerful modeling tool for meta-information (e.g. topic-maps) • Helps with creation, migration, copy and deletion of composite objects and maintains referential integrity The bad: • Tends to create many and small server objects • Performance killer (many CORBA vendors did not implement this service for a long time). EJB: supported with local objects only (in same container)

Local Notification: Observer Pattern Observer A: Update(Event) {} Register Observed: Public Register(Observer, Event) Private Notify() { For each observer in List, call observer.update(Event) } Observer B: Update(Event) {} Receive updates Even in the local case these observer implementations have problems: Updates are sent on one thread. If an observer does not return, the whole mechanism stops. If an observer calls back to the observed DURING an update call, a deadlock is easily created. The solution does not scale and is not reliable (if the observer crashes, all registrations are lost). And of course, it does not work remotely because the observer addresses are local.

Distributed Notifications and Events Notification or event channel filters Topic A pull push Receiver On machine B Topic B Sender On machine A Topic C pull push Machine X Various combinations of push and pull models are possible. Receivers can install filters using a constraint language to filter content. This reduces unwanted notifications.

Protect distributed objects : concurrency service Client A calls increment(). Count is 0, temp becomes 0. Client A’s thread has used it’s slice and is preempted. Client B calls increment(). Count is 0, temp becomes 0. Client B adds 1 to temp and writes it back to count. Count is 1. Now comes Client A again. Also adds 1 to temp and writes it back to count. Count is 1 and NOT 2 now. We’ve lost one update. Client A Class counter { Int count = 0; Increment() { Int temp = count; Temp++; Count= temp;} Client B The lost update problem!

Protect distributed objects : lost updates Client A calls increment(). Count is 0, temp becomes 0. Client A’s thread has used it’s slice and is preempted. Client B calls increment(). Count is 0, temp becomes 0. Client B adds 1 to temp and writes it back to count. Count is 1. Now comes Client A again. Also adds 1 to temp and writes it back to count. Count is 1 and NOT 2 now. We’ve lost one update. Client A Class counter { Int count = 0; Increment() { Int temp = count; Temp++; Count= temp;} Client B The lost update problem! Would it help to use count++ ?

Protect distributed objects : inconsistent analysis Client A calls swap(). After storing count1 in temp it is set to count2 (0). Client A’s thread has used it’s slice and is preempted. Client B calls sum(). Count1 is now 0, and count2 is still 0. Client B comes back from sum() with result 0. Now comes Client A again. Writes temp back to count2 . Count2 is now 1 but sum() has reported 0 for both. The analysis of sum() is wrong. Client A Class counter { Int count1 = 1; Int count2 = 0; Swap() { Int temp = count1; Count1=count2; Count2=temp;} Int sum() { Return count1 + count2; } } Client B The inconsistent analysis problem!

Use of locking against concurrent access • Binary locks: e.g. synchronize(object). Will block all clients except of one. • Modal locks (read lock, write lock): Clients who only want to read can get read locks – many concurrent read locks are possible. Binary locks are very simple to use but performance suffers badly because they cannot distinguish between reads and writes.

Lock compatibility matrix Read lock Write lock Read lock Write lock The concurrency service will not allow concurrent locks other than read locks. A write lock will exclude all other locks.

Lock granularity Lock whole DB database Lock table Lock row Besides lock mode the granularity of locks will determine overall throughput. The smaller the better.

Optimistic Locking Client session database Lock row, read it (with timestamp), release lock Use row data copy Write to DB. For all data read acquire locks and compare the timestamps. If one is newer, abort store operation Overall throughput is better because locks are held only a very short time. The timestamp compare logic should be a framework mechanism of the client session objects.

Two-phase locking resource Allocate all locks resource client resource resource client Manipulate data client release all locks A basic requirement for the 2-phase locking protocol is that all locks are allocated first. After the first lock is released NO other locks may be acquired! This will guarantee serializability

Deadlocks Allocate lock for B Allocate lock for A Client A Client B Resource A Resource B Client B Client A Try to allocate lock for B: not granted, held by client B Try to allocate lock for A: not granted, held by client A Deadlocks can be detected (e.g. by a database). To prevent deadlocks, always allocate resource locks IN THE SAME ORDER. Process termination must release all locks held by a process.

Service Context Some services need so-called context information to flow with a call. Two prominent ones are: • Security (needs to “flow” user information, access rights etc.) • Transactions (need to flow information about on-going transactions to participants) This additional information needs to be standardized if different vendor implementations of services should interoperate. Do you know other “context related” design problems?

Next Session: • Persistence • Transactions • Security • Gray/Reuter, Transaction Processing (Chapter on Transaction Models) • Java Data Objects Specification (explains on the first 20 pages the concepts of identity, copy, transactional state and develops a state diagram for objects (stateless vs. statefull, transacted etc.) • Security FAQ on www.faq.org (?), primer on www.rsa.com • Silvano Maffeis, ELEKTRA, a high-available CORBA ORB

Exercises • Map the services to the middleware of your project: How do p2p systems find objects, persist objects etc.? • Why is “mandandentfähigkeit” such a hard problem? • What are the non-functional requirements of a notification service? • Why would count++ not help to prevent the lost update problem? • Why are read locks necessary?

Resources • Understanding LDAP, www.redbooks.ibm.com (free) • www.io.de distributed web search paper • Van Steen/Tanenbaum, Chapter on Naming • Van Steen/Tanenbaum, Chapter on Security (homework for next session) • Martin Fowler et al., UML Distilled (small and nice) Werner Vogels, Amazon Architecure, http://queue.acm.org/detail.cfm?id=1142065 Alex Iskold, SOAWorld on Amazon – the real Web Services Comp. http://soa.sys-con.com/node/262024

Distributed Services Part One

Distributed Services Part One

Presentation Transcript

Part One

Part One

Part One

Part One

Part One

Part One

Part One

Part One

Part One

Part One

Part One

PART ONE

Part One

Part One

Distributed Services Part 2

Part One

PART ONE

Part One

Part One

PART ONE

PART ONE

PART ONE