
Making Services Fault Tolerant

Making Services Fault Tolerant. Pat Chan, Michael R. Lyu Department of Computer Science and Engineering The Chinese University of Hong Kong Miroslaw Malek Department of Computer Science and Engineering Humboldt University Berlin. Outline. Introduction Problem Statement


Presentation Transcript


  1. Making Services Fault Tolerant Pat Chan, Michael R. Lyu Department of Computer Science and Engineering The Chinese University of Hong Kong Miroslaw Malek Department of Computer Science and Engineering Humboldt University Berlin

  2. Outline • Introduction • Problem Statement • Methodologies for Web Service Reliability • New Reliable Web Service Paradigm • Road Map for Experiment • Experimental Results and Discussion • Conclusion

  3. Introduction • Service-oriented computing is becoming a reality. • Service-oriented Architectures (SOA) are based on a simple model of roles. • The problems of service dependability, security and timeliness are becoming critical. • We propose experimental settings and offer a roadmap to dependable Web services.

  4. Problem Statement • Fault-tolerant techniques • Replication • Diversity • Replication is an efficient way to build reliable systems through time or space redundancy. • Increases the availability of distributed systems • Key components are re-executed or replicated • Protects against hardware malfunctions and transient system faults. • Another efficient technique is design diversity. • Software systems or services are designed independently by different programming teams, • which helps defend against permanent software design faults. • We focus on the analysis of replication techniques applied to Web services. • A generic Web service system with spatial as well as temporal replication is proposed and investigated.

  5. Methodologies for reliable Web services -- Redundancy • Spatial redundancy • Static redundancy: all replicas are active at the same time, and voting takes place to obtain a correct result. • Dynamic redundancy: one replica is active at a time, while the others are kept in an active or standby state. • Temporal redundancy • Redundancy in time
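Temporal redundancy can be illustrated with a short retry wrapper. This is only a sketch under stated assumptions: `call_with_retry`, `max_attempts` and `delay_s` are hypothetical names, not part of the presented system, and the sketch does not implement the client timeout used in the experiments.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(request: Callable[[], T],
                    max_attempts: int = 3,
                    delay_s: float = 0.0) -> T:
    """Temporal redundancy: repeat the same request against the same service.

    `request` is any zero-argument callable that raises on failure
    (hypothetical stand-in for a Web service invocation).
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as err:  # a real client would catch narrower errors
            last_error = err
            if attempt < max_attempts - 1:
                time.sleep(delay_s)  # wait before the next attempt
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# Usage: a service that fails twice with a transient fault, then answers.
calls = {"n": 0}

def flaky_service() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient fault")
    return "ok"

result = call_with_retry(flaky_service, max_attempts=3)
```

Retrying only helps with transient faults; a permanent fault in the single service defeats it, which is where spatial redundancy comes in.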

  6. Methodologies for reliable Web services -- Diversity • Protect redundant systems against common-mode failures • With different designs and implementations, common failure modes will probably cause different error effects. • N-version programming, recovery blocks…

  7. Failure Response Stages of Web Services • Fault confinement • Fault detection • Diagnosis • Fail-over • Reconfiguration • Recovery • Restart • Repair • Reintegration

  8. [Figure: flow of the failure response stages. Labels: Fault Confinement; Fault Detection (offline and online); Diagnosis; Failover; Reconfiguration; Recovery; Restart; Repair; Reintegration]

  9. Proposed Paradigm [architecture diagram] • Replication Manager, with Web service selection algorithm and WatchDog: • Create Web services • Select the primary Web service (PWS) • Keep checking the availability of the PWS • If the PWS fails, reselect the PWS. • Replicated Web services, each with IIS, application and database • Client, Port and UDDI Registry • Numbered steps shown in the diagram: 3. Register; 4. Look up WSDL; 5. Get WSDL; 6. Invoke Web service; 9. Update the WSDL

  10. Work Flow of the Replication Manager [flowchart] • RM sends a message to the Web service • Get reply • Do not get reply: reselect a primary Web service • Map the new address to the WSDL • All services failed: system fail
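The replication manager work flow can be sketched roughly as follows. Everything here is an assumption made for illustration: the class and method names are hypothetical, the selection algorithm is simply "first replica that replies", and `published_address` stands in for updating the WSDL.

```python
from typing import Callable, Dict, Optional


class ReplicationManager:
    """Hypothetical sketch of the RM loop: probe the primary, fail over if silent."""

    def __init__(self, replicas: Dict[str, Callable[[], bool]]) -> None:
        # Maps a replica address to a probe that returns True when it replies.
        self.replicas = replicas
        self.primary: Optional[str] = None
        self.published_address: Optional[str] = None  # stands in for the WSDL entry

    def select_primary(self) -> str:
        # Assumed selection algorithm: take the first replica that replies.
        for address, replies in self.replicas.items():
            if replies():
                self.primary = address
                self.published_address = address  # "map the new address to the WSDL"
                return address
        self.primary = None
        raise RuntimeError("system failure: all replicas failed")

    def poll_once(self) -> str:
        # One polling step: keep the primary if it replies, otherwise reselect.
        if self.primary is not None and self.replicas[self.primary]():
            return self.primary
        return self.select_primary()


# Usage with two simulated replicas whose health can be toggled.
alive = {"ws-a": True, "ws-b": True}
rm = ReplicationManager({
    "ws-a": lambda: alive["ws-a"],
    "ws-b": lambda: alive["ws-b"],
})
first = rm.poll_once()    # selects ws-a
alive["ws-a"] = False
second = rm.poll_once()   # fails over to ws-b
```

In a deployment, `poll_once` would run at the polling frequency and the probes would be real network checks.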

  11. Road Map for Experiment Research • Redundancy in time • Redundancy in space • Sequentially • Parallel • Majority voting using N modular redundancy • Diversified version of different services
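Majority voting over N replicas can be sketched as below. The function name and the convention that `None` marks a missing reply are assumptions for illustration, not details taken from the slides.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(replies: Sequence[Optional[Hashable]]) -> Hashable:
    """Return the value reported by a strict majority of all N replicas.

    A reply of None means the replica did not answer; it still counts
    toward N, so a majority needs more than N // 2 matching answers.
    """
    n = len(replies)
    counts = Counter(r for r in replies if r is not None)
    if counts:
        value, votes = counts.most_common(1)[0]
        if votes > n // 2:
            return value
    raise RuntimeError("no majority among replicas")


# Usage: five replicas, one silent and one faulty; the majority still wins.
voted = majority_vote([42, None, 42, 7, 42])
```

With identical replicas, voting masks independent faults only; a design fault shared by all replicas yields a wrong majority, which is the motivation for diversified versions.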

  12. Experiments • A series of experiments is designed and performed to evaluate the reliability of the Web service: • single service without replication, • single service with retry or reboot, and • service with spatial replication. • We also perform retry or failover when the Web service is down.

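  13. Summary of the experiments (cell values are experiment numbers)

      | Configuration | None | Retry/Reboot | Failover | Both (hybrid) |
      |---|---|---|---|---|
      | Single service, no retry | 0 | -- | -- | -- |
      | Single service with retry | -- | 1 | -- | -- |
      | Single service with reboot | -- | 2 | -- | -- |
      | Spatial replication | -- | -- | 3 | 4 |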

  14. Parameters of the Experiments

      | Parameter | Current setting/metric |
      |---|---|
      | Request frequency | 1 req/min |
      | Polling frequency | 5 ms |
      | Number of replicas | 5 |
      | Client timeout period for retry | 10 s |
      | Failure rate λ | # failures/hour |
      | Load (profile of the program) | % or load function |
      | Reboot time | 10 min |
      | Failover time | 1 s |

  15. Experimental Results

      Number of failures in experiments over 360-hour periods (43200 reqs):

      | Experiment | Normal | Server busy | Server reboots periodically |
      |---|---|---|---|
      | Exp 0 | 4928 | 6130 | 6492 |
      | Exp 1 | 2210 | 2327 | 2658 |
      | Exp 2 | 2561 | 3160 | 3323 |
      | Exp 3 | 1324 | 1711 | 1658 |
      | Exp 4 | 1089 | 1148 | 1325 |

      • Retry: 11.97% to 4.93% • Reboot: 11.97% to 6.44% • Failover: 11.97% to 3.56% • Retry and Failover: 11.97% to 2.59%

  16. Number of failures when the server is in a normal situation

  17. Number of failures when the server is busy

  18. Number of failures when the server reboots periodically

  19. Reliability of the system over time

  20. Reliability Model

  21. Reliability Model Parameters

      | ID | Description | Value |
      |---|---|---|
      | λn | Network failure rate | 0.02 |
      | λ* | Web service failure rate | 0.228 |
      | λ1 | Resource problem rate | 0.142 |
      | λ2 | Entry point failure rate | 0.150 |
      | μ* | Web service repair rate | 0.286 |
      | μ1 | Resource problem repair rate | 0.979 |
      | μ2 | Entry point failure repair rate | 0.979 |
      | C1 | Probability that the RM responds on time | 0.9 |
      | C2 | Probability that the server reboots successfully | 0.9 |

  22. Outcome (SHARPE) [Figure: reliability of the proposed system for failure rates 0.228, 0.114 and 0.057]
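To get a feel for how the three failure rates compare, one can evaluate the textbook exponential reliability function R(t) = exp(-λt) for a single component without repair. This is a deliberate simplification and an assumption on our part: it is not the SHARPE model used in the slides, which also includes repair rates and the coverage probabilities C1 and C2. The function name and the sample time points are hypothetical, and t is measured in whatever time unit the rates use.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for a non-repairable component with constant rate."""
    if failure_rate < 0 or t < 0:
        raise ValueError("failure_rate and t must be non-negative")
    return math.exp(-failure_rate * t)


# Failure rates taken from the slide; time points chosen only for illustration.
rates = [0.228, 0.114, 0.057]
time_points = [0, 1, 5, 10]

for lam in rates:
    curve = [round(reliability(lam, t), 4) for t in time_points]
    print(f"lambda={lam}: {curve}")
```

Under this simplified model, halving the failure rate turns R(t) into its square root, so the curves separate quickly as t grows.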

  23. Conclusion • Surveyed replication and design diversity techniques for reliable services. • Proposed a hybrid approach to improving the availability of Web services. • Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system. • N-version programming may finally become commercially viable in a service environment.
