Bionic Autonomic Nervous System and Self-Healing for NASA ANTS-Like Missions. Michael Hinchey, Yuan-Shun Dai, James L. Rash, Walt Truszkowski, Manish Madhusoodan. Presented by Chuck Cartledge.
What we are going to cover • What this paper is addressing • Constraints on a specific problem • Self-Healing requirements • A description of their approach • Results of their simulations • Prospects for the future • Wrap up CS-7/875 Distributed Systems, ODU, Spring 2008
What is the problem this paper is addressing? • Introduce characteristics of autonomic systems • Focus on one aspect of those systems • Propose a method to meet the requirements of that aspect • Demonstrate via simulation that method in selected failures • Convince others that the effort is worth pursuing • Blaze a way for fame and fortune for “everyone”
A vocabulary • Autonomic systems • Self-Configuring • Self-Healing • Self-Optimizing • Self-Protecting • ANTS (not as in Dr. Feynman’s bathtub ants) - Autonomous Nano-Technology Swarm • BANS – Bionic Autonomic Nervous System • PAM – Prospecting Asteroid Mission
Some of the problem constraints • Time • Distance • Power • Failures • Survival • Results
Focus on a single aspect of the problem • Autonomic system approach • Self healing • Problem detection • Problem containment • Problem remediation
Self-healing functionality based on BANS (overview) • Management of misbehaving software and system components • Self-healing has several distinct facets • Autonomous Diagnosis • Consequence Oriented Prescription • Autonomous Curing
Self-healing functionality based on BANS (part one of three) • Autonomous Diagnosis • Diagnosis is conducted in both the ANT-Worker and the CNS • Simpler in the ANT-Worker, which keeps the CNS informed; more complex in the CNS, which assists the ANT-Worker • A cyber neuron controls its cyber axon(s) to get further data • Based on the “simple” cyber neuron diagnosis or the “complex” CNS diagnosis, a probable diagnosis is made
Self-healing functionality based on BANS (part two of three) • Consequence Oriented Prescription • Traditional approach is to stop the software, recompile and restart • Traditional approach is too catastrophic for many errors • Employ a consequence-oriented prescription approach • Neither stop nor recompile • Do not reboot the host to resolve small problems • Maintain health of the host without downtime in real-time systems • Overcome heterogeneity of errors in different programs • Healing methods can be made generic based on expected consequences
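To make the idea of a consequence-oriented prescription concrete, here is a minimal sketch in Python. It is not taken from the paper: the consequence names, the action names, and the `prescribe` helper are all hypothetical. It only illustrates the claim on the slide that healing actions can be keyed on the observed consequence rather than on the particular program that misbehaved.

```python
from enum import Enum


class Consequence(Enum):
    """Hypothetical observable consequences (not from the paper)."""
    MEMORY_EXHAUSTION = "memory_exhaustion"
    CPU_STARVATION = "cpu_starvation"
    UNSAFE_SPEED = "unsafe_speed"


# Hypothetical prescription table. Each list is ordered from the least to
# the most disruptive action; none of them stops, recompiles, or reboots
# the whole host.
PRESCRIPTIONS = {
    Consequence.MEMORY_EXHAUSTION: ["suspend_process", "terminate_process"],
    Consequence.CPU_STARVATION: ["lower_priority", "suspend_process"],
    Consequence.UNSAFE_SPEED: ["zero_acceleration", "apply_brakes"],
}


def prescribe(consequence, already_tried=()):
    """Return the least disruptive action not yet tried, or None."""
    for action in PRESCRIPTIONS.get(consequence, []):
        if action not in already_tried:
            return action
    return None
```

Because the table is indexed by consequence, two different programs that both exhaust memory would receive the same generic prescription.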
Self-healing functionality based on BANS (part three of three) • Autonomous Curing • A “prescription” based on the diagnosis is drawn from a database of “diseases” • Prescription is applied • Results are monitored • If the problem is cured then “No problem” • If the problem is not cured then re-diagnose, identify a new prescription, and repeat
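The diagnose, prescribe, apply, and monitor cycle described on this slide can be sketched as a small loop. This is an illustration under stated assumptions, not the authors' implementation: the function name `heal`, its callback parameters, the `max_rounds` limit, and the example disease and prescription names are all hypothetical.

```python
def heal(diagnose, disease_db, apply, is_cured, max_rounds=5):
    """Run the curing cycle; return (cured, prescriptions_tried).

    diagnose()   -> name of the suspected disease, or None if healthy
    disease_db   -> dict mapping a disease to an ordered list of prescriptions
    apply(p)     -> carries out prescription p
    is_cured()   -> True once monitoring shows the problem is gone
    All four are hypothetical hooks supplied by the caller.
    """
    tried = []
    for _ in range(max_rounds):
        disease = diagnose()
        if disease is None:
            return True, tried          # nothing (left) to cure
        candidates = [p for p in disease_db.get(disease, []) if p not in tried]
        if not candidates:
            return False, tried         # database has nothing more to offer
        prescription = candidates[0]
        apply(prescription)
        tried.append(prescription)
        if is_cured():
            return True, tried          # "No problem"
        # Otherwise fall through: re-diagnose and pick another prescription.
    return False, tried


# Toy usage: a leak that only the second prescription fixes.
state = {"leak": True}
db = {"memory_leak": ["suspend_process", "terminate_process"]}


def apply_prescription(p):
    if p == "terminate_process":
        state["leak"] = False


result = heal(
    diagnose=lambda: "memory_leak" if state["leak"] else None,
    disease_db=db,
    apply=apply_prescription,
    is_cured=lambda: not state["leak"],
)
```

In the toy run the first prescription fails, the loop re-diagnoses, and the second prescription cures the problem, which mirrors the pattern reported for the simulations.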
What have the authors simulated – A memory leak scenario • Without BANS • Memory loss due to the leak continues to grow, and the processor is assumed to be compromised • A compromised processor can lead to cascading errors and possible loss of the ANT-Worker or even greater losses • With BANS • Memory leak detected around time 30 • Offending process suspended for 10 seconds during analysis • Offending process reactivated at time 40 • Offending process terminated at time 60 • BANS identified a problem, tried one corrective action (which failed), then implemented a second action that contained the problem
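The timeline on this slide can be reproduced with a toy simulation. The sketch below is not the authors' simulator. The leak rate, the detection threshold, the rule that terminates the process 20 ticks after reactivation, and the assumption that termination releases the leaked memory are all chosen here only so that the events land at times 30, 40, and 60 as on the slide.

```python
def simulate_leak(with_bans, ticks=100, leak_per_tick=1.0,
                  detect_threshold=30.0, analysis_ticks=10,
                  terminate_after=20):
    """Return (final_leaked_memory, events) for a leaking process.

    All parameter values are assumptions made for illustration.
    """
    memory = 0.0
    state = "running"
    resume_at = None   # tick at which the suspended process is reactivated
    kill_at = None     # tick at which a still-leaking process is terminated
    events = []
    for t in range(ticks + 1):
        if with_bans:
            if state == "running" and resume_at is None and memory >= detect_threshold:
                state = "suspended"                # first corrective action
                resume_at = t + analysis_ticks
                events.append((t, "suspend"))
            elif state == "suspended" and t == resume_at:
                state = "running"                  # analysis done, try again
                kill_at = t + terminate_after
                events.append((t, "reactivate"))
            elif state == "running" and kill_at is not None and t >= kill_at:
                state = "terminated"               # second corrective action
                memory = 0.0                       # assume memory is reclaimed
                events.append((t, "terminate"))
        if state == "running":
            memory += leak_per_tick                # the leak itself
    return memory, events


leaked_without, _ = simulate_leak(with_bans=False)
leaked_with, timeline = simulate_leak(with_bans=True)
```

Without BANS the leaked memory simply keeps growing for the whole run; with BANS the `timeline` list records the suspend, reactivate, and terminate events.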
What have the authors simulated – A speed control scenario • Speed control program written on Earth for a single ANT • When activated there are likely to be multiple ANTs in the swarm • Without BANS • ANT continues to accelerate without considering other ANTs • ANT reaches speeds that endanger other ANTs • With BANS • CNS “knows” other ANTs in the area • At time 30 starts evaluation of the swarm and reduces acceleration value to 0 • At time 40 sets acceleration value to a positive value • At time 42 sets acceleration to a negative value (puts the brakes on) • At time 45 sets acceleration to a positive value (cruise control) • BANS identified a problem, tried one correction that failed, tried a second that succeeded
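The sequence of acceleration changes on this slide can be written as a piecewise schedule and integrated to get a speed profile. This is only a sketch: the switch times (30, 40, 42, 45) come from the slide, but the acceleration magnitudes, the initial speed of zero, and the one-unit time step are hypothetical, since the slide gives only the signs.

```python
from bisect import bisect_right

# (start_time, acceleration). Times are from the slide; the magnitudes
# are assumptions, only their signs follow the slide.
SCHEDULE = [
    (0, 1.0),    # original program: keep accelerating
    (30, 0.0),   # BANS evaluates the swarm, acceleration set to 0
    (40, 0.5),   # positive acceleration again
    (42, -2.0),  # negative acceleration: brakes on
    (45, 0.1),   # small positive value ("cruise control")
]


def acceleration_at(t, schedule=SCHEDULE):
    """Acceleration in force at time t (t >= first start time)."""
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, t) - 1
    return schedule[index][1]


def speed_profile(end_time, schedule=SCHEDULE):
    """Integrate acceleration in unit steps; return [(t, speed), ...]."""
    speed = 0.0
    profile = []
    for t in range(end_time):
        profile.append((t, speed))
        speed += acceleration_at(t, schedule)
    profile.append((end_time, speed))
    return profile


speeds = dict(speed_profile(45))
```

With these assumed magnitudes the speed plateaus while the swarm is evaluated, rises briefly after time 40, and drops once the brakes are applied at time 42.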
Where they want to go • They think that BANS is the way to go • They want to build operational systems (not just simulations) • They want to investigate the problem of prescriptions • System heterogeneity • Problem of diagnosis • Problem of prescriptions • Real ones • Expansion of current ideas
What we’ve covered • Brief discussion about the characteristics of autonomic systems • Characteristics of a self-healing prototype application • Results of simulations showing how a memory leak and a runaway ANT would be controlled via a proposed autonomic system • The authors’ desire to field an operational system • PAM may be operational in 2025 with results returned to Earth by 2030
Questions and comments
A partial author CV http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/index.html
References and links to interesting things • Dr. Richard Feynman (he did more than play with ants): http://en.wikipedia.org/wiki/Richard_Feynman • Joseph Louis Lagrange (how attempting to solve an almost intractable problem can still result in a name in history): http://en.wikipedia.org/wiki/Joseph_Louis_Lagrange • NASA’s ANT home page (general information about ANTs): http://ants.gsfc.nasa.go • PAM’s home page (interesting descriptions and animations, check out how the PAM ANTs will be built): http://ants.gsfc.nasa.gov/pam.html • This paper: http://portal.acm.org.proxy.lib.odu.edu/citation.cfm?doid=1244002.1244025