Failure Handling in a Modal Language
Nels Eric Beckman
Research Talk
Institute for Software Research
October 30, 2006

Claims Made in this Talk
• ML5 is an elegant language for programming distributed systems.
• In the face of node failure, the meaning of ML5 programs becomes unclear.
• We propose extensions to ML5 that make their meaning clear.
• (In reality, this research is a work in progress.)
Failure Handling in a Modal Language ISR

ML5
• A Programming Language for Distributed Systems
• Based on a Modal Logic
  • i.e., a Logic With an Embedded Notion of Place
• Tom Murphy’s Thesis Work
• Targeted for Grid Programming

ML5, Briefly...
• Allows Hosts to Send ‘Thunks’ to One Another for Execution
  • In practice, code can be more cleanly decomposed.
• Has an Advanced Type System
  • Location-specific resources can be typed as such.

RPC-Style Distributed Programming
[Figure: animated sequence. Two hosts each hold one function, and a program counter (PC) marks where a thread is.]
    fun a = rpc("b", 19.x.x.x) + r
    fun b = return x;
• Steps shown: a sends an “rpc b” message and its thread blocks; a second PC appears at b, which runs; b sends back a “ret x” message; a resumes.
• Legend: Host, Active thread, Blocked thread, Message

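The blocking behaviour in the RPC figure can be imitated inside one process. The following is a minimal Java sketch under stated assumptions, not code from the talk: the remote host is simulated by a single-threaded executor, and `RpcDemo`, `hostB`, `run`, and the sample arguments are hypothetical names chosen for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical single-process simulation of the RPC figure.
class RpcDemo {
    // Function b, as it would run on the remote host: it simply returns x.
    static int b(int x) {
        return x;
    }

    // Function a: calls b on hostB, blocks until the reply arrives, adds r.
    static int a(ExecutorService hostB, int x, int r) {
        // The "message" to host B: hand the call to B's only thread.
        CompletableFuture<Integer> reply =
                CompletableFuture.supplyAsync(() -> b(x), hostB);
        // The calling thread is now the "blocked thread" of the figure.
        int result = reply.join();
        return result + r;
    }

    // Runs a on the current thread against a fresh one-thread "host B".
    static int run(int x, int r) {
        ExecutorService hostB = Executors.newSingleThreadExecutor();
        try {
            return a(hostB, x, r);
        } finally {
            hostB.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(19, 1));
    }
}
```

Two threads exist here, but only one is ever active, which is the situation the figure's active/blocked legend describes.
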
ML5 Illustration
[Figure: animated sequence in which a single thread, marked by its program counter (PC), migrates from host to host.]
• Legend: Host, Location of thread, Migration of thread

Example
• Remotely Finding List’s Sum (RPC)
Server Code:
    class ListServ {
      List<Integer> myList = new ...
      List<Integer> getList() {
        return myList;
      }
    }

Example
• Remotely Finding List’s Sum (RPC)
Client Code:
    class ListClient {
      ListServerStub myServ = new ...
      public void foo() {
        // The entire list crosses the network before any summing happens.
        List<Integer> list = myServ.getList();
        int count = 0;
        for (Integer item : list) {
          count += item.intValue();
        }
        if (count > 40) ...
      }
    }

Example
• Remotely Finding List’s Sum (RPC)
• The client ships the whole list just to compare its sum with 40. To fix this, should we:
  • Add a new server operation that returns true if a list’s sum is greater than 40?
    • Weird if the operation is only used once.
    • We wouldn’t structure the application this way in a centralized setting.
  • Bite the performance bullet and send the whole list?

Example
• Remotely Finding List’s Sum (ML5)
Before:
    fun foo remote_host remote_list_ref =
      let
        fun sum a_list = foldl op+ 0 a_list
      in
        (* The whole list is fetched, then summed locally. *)
        if sum ( get[remote_host]( !remote_list_ref ) ) > 40
        then true else false
      end

Example
• Remotely Finding List’s Sum (ML5)
After:
    fun foo remote_host remote_list_ref =
      let
        fun sum a_list = foldl op+ 0 a_list
      in
        (* The sum and the comparison run at remote_host; only the boolean returns. *)
        get[remote_host](
          if sum ( !remote_list_ref ) > 40
          then true else false )
      end

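The difference between the Before and After versions is where the summing code runs. The following Java sketch imitates that choice in one process; it is an illustration under assumptions, not ML5's implementation. `ListHost`, `evalHere`, `integersSent`, and `SumDemo` are hypothetical names, and the counter only models how many integers would cross the network.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a remote host that owns a list.
class ListHost {
    private final List<Integer> myList;
    // Models how many integers have been shipped to callers.
    int integersSent = 0;

    ListHost(List<Integer> myList) {
        this.myList = myList;
    }

    // RPC style: the whole list travels back to the caller.
    List<Integer> getList() {
        integersSent += myList.size();
        return myList;
    }

    // 'get' style: the caller's code runs here; only the result travels.
    <R> R evalHere(Function<List<Integer>, R> code) {
        return code.apply(myList);
    }
}

class SumDemo {
    static int sum(List<Integer> list) {
        int total = 0;
        for (int item : list) {
            total += item;
        }
        return total;
    }

    // "Before": fetch the list, then sum at the caller.
    static boolean before(ListHost host) {
        return sum(host.getList()) > 40;
    }

    // "After": sum at the host and return only the boolean.
    static boolean after(ListHost host) {
        return host.evalHere(list -> sum(list) > 40);
    }
}
```

Both methods compute the same answer; only `before` increments `integersSent`. The `sum` helper is written once and used in both places, which mirrors the slide's point that the application keeps its centralized structure.
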
Types
• ML5 Type System Embeds a Notion of Place
• Some values can be used at any place.
  • e.g. Primitive data types, structures
• Some values can only be used at the location where they make sense.
  • e.g. File descriptors, reference cells, printers

Just a Few Types…
• τ@w – “A value of type τ that is well-typed on host w.”

Just a Few Types…
• get[w’,a]e – “Evaluate e on host w’ and return the result to the current host. Change e’s type from @w’ to @w.”
• Example:
    fun foo (x: int ref @w’, a: w’ addr @w) =
      get[w’,a]( !x + !x )
  • The body ( !x + !x ) is typed int@w’.
  • The whole get expression is typed int@w.

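The idea that a reference cell is only usable at its own host can be approximated with phantom type parameters. The following Java sketch is an analogy under assumptions and not how ML5 is implemented: `World`, `W1`, `W2`, `AtWorld`, `Ref`, `Host`, and `GetDemo` are all hypothetical names, and nothing here actually moves between machines.

```java
import java.util.function.Function;

// Phantom "world" markers; they carry type-level meaning only.
interface World {}
final class W1 implements World {}
final class W2 implements World {}

// Evidence that code is currently running at world W. By convention only
// Host.get creates one; a real design would hide the constructor.
final class AtWorld<W extends World> {
    AtWorld() {}
}

// A reference cell that lives at world W (compare: int ref @w').
final class Ref<W extends World, T> {
    private final T contents;

    Ref(T initial) {
        contents = initial;
    }

    // Dereferencing requires evidence of being at W (compare: !x).
    T deref(AtWorld<W> here) {
        return contents;
    }
}

// Stand-in for the address value a : w' addr.
final class Host<W extends World> {
    // Compare get[w', a] e: evaluate e at W and hand the result back.
    // The result type T is assumed to be usable anywhere (e.g. Integer).
    <T> T get(Function<AtWorld<W>, T> e) {
        return e.apply(new AtWorld<W>());
    }
}

class GetDemo {
    // Compare: fun foo (x: int ref @w', a: w' addr @w) = get[w',a](!x + !x)
    static int foo(Ref<W2, Integer> x, Host<W2> a) {
        return a.<Integer>get(here -> x.deref(here) + x.deref(here));
    }
}
```

Inside the lambda, `here` has type `AtWorld<W2>`, so passing it to the `deref` of a `Ref<W1, Integer>` would be rejected by the compiler. That is the rough counterpart of the slide's rule that `!x` is only well-typed at w’.
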
Just a Few Types…
• □τ – “Suspended code that can be evaluated anywhere. Produces a value of type τ.”
• Example:
    (let
       fun sum il = foldl op+ 0 il
     in
       box (sum [1,2,3,4,5])
     end): □int @w

Just a Few Types…
• ◊τ – “A value of type τ that exists at some (possibly other) location.”
• Example:
    here (ref 5): ◊(int ref) @w

But What About Host Failure?
• What happens here?
    (* at host 1 *)
    get[w_2, a_2](
      (* at host 2 *)
      !int_ref_at_w_2 +
      get[w_3, a_3](
        (* at host 3 *)
        !int_ref_at_w_3))
• Host 2 dies!
  • Throw an exception?
  • Continue on from Host 3?
• Suppose host 3 could name a fallback for when it cannot return:
    (* at host 1 *)
    get[w_2, a_2](
      (* at host 2 WHICH DOESN’T EXIST! *)
      !int_ref_at_w_2 +
      get[w_3, a_3](
        (* at host 3 *)
        !int_ref_at_w_3
        or_if_i_cant_return (...)))

What We Want (Intuitively)
    callcc x =>
      (* at host 1 *)
      get[w_2, a_2](
        (* at host 2 *)
        !int_ref_at_w_2 +
        get[w_3, a_3](
          (* at host 3 *)
          !int_ref_at_w_3
          or_if_i_cant_return (throw (raise NetFail) to x)))
• Have host one detect the failure.
• Don’t actually throw something through the network.

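The two notes above say that the failure should surface at host 1 without any exception travelling over the network. A minimal Java sketch of that shape follows. It is an assumption-laden illustration, not the proposed ML5 extension: `PendingGet`, `onReply`, `onPeerFailure`, and `await` are hypothetical names, `NetFail` borrows the slide's exception name, and the failure detector itself is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Exception name borrowed from the slide; the class itself is hypothetical.
class NetFail extends RuntimeException {
    NetFail(String message) {
        super(message);
    }
}

// Host 1's local record of one outstanding 'get'.
class PendingGet {
    // Plays the role of the continuation x captured at host 1.
    private final CompletableFuture<Integer> continuation =
            new CompletableFuture<>();

    // Called when the remote host's reply message arrives.
    void onReply(int value) {
        continuation.complete(value);
    }

    // Called by host 1's own failure detector; nothing crosses the network.
    void onPeerFailure(String peer) {
        continuation.completeExceptionally(new NetFail(peer + " unreachable"));
    }

    // Blocks until a reply or a locally detected failure resumes host 1.
    int await() {
        try {
            return continuation.join();
        } catch (CompletionException e) {
            // join() wraps the failure; unwrap it so callers see NetFail.
            if (e.getCause() instanceof NetFail) {
                throw (NetFail) e.getCause();
            }
            throw e;
        }
    }
}
```

Whichever of `onReply` or `onPeerFailure` happens first decides how host 1 resumes, and both run at host 1.
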
Isn’t This Just a ‘Timeout’ Exception?
• A Good Question:
  • “Why not just have the ‘get’ operation throw a timeout exception, like in Java?”
  • e.g.
    get[w_2, a_2] ( !int_on_w2 )
      handle TimeOut => (* do something *)

Answers
• This is actually a little smarter than just ‘timeout.’
• The ‘Implicit Spawn’ Problem
    get[w_2, a_2] ( (* extremely complicated op *) )   (* T2 *)
      handle TimeOut => (* do something *)             (* T1 *)
  • [Figure labels: T2 marks the remote operation, T1 marks the handler.] After a timeout, T1 runs the handler while T2 may still be running on the remote host.

What We Need
• Share the Fact that Host 1 Has ‘Given Up’
• Kill the Thread ASAP
• Make That Thread’s Actions Irrelevant
  • Each host gets a chance to ‘undo’ potential effects.
• All with ‘Best Effort’

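One way to picture these requirements is a shared "given up" flag that the remote work checks between steps, plus a log of undo actions. The Java sketch below is only an illustration of that pattern under assumptions, not the mechanism the talk proposes: `RemoteTask`, `giveUp`, `effect`, and `runSteps` are hypothetical names, and the flag is checked only between steps, which is one reading of ‘best effort’.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical per-request record visible to the host doing the work.
class RemoteTask {
    // Set once host 1 has given up on the request.
    private final AtomicBoolean givenUp = new AtomicBoolean(false);
    // Undo actions for effects already performed, most recent first.
    private final Deque<Runnable> undoLog = new ArrayDeque<>();

    // Host 1's side: share the fact that it has given up.
    void giveUp() {
        givenUp.set(true);
    }

    // Working host's side: perform an effect and remember how to reverse it.
    void effect(Runnable action, Runnable undo) {
        action.run();
        undoLog.push(undo);
    }

    // Runs steps until done or until the request has been abandoned.
    // Returns true if every step ran, false if the task was stopped.
    boolean runSteps(Iterable<Runnable> steps) {
        for (Runnable step : steps) {
            if (givenUp.get()) {
                rollback();
                return false;
            }
            step.run();
        }
        return true;
    }

    // Reverses recorded effects in last-in, first-out order.
    private void rollback() {
        while (!undoLog.isEmpty()) {
            undoLog.pop().run();
        }
    }
}
```

A step that is already running when `giveUp` is called still finishes; its effect is reversed only when the next check sees the flag.
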
One More Wrinkle
[Figure: two nodes, Catom 1 and Catom 2, shown in two steps.]
• Grab ‘continuation’
• Assign ‘Catom1’ to ‘myLeader’