Efficient Software-Based Fault Isolation Techniques for System Reliability

Efficient software-based fault isolation Robert Wahbe, Steven Lucco, Thomas Anderson & Susan Graham Presented by: Stelian Coros

Motivation • Extensibility is desirable • Integration of independently developed software modules • Faults in extension code → entire software system is unreliable • Bugs in third-party code • Need to protect distrusted modules from corrupting application data • Allow cooperation

First Solution • Hardware solution: • Place each module in its own address space • Automatic boundary protection • Use cross-address-space RPC for cooperation between modules – lots of context switches • HIGH PERFORMANCE COST • Maybe a software solution is better

Insight!!! • Only a relatively small portion of code is distrusted • Should only protect against distrusted code!!

Second Solution • Software approach: • Handle fault isolation within one address space • Segment an address space into fault domains • One segment for code, and one for data • Segment identifiers: the upper bits shared by every address in a segment • One fault domain for each task • Software Encapsulation – distrusted object code must: • Jump only to targets in its code segment • Write only to addresses within its data segment [Figure: virtual address space divided into segments, e.g. 0x3AB00000 to 0x3ABFFFFF and 0x3AC00000 to 0x3ACFFFFF]
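
As a rough illustration of the segment layout on this slide, the sketch below extracts a segment identifier from an address and checks the two encapsulation rules. It assumes 32-bit addresses with a 12-bit segment identifier and a 20-bit offset, which is only inferred from the address ranges in the figure; `FaultDomain`, `may_jump_to`, and `may_write_to` are hypothetical names, not taken from the paper.

```python
from dataclasses import dataclass

SEGMENT_SHIFT = 20          # assumption: the low 20 bits are the offset within a segment
ADDRESS_MASK = 0xFFFFFFFF   # assumption: 32-bit virtual addresses


def segment_id(address: int) -> int:
    """Return the upper bits of an address, i.e. its segment identifier."""
    return (address & ADDRESS_MASK) >> SEGMENT_SHIFT


@dataclass(frozen=True)
class FaultDomain:
    """Hypothetical record of the two segments that make up one fault domain."""
    code_segment: int
    data_segment: int

    def may_jump_to(self, address: int) -> bool:
        # Rule 1: jump only to targets in the domain's code segment.
        return segment_id(address) == self.code_segment

    def may_write_to(self, address: int) -> bool:
        # Rule 2: write only to addresses in the domain's data segment.
        return segment_id(address) == self.data_segment


domain = FaultDomain(code_segment=0x3AB, data_segment=0x3AC)
print(hex(segment_id(0x3AB12ABC)), domain.may_write_to(0x3AB12ABC))
```

The real system enforces these rules with inserted machine instructions rather than function calls; the predicate form here is only meant to make the rules concrete.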

Segment Matching • Handling unsafe instructions: • Unsafe instruction: a jump or store to an address that cannot be statically verified to be in the correct segment • Insert checking code before unsafe instructions • Four instructions and four dedicated registers (used only by inserted code) • If the check fails, trap to a system error routine • Pinpoints the offending instruction [Figure: the segment ID bits of the target address are compared (==) with the fault domain's segment ID]
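
The check described on this slide can be mimicked in a few lines. This is a sketch under stated assumptions, not the actual inserted machine code: memory is modeled as a dictionary, the shift amount of 20 bits is an assumption taken from the example addresses, and `checked_store` and `SegmentViolation` are hypothetical names. The local variables stand in for the dedicated registers.

```python
class SegmentViolation(Exception):
    """Stands in for the trap to the system error routine."""

    def __init__(self, address: int, expected_segment: int):
        super().__init__(
            f"address {address:#010x} is outside segment {expected_segment:#x}"
        )
        self.address = address  # lets the handler pinpoint the offending access


def checked_store(memory: dict, target: int, value: int,
                  data_segment_id: int, shift: int = 20) -> None:
    dedicated = target                    # 1. copy the target into a dedicated register
    scratch = dedicated >> shift          # 2. isolate the segment identifier bits
    if scratch != data_segment_id:        # 3. compare with the domain's segment ID
        raise SegmentViolation(dedicated, data_segment_id)  # 4. trap on mismatch
    memory[dedicated] = value             # the original store, via the dedicated register


memory = {}
checked_store(memory, 0x3AC12ABC, 7, data_segment_id=0x3AC)   # allowed
try:
    checked_store(memory, 0x3AB12ABC, 9, data_segment_id=0x3AC)
except SegmentViolation as err:
    print("trapped:", err)
```

Performing the store through the dedicated copy matters: if distrusted code jumps into the middle of the check sequence, the register used by the store still holds a previously checked address.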

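Address Sandboxing • Reduce overhead further • Insert code to overwrite the upper bits (segment identifier) of the target address • Requires only two instructions • Does not catch illegal addresses; it only keeps them from reaching outside the fault domain • Verifiable • Needs 5 dedicated registers [Figure: the segment ID bits of the target address are overwritten with the fault domain's segment ID]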
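
A minimal sketch of the two sandboxing operations (mask, then OR) follows. The 20-bit offset width is an assumption based on the example addresses, and `sandbox` is a hypothetical helper name.

```python
def sandbox(target: int, segment_id: int, shift: int = 20) -> int:
    """Force an address into the given segment by overwriting its upper bits."""
    offset_mask = (1 << shift) - 1
    address = target & offset_mask            # instruction 1: clear the segment bits
    address |= segment_id << shift            # instruction 2: set the domain's segment ID
    return address


# A legal address is unchanged; an illegal one is silently redirected.
print(hex(sandbox(0x3AC12ABC, 0x3AC)))
print(hex(sandbox(0x3AB12ABC, 0x3AC)))
```

Unlike segment matching, nothing is reported when the address was wrong: the store simply lands somewhere inside the domain's own segment, which is why this variant is cheaper but cannot pinpoint the faulty instruction.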

Process Resources • Same virtual address space • Resources allocated on a per-address-space basis can still be corrupted (e.g. a module closes files that others are using) • Solution: distrusted code cannot make system calls directly • cross-fault-domain RPC • trusted arbitration code handles system calls on behalf of the distrusted code • arbitration code can make system calls directly, if the operations are deemed safe
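
One way to picture the arbitration step is a trusted object that decides whether a requested operation is safe before performing it. The sketch below is purely illustrative: the ownership policy (a domain may close only descriptors it opened), the simulated descriptor table, and the names `Arbiter` and `SyscallDenied` are all assumptions, not the policy used in the paper.

```python
class SyscallDenied(Exception):
    """Raised when the arbiter judges a requested operation unsafe."""


class Arbiter:
    """Hypothetical trusted arbitration code with a simulated descriptor table."""

    def __init__(self):
        self._owner = {}      # descriptor -> name of the fault domain that opened it
        self._next_fd = 3     # 0 to 2 are conventionally stdin, stdout, stderr

    def open(self, domain: str) -> int:
        fd = self._next_fd
        self._next_fd += 1
        self._owner[fd] = domain
        return fd

    def close(self, domain: str, fd: int) -> None:
        # Assumed policy: only the domain that opened a descriptor may close it.
        if self._owner.get(fd) != domain:
            raise SyscallDenied(f"{domain} may not close descriptor {fd}")
        del self._owner[fd]

    def is_open(self, fd: int) -> bool:
        return fd in self._owner


arbiter = Arbiter()
fd = arbiter.open("application")
try:
    arbiter.close("plugin", fd)       # distrusted module tries to close someone else's file
except SyscallDenied as err:
    print("denied:", err)
```

In the real design the request would arrive through a cross-fault-domain RPC rather than a direct method call.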

Data Sharing • Read sharing → easy • Write sharing → lazy pointer swizzling • Map the shared memory region into all segments that need access, at the same offset in each segment (e.g. 0x3AB12ABC and 0x3AC12ABC) [Figure: virtual address space with segments 0x3AB00000 to 0x3ABFFFFF and 0x3AC00000 to 0x3ACFFFFF, with the shared region aliased in both]
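
The aliasing idea can be simulated as follows. This is a toy model under assumptions: the shared region is taken to cover offsets 0x12000 to 0x12FFF (chosen only so that the slide's example addresses fall inside it), the offset width is 20 bits, and `AliasedMemory` and `swizzle` are hypothetical names. A real implementation would set up the aliases through the page tables.

```python
SHIFT = 20
OFFSET_MASK = (1 << SHIFT) - 1


class AliasedMemory:
    """Toy memory in which one region is mapped at the same offset in several segments."""

    def __init__(self, shared_offsets: range, sharing_segments: set):
        self._shared_offsets = shared_offsets
        self._sharing_segments = sharing_segments
        self._shared = {}     # offset -> value, one copy backing every alias
        self._private = {}    # full address -> value

    def _is_shared(self, address: int) -> bool:
        return ((address >> SHIFT) in self._sharing_segments
                and (address & OFFSET_MASK) in self._shared_offsets)

    def store(self, address: int, value: int) -> None:
        if self._is_shared(address):
            self._shared[address & OFFSET_MASK] = value
        else:
            self._private[address] = value

    def load(self, address: int) -> int:
        if self._is_shared(address):
            return self._shared.get(address & OFFSET_MASK, 0)
        return self._private.get(address, 0)


def swizzle(pointer: int, own_segment_id: int) -> int:
    """Rewrite a shared pointer so it refers to the caller's own alias."""
    return (own_segment_id << SHIFT) | (pointer & OFFSET_MASK)


memory = AliasedMemory(range(0x12000, 0x13000), {0x3AB, 0x3AC})
memory.store(0x3AB12ABC, 99)                       # one domain writes via its alias
print(memory.load(swizzle(0x3AB12ABC, 0x3AC)))     # another reads via its own alias
```

Because every alias sits at the same offset, rewriting the segment bits of a pointer is enough to translate it, which is the same mask-and-OR operation that address sandboxing already performs.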

Implementation details • Software Encapsulation: • could use a compiler to generate encapsulated code for distrusted modules • alternatively, the system can directly modify object code at load time (binary patching) – not possible at the time

Fast Cross-Fault-Domain RPC • Jump table – control transfer • only way to escape a fault domain • only modifiable by trusted segment • Customized call and return stubs • copy cross-domain arguments • manage machine state
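
The jump table and stubs can be pictured with the sketch below. It is an analogy rather than the actual mechanism: a read-only mapping stands in for a table that only the trusted segment can modify, deep copies stand in for copying cross-domain arguments, and `make_call_stub`, `build_jump_table`, and `append_one` are hypothetical names. Saving and restoring machine state has no counterpart here.

```python
import copy
from types import MappingProxyType


def make_call_stub(target):
    """Wrap an exported procedure in a stub that copies arguments and results."""
    def stub(*args):
        private_args = copy.deepcopy(args)     # callee never sees the caller's objects
        result = target(*private_args)
        return copy.deepcopy(result)           # caller never sees the callee's objects
    return stub


def build_jump_table(exports: dict):
    """Built by trusted code; the returned view cannot be modified by its users."""
    return MappingProxyType(
        {name: make_call_stub(fn) for name, fn in exports.items()}
    )


def append_one(items: list) -> list:
    """Example procedure exported by a distrusted module."""
    items.append(1)
    return items


jump_table = build_jump_table({"append_one": append_one})
data = [0]
print(jump_table["append_one"](data), data)    # the caller's list is left untouched
```

Attempting to assign into `jump_table` raises `TypeError`, which loosely mirrors the rule that distrusted code cannot add or redirect entries.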

Fast Cross-Fault-Domain RPC • Robustness • uses the UNIX signal facilities to catch errors • notifies the caller’s fault domain • trusted modules can use timers to interrupt execution and decide on a course of action

Performance • Software encapsulation provides substantial savings over using native OS services

Performance • Much cheaper than traditional context switches

Summary • Software fault isolation • Constrain jumps and writes to be within fault domain • Restrict direct access to system calls • Cross-fault-domain RPC == jump instructions • Need extra instructions and registers, but there is an overall performance improvement

Discussion • I think the limitations of binary patching could be resolved by adding hardware support in the form of new dedicated registers for that purpose (of course, this could be done in a way that keeps the architecture backward compatible). Hasn’t that been tried or implemented in any systems? Are these fault isolation techniques used in current systems in the first place?

Discussion • What determines how the target address is partitioned to yield the ideal segment identifier size? What influences this decision? • Is it difficult to find a contiguous region of memory to be used as a fault domain?

Discussion • Why would programs with "a significant percentage of floating point operations" or which "perform significant amounts of I/O" incur less overhead?

Discussion • If this method is so effective, why is it not used in place of microkernels? How might this compare to lightweight microkernel approaches such as L3/L4 or LRPC?

Discussion • What criteria would one use to determine whether a software package needs sandboxing? The authors keep referring to "distrusted modules", but why would you run modules you distrust? Conversely, can you trust anything?
