
Recent Advances in System Software Dependability

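Recent Advances in System Software Dependability. 周枫 (Feng Zhou), NetEase, 2007-12-21, Tsinghua University, http://zhoufeng.net. About me: 周枫 (Feng Zhou). 1996-2002: Tsinghua University, bachelor's and master's degrees in computer science. 2002-2007: UC Berkeley, Ph.D. in CS. 2007-: NetEase, Senior Vice President. Research interests: OS, Internet Services, Programming Languages, Networking, Information Retrieval.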


Presentation Transcript


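  1. Recent Advances in System Software Dependability • 周枫 (Feng Zhou), NetEase • 2007-12-21, Tsinghua University • http://zhoufeng.net

  2. About Me • 周枫 (Feng Zhou) • 1996-2002: Tsinghua University, bachelor's and master's degrees in computer science • 2002-2007: UC Berkeley, Ph.D. in CS • 2007-: NetEase, Senior Vice President • Research interests: OS, Internet Services, Programming Languages, Networking, Information Retrieval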

  3. A Trend in Software • “The Movement to Safe Languages” • Type safety → no memory-safety errors (segfaults) • Java, C#, Python, Ruby, PHP, Perl… vs. C, C++, ASM

  4. Why Safe Languages? • Pros • Easier to program • Less error-prone • Easier to analyze • Cons • Slower • Less control (over memory, I/O …) • Bad for real-time work • Takeaway: more dependable software with less effort

  5. Status Quo • Server: ~80% • All server-side Java, Ruby-on-Rails… • Client: ~30% • Visual Studio, Eclipse… • What about system software? Almost 0 • Why? See the “cons” on the previous slide: slower, less control, poor real-time performance

  6. The Problem • System software: mainly operating systems • Dependability: “properties that allow one to rely on a system” • Reliability, Security, Safety, Availability How do we make more dependable system software?

  7. Current State • Problems dealing with the latest or peculiar hardware • Worms, enabled by the many security flaws • More problems for long-running servers or large clusters [Bligh et al. 07]

  8. Problem Scope * Figure: problem scope, with “Our Focus” marked

  9. Why is this important? • The cost of defects is increasing • Society is increasingly driven by computers • More and more computers online → remote exploits • Critical infrastructure using commodity systems: “OS monoculture”

  10. Why is this hard? • Reason 1: Complexity • Windows Vista (2006): 50M LOC • Windows NT 3.1 (1993): 6M LOC • Linux kernel: 8.3M LOC • 86 lines/hr for the last 2 years

  11. More reasons • Reason 2: Unsafe languages used (C/C++) • Buffer overruns can be eliminated with safe languages • Reason 3: Recent trends • Multicore → more parallelism • “The Free Lunch is Over”, Herb Sutter, 2005 • ccNUMA, smarter devices • ACPI bytecode language

  12. Roadmap • OS Dependability with Hardware Protection • Swift et al. 03-04 • OS Dependability with Program Analysis • Zhou et al. 04-06 • OS Dependability with Virtual Machines • Criswell et al. 07

  13. OS Dependability With Hardware

  14. What Causes Most Crashes? • Device drivers! • Run in the same protection domain as the kernel • Drivers are often buggier than the kernel • Device drivers cause 85% of Windows XP crashes • Drivers are 7 times buggier than the kernel in Linux • Xbox hacked due to memory bugs in games • Better driver dependability → better OS dependability

  15. Crashes * Figure courtesy of Mike Swift et al.

  16. Crashes * Figure courtesy of Mike Swift et al.

  17. Goal * Figure courtesy of Mike Swift et al.

  18. Requirements • Isolation • Recovery • Non-intrusive • No/very few code changes

  19. Principles & Goal • Principles & Assumptions • Drivers are generally well-behaved and benign • Design for mistakes (not abuse) • Doesn’t need to be perfect • Design for fault resistance (not fault tolerance) • Goal: a practical “best-effort” system

  20. Nooks • Linux 2.4 kernel and drivers • “Nooks” kernel patch • Isolation • Recovery • Compatible with existing code

  21. Isolation - Memory * Figure courtesy of Mike Swift et al.

  22. Isolation – Control Transfer * Figure courtesy of Mike Swift et al.

  23. Isolation – Control Transfer * Figure courtesy of Mike Swift et al.

  24. Isolation – Data Access * Figure courtesy of Mike Swift et al.

  25. Isolation – Data Access * Figure courtesy of Mike Swift et al.

  26. Isolation – Interposition * Figure courtesy of Mike Swift et al.

  27. Isolation – Interposition * Figure courtesy of Mike Swift et al.

  28. Isolation Summary • Isolation • Lightweight Kernel Protection Domain • eXtension Procedure Call (XPC) • Copy-in/Copy-out • Wrappers
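
The wrapper and copy-in/copy-out ideas on this slide can be illustrated with a small user-space sketch. It is not Nooks code: `net_stats`, `driver_fn`, `wrapped_call`, and the two toy drivers are hypothetical names, the plain function call only stands in for an XPC (a real system would switch protection domains at that point), and the policy of copying back only on success is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel object that a driver is allowed to update. */
struct net_stats {
    unsigned long rx_packets;
    unsigned long tx_packets;
};

/* Hypothetical driver entry point: it only ever sees a private copy. */
typedef int (*driver_fn)(struct net_stats *private_copy);

/*
 * Wrapper around a kernel-to-driver call.
 * Copy-in:  the kernel object is copied into memory the driver may write.
 * Call:     stands in for the XPC (no real domain switch happens here).
 * Copy-out: changes are written back only if the driver reports success
 *           (an assumption of this sketch).
 */
int wrapped_call(driver_fn fn, struct net_stats *kernel_obj)
{
    struct net_stats private_copy;
    int rc;

    memcpy(&private_copy, kernel_obj, sizeof private_copy);     /* copy-in  */
    rc = fn(&private_copy);                                     /* "XPC"    */
    if (rc == 0)
        memcpy(kernel_obj, &private_copy, sizeof *kernel_obj);  /* copy-out */
    return rc;
}

/* Toy driver that behaves: bumps a counter and reports success. */
int good_driver(struct net_stats *s)
{
    s->rx_packets += 1;
    return 0;
}

/* Toy driver that scribbles on its copy and then fails. */
int bad_driver(struct net_stats *s)
{
    s->rx_packets = 0;
    s->tx_packets = 0;
    return -1;
}
```

Calling `wrapped_call(bad_driver, &stats)` leaves `stats` unchanged, because the driver only damaged its private copy. The price is two copies per call, which is in line with the deck's later remark that domain crossing is the main overhead.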

  29. Failure Detection * Figure courtesy of Mike Swift et al.

  30. Failure Detection * Figure courtesy of Mike Swift et al.

  31. Failure Detection * Figure courtesy of Mike Swift et al.

  32. Restart * Figure courtesy of Mike Swift et al.

  33. Restart * Figure courtesy of Mike Swift et al.

  34. Restart * Figure courtesy of Mike Swift et al.

  35. Driver State/Session Recovery • Drivers lose state after restart • E.g. file handles + history of ioctls configuring the drivers • This causes apps to fail • Shadow drivers • Kernel agents for recovering drivers • Observe kernel-driver communication during normal operation • Restore driver state after restart
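
A minimal sketch of the record-and-replay idea behind shadow drivers, assuming a toy driver whose state consists of two settings changed through ioctls. All names (`driver`, `shadow`, `shadow_ioctl`, `shadow_recover`, the `IOCTL_*` constants) are hypothetical; the real shadow driver mechanism is considerably more involved.

```c
#include <stddef.h>

#define MAX_LOG 16

/* Hypothetical driver state: two configurable settings. */
struct driver {
    int volume;
    int sample_rate;
};

enum { IOCTL_SET_VOLUME = 1, IOCTL_SET_RATE = 2 };

/* Hypothetical ioctl handler of the real driver. */
int driver_ioctl(struct driver *d, int cmd, int arg)
{
    switch (cmd) {
    case IOCTL_SET_VOLUME: d->volume = arg;      return 0;
    case IOCTL_SET_RATE:   d->sample_rate = arg; return 0;
    default:               return -1;
    }
}

/* Shadow: passively logs configuration requests during normal operation. */
struct shadow {
    struct { int cmd; int arg; } entries[MAX_LOG];
    size_t n;
};

/* Normal path: forward the request and remember it if it succeeded. */
int shadow_ioctl(struct shadow *s, struct driver *d, int cmd, int arg)
{
    int rc = driver_ioctl(d, cmd, arg);
    if (rc == 0 && s->n < MAX_LOG) {
        s->entries[s->n].cmd = cmd;
        s->entries[s->n].arg = arg;
        s->n++;
    }
    return rc;
}

/* Recovery path: model a freshly restarted driver, then replay the log. */
void shadow_recover(const struct shadow *s, struct driver *d)
{
    size_t i;

    d->volume = 0;        /* state is lost on restart */
    d->sample_rate = 0;
    for (i = 0; i < s->n; i++)
        driver_ioctl(d, s->entries[i].cmd, s->entries[i].arg);
}
```

After `shadow_recover`, the driver holds the same settings the application configured before the failure, so the application does not need to notice the restart. The fixed-size log is a simplification of the sketch; requests beyond `MAX_LOG` are simply not recorded.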

  36. Native Linux * Figure courtesy of Mike Swift et al.

  37. Normal Behavior * Figure courtesy of Mike Swift et al.

  38. During Recovery * Figure courtesy of Mike Swift et al.

  39. Evaluation • Pros • General solution • Covers both isolation and recovery • Good availability • Low overhead when accessing memory • Cons • High overhead when crossing domains • System specific • Coarse-grained protection: does not prevent a driver from corrupting itself

  40. Results * Figure courtesy of Mike Swift et al.

  41. More Results • Nooks: 23,000 LOC • Shadow Manager: 600 LOC • Overhead: up to 100% • Main overhead is domain crossing

  42. Nooks Recap • Device drivers are a major source of OS crashes. • Nooks isolates drivers by putting them inside separate hardware protection domains • Recovery is done by restarting drivers • Driver state can be restored by the “shadow driver” technique • Performance is O.K. Code changes are reasonable.

  43. OS Dependability With Program Analysis

  44. Review • Separate hardware protection domains: Nooks [Swift et al.], L4 [LeVasseur et al.], Xen [Fraser et al.] • Relatively high overhead due to cross-domain calls; system specific • Binary instrumentation: SFI [Wahbe et al., Small/Seltzer] • High overhead, coarse-grained • What can be done at the C language level? • Add fine-grained type safety, to extensions only • A way to recover from failures

  45. Vision • What a safe language provides: • Array indexing stays within object bounds • No uses of null/invalid pointers • All operations are type safe • No uses of dangling pointers • Control flow obeys program semantics • …

  46. A Language-Based Approach to Extension Safety • Light annotations in extension code and host API • Buffer bounds, non-null pointers, null-terminated strings, tagged unions • Deputy source-to-source compiler emits safety checks when necessary • Key: compatible extension-host binary interface • Runtime tracks resource usage and restores system invariants at failure time * Figure: annotated source → Deputy → C with checks → GCC → driver module; the driver module, the SafeDrive runtime & recovery layer, and the Linux kernel share the kernel address space
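
One way to picture "tracks resource usage and restores system invariants" is an undo log: every time the extension acquires a resource, the runtime records how to release it, and on a failed check it runs those records in reverse order. The sketch below rests on that assumption and is not the SafeDrive runtime; `undo_log`, `track`, `recover`, and the toy lock are hypothetical names.

```c
#include <stddef.h>

#define MAX_UNDO 32

typedef void (*undo_fn)(void *arg);

/* Hypothetical per-extension undo log kept by the runtime. */
struct undo_log {
    struct { undo_fn fn; void *arg; } entry[MAX_UNDO];
    size_t n;
};

/* Record how to release a resource at the moment it is acquired. */
int track(struct undo_log *ul, undo_fn fn, void *arg)
{
    if (ul->n == MAX_UNDO)
        return -1;              /* log full: caller must handle this */
    ul->entry[ul->n].fn = fn;
    ul->entry[ul->n].arg = arg;
    ul->n++;
    return 0;
}

/* On a failed safety check: release everything, most recent first. */
void recover(struct undo_log *ul)
{
    while (ul->n > 0) {
        ul->n--;
        ul->entry[ul->n].fn(ul->entry[ul->n].arg);
    }
}

/* Toy resource: a lock represented by an int flag (1 = held). */
void toy_lock(int *lock)
{
    *lock = 1;
}

void toy_unlock(void *arg)
{
    *(int *)arg = 0;
}
```

Releasing in reverse acquisition order mirrors how nested locks and allocations are normally unwound, which is why the log is treated as a stack.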

  47. Deputy: Motivation • Common C code • How to check memory safety? • C pointers do not express extent of buffers (unlike Java) • Code: struct { unsigned int len; int *data; } x; for (i = 0; i < x.len; i++) { … x.data[i] … }

  48. Previous Approach: Fat Pointers • Used in CCured and Cyclone • Compiler inserts extra bounds variables • Changes memory layout • Cannot be applied modularly • Code: struct { unsigned int len; int *data; int *data_b; int *data_e; } x; for (i = 0; i < x.len; i++) { if (x.data + i < x.data_b) abort(); if (x.data + i >= x.data_e) abort(); … x.data[i] … }
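
A self-contained sketch of the fat-pointer idea, written by hand rather than taken from CCured or Cyclone. The names `fat_int_ptr`, `fat_make`, and `fat_read` are hypothetical, and the accessor returns 0 on a violation instead of calling `abort()` so that the behaviour is easy to observe.

```c
#include <stddef.h>

/* A "fat" pointer: the pointer plus the bounds of the buffer it points into. */
struct fat_int_ptr {
    int *p;   /* current pointer */
    int *b;   /* base: first valid element */
    int *e;   /* end: one past the last valid element */
};

/* Build a fat pointer for a buffer of n ints. */
struct fat_int_ptr fat_make(int *buf, size_t n)
{
    struct fat_int_ptr fp = { buf, buf, buf + n };
    return fp;
}

/*
 * Checked read of fp.p[i]. Returns 1 and stores the element in *out when
 * the access is in bounds, 0 otherwise. The index is compared against the
 * remaining length, so no out-of-range pointer is ever formed.
 */
int fat_read(struct fat_int_ptr fp, size_t i, int *out)
{
    if (fp.p < fp.b || fp.p > fp.e)
        return 0;
    if (i >= (size_t)(fp.e - fp.p))
        return 0;
    *out = fp.p[i];
    return 1;
}
```

The struct carries three pointers where plain C carries one. Any structure that embeds such a pointer changes size, which is the layout change that keeps fat pointers from being applied to one module at a time.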

  49. Deputy Bounds Annotations • Code: struct { unsigned int len; int * count(len) data; } x; for (i = 0; i < x.len; i++) { if (i < 0 || i >= x.len) abort(); … x.data[i] … } • Annotations use existing bounds info in programs, or constants • Compiler emits runtime checks • No memory layout change • Can be applied to one extension at a time • Many checks can be optimized away
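
The annotation style can be tried with an ordinary C compiler under one assumption: `count(n)` is defined as a macro that expands to nothing, so the annotation is only documentation here and the check is written by hand. `vec`, `vec_get`, and `vec_sum` are hypothetical names, and the sketch does not reproduce what the Deputy compiler actually generates.

```c
#include <stddef.h>

/* Assumption: a plain C compiler sees no annotation at all.
 * Only a Deputy-style compiler would interpret count(n). */
#define count(n)

struct vec {
    unsigned int len;
    int * count(len) data;   /* reads as: data points to len ints */
};

/*
 * Hand-written stand-in for an inserted check. The bound comes from the
 * existing len field, so no extra fields are added to the struct.
 * Returns 1 and stores the element in *out on success, 0 on a violation.
 */
int vec_get(const struct vec *v, unsigned int i, int *out)
{
    if (i >= v->len)
        return 0;
    *out = v->data[i];
    return 1;
}

/* In this loop the condition i < v->len already implies the bounds check,
 * which is the kind of check that can be optimized away. */
long vec_sum(const struct vec *v)
{
    long sum = 0;
    unsigned int i;

    for (i = 0; i < v->len; i++)
        sum += v->data[i];
    return sum;
}
```

Because `struct vec` has exactly the fields of the unannotated struct, code compiled with and without the annotation can share the same binary interface.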

  50. Deputy Features • Bounds: safe, count(n), bound(lo, hi) • Default: safe • Other annotations • Null-terminated string/buffer • Tagged unions • Open arrays • Checks for printf() arguments • Automatic bounds variables for local variables reduce the annotation burden
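
For the tagged-union feature, the property being checked is that a union member is read only when the tag says that member is live. The sketch below writes that check by hand; `value`, `val_tag`, and `value_get_int` are hypothetical names, and no Deputy annotation syntax is shown.

```c
#include <stddef.h>

enum val_tag { VAL_INT, VAL_STR };

/* A union whose live member is identified by the tag field. */
struct value {
    enum val_tag tag;
    union {
        int i;
        const char *s;
    } u;
};

/*
 * Checked accessor: reads u.i only when the tag selects it.
 * Returns 1 and stores the int in *out on success, 0 on a tag mismatch.
 */
int value_get_int(const struct value *v, int *out)
{
    if (v->tag != VAL_INT)
        return 0;
    *out = v->u.i;
    return 1;
}
```

Without the tag test, reading `u.i` from a value that holds a string would reinterpret pointer bytes as an integer, a type-safety violation that plain C accepts silently.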
