
April 29th, 2013 Prof. John Kubiatowicz http://inst.eecs.berkeley.edu/~cs194-24





Presentation Transcript


  1. CS194-24 Advanced Operating Systems Structures and Implementation Lecture 23: Application-Specific File Systems, Deep Archival Storage, Security and Protection. April 29th, 2013, Prof. John Kubiatowicz, http://inst.eecs.berkeley.edu/~cs194-24

  2. Goals for Today • Application-specific File Systems • Dynamo, Haystack • Deep Archival Storage • OceanStore • Security and Protection Interactive is important! Ask Questions! Note: Some slides and/or pictures in the following are adapted from Bovet, “Understanding the Linux Kernel”, 3rd edition, 2005

  3. Recall: VFS Common File Model • Four primary object types for VFS: • superblock object: represents a specific mounted filesystem • inode object: represents a specific file • dentry object: represents a directory entry • file object: represents open file associated with process • There is no specific directory object (VFS treats directories as files) • May need to fit the model by faking it • Example: make it look like directories are files • Example: make it look like you have inodes, superblocks, etc.

  4. Recall: Data-based Caching (Data “De-Duplication”) • Use a sliding-window hash function to break files into chunks • Rabin Fingerprint: randomized function of data window • Pick sensitivity: e.g. 48 bytes at a time, lower 13 bits = 0 → probability 2^-13 per position, expected chunk size 2^13 = 8192 bytes • Need minimum and maximum chunk sizes • Now – if data stays same, chunk stays the same • Blocks named by cryptographic hashes such as SHA-256
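The boundary rule above can be sketched in C. This is a minimal illustration, not a real Rabin fingerprint: a polynomial rolling hash over the last 48 bytes stands in for it, and while the window size and 13-bit mask follow the slide, the base `B` and the `MIN_CHUNK`/`MAX_CHUNK` bounds are illustrative values.

```c
#include <stdint.h>
#include <stddef.h>

/* Content-defined chunking sketch. A chunk boundary is declared where
 * the low 13 bits of the window hash are zero, which happens with
 * probability 2^-13 per position, for an expected chunk size of 8192
 * bytes, clamped by minimum and maximum chunk sizes. */
#define WINDOW    48
#define MASK      ((1u << 13) - 1)           /* low 13 bits */
#define B         257u                       /* hash base (illustrative) */
#define MIN_CHUNK 2048                       /* illustrative bounds */
#define MAX_CHUNK 65536

/* Returns the length of the chunk starting at data[0]. */
size_t next_chunk(const uint8_t *data, size_t len)
{
    uint32_t h = 0, bw = 1;
    for (int i = 0; i < WINDOW; i++)
        bw *= B;                             /* B^WINDOW mod 2^32 */

    for (size_t i = 0; i < len; i++) {
        h = h * B + data[i];                 /* shift new byte in */
        if (i >= WINDOW)
            h -= bw * data[i - WINDOW];      /* drop byte leaving window */
        if (i + 1 >= MIN_CHUNK && (h & MASK) == 0)
            return i + 1;                    /* content-defined boundary */
        if (i + 1 >= MAX_CHUNK)
            return i + 1;                    /* forced boundary */
    }
    return len;                              /* final (partial) chunk */
}
```

Because a boundary depends only on the 48 bytes currently in the window, an insertion early in a file shifts nearby boundaries but leaves later chunks, and therefore their SHA-256 names, unchanged; that locality is what makes de-duplication effective.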

  5. Recall: Peer-to-Peer: Fully equivalent components • Peer-to-Peer has many interacting components • View system as a set of equivalent nodes • “All nodes are created equal” • Any structure on system must be self-organizing • Not based on physical characteristics, location, or ownership

  6. Recall: Lookup with Leaf Set (Chord) [Figure: a lookup routed from a source node through nodes with IDs 111…, 110…, 10…, 0… to the responding node] • Assign IDs to nodes • Map hash values to node with closest ID • Leaf set is successors and predecessors • All that’s needed for correctness • Routing table matches successively longer prefixes • Allows efficient lookups • Data Replication: • On leaf set

  7. Advantages/Disadvantages of Consistent Hashing • Advantages: • Automatically adapts data partitioning as node membership changes • Node given random key value automatically “knows” how to participate in routing and data management • Random key assignment gives approximation to load balance • Disadvantages • Uneven distribution of key storage is a natural consequence of random node names → leads to uneven query load • Key management can be expensive when nodes transiently fail • Assuming that we immediately respond to node failure, must transfer state to new node set • Then when node returns, must transfer state back • Can be a significant cost if transient failure common • Disadvantages of “Scalable” routing algorithms • More than one hop to find data → O(log N) or worse • Number of hops unpredictable and almost always > 1 • Node failure, randomness, etc.
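The key-to-node mapping behind consistent hashing can be sketched as follows. This is a simplification under stated assumptions: node IDs are kept in a sorted array and scanned linearly, standing in for the O(log N) prefix routing a real overlay like Chord would use; the function name `successor` is illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* Consistent-hashing lookup sketch: node IDs and key hashes share one
 * circular 2^32 ID space. A key is stored at its successor -- the
 * first node whose ID is >= the key's hash, wrapping around to the
 * smallest ID. node_ids must be sorted ascending. */
uint32_t successor(const uint32_t *node_ids, size_t n, uint32_t key_hash)
{
    for (size_t i = 0; i < n; i++)
        if (node_ids[i] >= key_hash)
            return node_ids[i];              /* first ID at or past key */
    return node_ids[0];                      /* wrap around the ring */
}
```

When a node joins or leaves, only the keys on the arc between it and its predecessor change owners: that is the "automatically adapts" advantage, while the uneven arc lengths between random IDs are exactly the uneven-load disadvantage from the slide.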

  8. Dynamo Assumptions • Query Model – Simple interface exposed to application level • Get(), Put() • No Delete() • No transactions, no complex queries • ACID properties (Atomicity, Consistency, Isolation, Durability) weakened: • Operations either succeed or fail, no middle ground • System will be eventually consistent, no sacrifice of availability to assure consistency • Conflicts can occur while updates propagate through system • System can still function while entire sections of network are down • Efficiency – Measure system by the 99.9th percentile • Important with millions of users, 0.1% can be in the 10,000s • Non-Hostile Environment • No need to authenticate query, no malicious queries • Behind web services, not in front of them

  9. Service Level Agreements (SLA) • Application can deliver its functionality in a bounded time: • Every dependency in the platform needs to deliver its functionality with even tighter bounds. • Example: service guaranteeing that it will provide a response within 300ms for 99.9% of its requests for a peak client load of 500 requests per second • Contrast to services which focus on mean response time [Figure: Service-oriented architecture of Amazon’s platform]

  10. Replication • Each data item is replicated at N hosts • “preference list”: The list of nodes responsible for storing a particular key • Successive nodes not guaranteed to be on different physical nodes • Thus preference list includes physically distinct nodes • Sloppy Quorum • R (or W) is the minimum number of nodes that must participate in a successful read (or write) operation. • Setting R + W > N yields a quorum-like system. • Latency of a get (or put) is dictated by the slowest of the R (or W) replicas. For this reason, R and W are usually configured to be less than N, to provide better latency. • Replicas synchronized via anti-entropy protocol • Use of Merkle tree for each unique range • Nodes exchange root of trees for shared key range
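The R + W > N condition above can be checked mechanically; a minimal sketch (the function name is illustrative, not Dynamo's API):

```c
#include <stdbool.h>

/* Quorum-overlap check: with N replicas, a read quorum of R nodes and
 * a write quorum of W nodes chosen as disjointly as possible still
 * share r + w - n nodes, so every read is guaranteed to reach at least
 * one replica holding the latest write exactly when R + W > N. */
bool quorum_overlaps(int n, int r, int w)
{
    return r + w - n > 0;    /* worst-case intersection is non-empty */
}
```

Under this check, a common configuration like (N=3, R=2, W=2) gives the quorum guarantee, while a write-optimized setting such as W=1 with R=1 does not and must rely on the anti-entropy protocol to converge.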

  11. Administrivia • Get moving on Lab 4 • Will require you to read a bunch of code to digest the VFS layer • Design due this Thursday! • So that Palmer can have design reviews on Friday • Focus on behavioral aspects • Mounting, File operations, Etc • Don’t forget final Lecture during RRR • Monday 5/6 • Send me final topics

  12. Data Versioning • A put() call may return to its caller before the update has been applied at all the replicas • A get() call may return many versions of the same object. • Challenge: an object having distinct version sub-histories, which the system will need to reconcile in the future. • Solution: uses vector clocks in order to capture causality between different versions of the same object • A vector clock is a list of (node, counter) pairs • Every version of every object is associated with one vector clock • If the counters on the first object’s clock are less than or equal to the corresponding counters in the second clock, then the first is an ancestor of the second and can be forgotten.
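The ancestry test in the last bullet is mechanical. A sketch follows, with simplifying assumptions: clocks are fixed-size arrays indexed by node ID rather than lists of (node, counter) pairs, and `NODES`, `descends`, and `conflict` are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>

#define NODES 4   /* illustrative: one counter per node in the system */

/* Version A is an ancestor of version B (and can be forgotten) iff
 * every counter in A is <= the corresponding counter in B. */
bool descends(const int a[NODES], const int b[NODES])
{
    for (size_t i = 0; i < NODES; i++)
        if (a[i] > b[i])
            return false;
    return true;              /* a <= b component-wise */
}

/* If neither clock descends from the other, the versions are
 * concurrent and the application must reconcile them on read. */
bool conflict(const int a[NODES], const int b[NODES])
{
    return !descends(a, b) && !descends(b, a);
}
```

This is exactly why a get() can return multiple versions: two coordinators that each bump their own counter produce clocks like [2,0,0,0] and [1,1,0,0], which neither dominates the other.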

  13. Vector clock example

  14. Conflicts (multiversion data) • Client must resolve conflicts • Only resolve conflicts on reads • Different resolution options: • Use vector clocks to decide based on history • Use timestamps to pick latest version • Examples given in paper: • For shopping cart, simply merge different versions • For customer’s session information, use latest version • Stale versions returned on reads are updated (“read repair”) • Vary N, R, W to match requirements of applications • High performance reads: R=1, W=N • Fast writes with possible inconsistency: W=1 • Common configuration: N=3, R=2, W=2 • When do branches occur? • Branches uncommon: 0.06% of requests saw > 1 version over 24 hours • Divergence occurs because of high write rate (more coordinators), not necessarily because of failure

  15. Haystack File System • Does it ever make sense to adapt a file system to a particular usage pattern? • Perhaps • Good example: Facebook’s “Haystack” filesystem • Specific application (Photo Sharing) • Large files!, Many files! • 260 Billion images, 20 PetaBytes (1 PB = 10^15 bytes!) • One billion new photos a week (60 TeraBytes) • Presence of Content Delivery Network (CDN) • Distributed caching and distribution network • Facebook web servers return special URLs that encode requests to CDN • Pay for service by bandwidth • Specific usage patterns: • New photos accessed a lot (caching works well) • Old photos accessed little, but likely to be requested at any time → the “NEEDLES” [Figure: number of photos requested in a day vs. photo age]

  16. Old Solution: NFS • Issues with this design? • Long Tail → Caching does not work for most photos • Every access to back-end storage must be fast without benefit of caching! • Linear directory scheme works badly for many photos/directory • Many disk operations to find even a single photo • Directory’s block map too big to cache in memory • “Fixed” by reducing directory size, however still not great • Meta-Data (FFS) requires ≥ 3 disk accesses per lookup • Caching all iNodes in memory might help, but iNodes are big • Fundamentally, Photo Storage different from other storage: • Normal file systems fine for developers, databases, etc

  17. New Solution: Haystack • Finding a needle (old photo) in Haystack • Differentiate between old and new photos • How? By looking at “Writeable” vs “Read-only” volumes • New Photos go to Writeable volumes • Directory: Help locate photos • Name (URL) of photo has embedded volume and photo ID • Let CDN or Haystack Cache serve new photos • rather than forwarding them to Writeable volumes • Haystack Store: Multiple “Physical Volumes” • Physical volume is large file (100 GB) which stores millions of photos • Data Accessed by Volume ID with offset into file • Since Physical Volumes are large files, use XFS which is optimized for large files

  18. Haystack Details • Each physical volume is stored as single file in XFS • Superblock: General information about the volume • Each photo (a “needle”) stored by appending to file • Needles stored sequentially in file • Naming: [Volume ID, Key, Alternate Key, Cookie] • Cookie: random value to avoid guessing attacks • Key: Unique 64-bit photo ID • Alternate Key: four different sizes, ‘n’, ‘a’, ‘s’, ‘t’ • Deleted needle: simply marked as “deleted” • Overwritten needle: new version appended at end
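The needle naming scheme and the Store's in-memory index can be sketched as below. The field names, sizes, and struct layout are illustrative, not Facebook's exact on-disk format, and a real store would use a hash table rather than a linear scan.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative needle header: each photo is appended to the volume
 * file as a self-describing record. */
struct needle {
    uint32_t magic;      /* marks the start of a needle */
    uint64_t cookie;     /* random value to defeat URL guessing */
    uint64_t key;        /* unique 64-bit photo ID */
    uint8_t  alt_key;    /* size variant: 'n', 'a', 's', 't' */
    uint8_t  flags;      /* e.g. deleted bit: delete = mark, not erase */
    uint32_t size;       /* length of the photo data that follows */
    /* photo bytes, then a checksum, follow on disk */
};

/* In-memory index entry: [Key, Alternate Key] -> byte offset. */
struct index_entry { uint64_t key; uint8_t alt_key; uint64_t offset; };

/* Look up a needle's offset in the volume file; -1 if absent.
 * With the index in memory, a read costs at most one disk seek. */
int64_t needle_offset(const struct index_entry *idx, size_t n,
                      uint64_t key, uint8_t alt_key)
{
    for (size_t i = 0; i < n; i++)
        if (idx[i].key == key && idx[i].alt_key == alt_key)
            return (int64_t)idx[i].offset;
    return -1;
}
```

The point of the design is visible here: the metadata per photo is a few tens of bytes, so millions of photos per volume fit in RAM, avoiding the ≥ 3 disk accesses per lookup of the NFS design.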

  19. Haystack Details (Con’t) • Replication for reliability and performance: • Multiple physical volumes combined into logical volume • Factor of 3 • Four different sizes • Thumbnails, Small, Medium, Large • Lookup • User requests Webpage • Webserver returns URL of form: • http://<CDN>/<Cache>/<Machine id>/<Logical volume,photo> • Possibly reference cache only if old image • CDN will strip off CDN reference if missing, forward to cache • Cache will strip off cache reference and forward to Store • In-memory index on Store for each volume maps: [Key, Alternate Key] → Offset

  20. What about Protection? • Start by asking some high-level questions… • What do we expect of our systems? • Won’t leak our information • Won’t lose our information • Will always work when we need them • Won’t launch attacks against other people • How can we prevent systems from misbehaving? • Never connect them to the network? • Always authenticate users? • Never use them? • Protection: use of one or more mechanisms for controlling the access of programs, processes, or users to resources • Page Table Mechanism • File Access Mechanism • On-disk encryption • Can use lots of Protection but still have an insecure system! • Bugs, back doors, viruses, poorly defined policy, inside man • Denial of service, …

  21. Protection vs Security • Security is a very complex topic: see, e.g., CS161 • Security is about Policy, i.e. what human-centered properties do we want from our system • Usually with reference to an attack model • Security is achieved through a series of Mechanisms, i.e. individual elements of the system combined together to achieve a security policy • Security: use of protection mechanisms to prevent misuse of resources • Misuse defined with respect to policy • E.g.: prevent exposure of certain sensitive information • E.g.: prevent unauthorized modification/deletion of data • Requires consideration of the external environment within which the system operates • Even the most well-constructed system cannot protect information if a user accidentally reveals their password

  22. Preventing Misuse • Types of Misuse: • Accidental: • If I delete shell, can’t log in to fix it! • Could make it more difficult by asking: “do you really want to delete the shell?” • Intentional: • Some high school brat who can’t get a date, so instead he transfers $3 billion from B to A. • Doesn’t help to ask if they want to do it (of course!) • Three Pieces to Security • Authentication: who the user actually is • Authorization: who is allowed to do what • Enforcement: make sure people do only what they are supposed to do • Loopholes in any carefully constructed system: • Log in as superuser and you’ve circumvented authentication • Log in as self and can do anything with your resources; for instance: run program that erases all of your files • Can you trust software to correctly enforce Authentication and Authorization?????

  23. Authentication: Identifying Users • How to identify users to the system? • Passwords • Shared secret between two parties • Since only user knows password, someone types correct password → must be user typing it • Very common technique • Smart Cards • Electronics embedded in card capable of providing long passwords or satisfying challenge-response queries • May have display to allow reading of password • Or can be plugged in directly; several credit cards now in this category • Biometrics • Use of one or more intrinsic physical or behavioral traits to identify someone • Examples: fingerprint reader, palm reader, retinal scan • Becoming quite a bit more common • What else? • Consider the “Swarm” and “Un-pad” views

  24. Timing Attacks: Tenex Password Checking • Tenex – early 70’s, BBN • Most popular system at universities before UNIX • Thought to be very secure, gave “red team” all the source code and documentation (want code to be publicly available, as in UNIX) • In 48 hours, they figured out how to get every password in the system • Here’s the code for the password check: for (i = 0; i < 8; i++) if (userPasswd[i] != realPasswd[i]) goto error; • How many combinations of passwords? • 256^8? • Wrong!

  25. Defeating Password Checking • Tenex used VM, and it interacts badly with the above code • Key idea: force page faults at inopportune times to break passwords quickly • Arrange 1st char in string to be last char in pg, rest on next pg • Then arrange for pg with 1st char to be in memory, and rest to be on disk (e.g., ref lots of other pgs, then ref 1st page) • Layout: “a” on a page in memory | “aaaaaaa” on a page on disk • Time password check to determine if first character is correct! • If fast, 1st char is wrong • If slow, 1st char is right, pg fault, one of the others wrong • So try all first characters, until one is slow • Repeat with first two characters in memory, rest on disk • Only 256 * 8 attempts to crack passwords • Fix is easy: don’t stop until you look at all the characters
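The fix in the last bullet amounts to a constant-time comparison. A sketch (not the actual Tenex code; the function name is illustrative):

```c
#include <stddef.h>

/* Timing-safe password check: the Tenex bug was returning on the first
 * mismatch, which leaks the position of the failing character through
 * timing (and page faults). Here every byte is examined
 * unconditionally and mismatches are accumulated, so running time does
 * not depend on where the first difference occurs. */
int check_password(const char *user, const char *real, size_t len)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(user[i] ^ real[i]);   /* never early-exit */
    return diff == 0;     /* 1 iff all characters matched */
}
```

With no early exit, the attacker's page-fault trick no longer distinguishes "first character right" from "first character wrong", restoring the full 256^8 search space.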

  26. Recall: Authorization: Who Can Do What? • How do we decide who is authorized to do actions in the system? • Access Control Matrix: contains all permissions in the system • Resources across top • Files, Devices, etc… • Domains down the rows • A domain might be a user or a group of permissions • E.g. above: User D3 can read F2 or execute F3 • In practice, table would be huge and sparse! • Two approaches to implementation • Access Control Lists: store permissions with each object • Still might be lots of users! • UNIX limits each file to: r,w,x for owner, group, world • More recent systems allow definition of groups of users and permissions for each group • Capability List: each process tracks which objects it has permission to touch • Popular in the past, idea out of favor today • Consider page table: Each process has list of pages it has access to, not each page has list of processes …
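The ACL approach above can be sketched as follows. The types and names (`acl_entry`, `allowed`, the `R`/`W`/`X` bits) are illustrative, not any real kernel's API: each object stores one column of the sparse access matrix, with empty cells simply omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { X = 1, W = 2, R = 4 };    /* UNIX-style rights bitmask */

/* One ACL entry: a domain (user or group) and its rights bits. */
struct acl_entry { const char *domain; int rights; };

/* Check whether `domain` holds every right in `wanted` on the object
 * whose ACL is given. No entry means no access (default deny). */
bool allowed(const struct acl_entry *acl, size_t n,
             const char *domain, int wanted)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(acl[i].domain, domain) == 0)
            return (acl[i].rights & wanted) == wanted;
    return false;
}
```

The contrast with capabilities falls out of the data layout: here the object owns the list and revocation is easy (edit one ACL), whereas enumerating everything one domain can touch requires scanning every object's list.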

  27. Authorization Continued • Principle of least privilege: programs, users, and systems should get only enough privileges to perform their tasks • Very hard to do in practice • How do you figure out the minimum set of privileges needed to run your programs? • People often run at higher privilege than necessary • Such as the “administrator” privilege under Windows • One solution: Signed Software • Only use software from sources that you trust, thereby dealing with the problem by means of authentication • Fine for big, established firms such as Microsoft, since they can make their signing keys well known and people trust them • Actually, not always fine: recently, one of Microsoft’s signing keys was compromised, leading to malicious software that looked valid • What about new startups? • Who “validates” them? • How easy is it to fool them?

  28. Mandatory Access Control (MAC) • Mandatory Access Control (MAC) • “A type of access control by which the operating system constrains the ability of a subject or initiator to access or generally perform some sort of operation on an object or target.” (from Wikipedia) • Subject: a process or thread • Object: files, directories, TCP/UDP ports, etc • Security policy is centrally controlled by a security policy administrator: users not allowed to operate outside the policy • Examples: SELinux, HiStar, etc. • Contrast: Discretionary Access Control (DAC) • Access restricted based on the identity of subjects and/or groups to which they belong • Controls are discretionary – a subject with a certain access permission is capable of passing that permission on to any other subject • Standard UNIX model

  29. Data Centric Access Control (DCAC?) • Problem with many current models: • If you break into OS  data is compromised • In reality, it is the data that matters – hardware is somewhat irrelevant (and ubiquitous) • Data-Centric Access Control (DCAC) • I just made this term up, but you get the idea • Protect data at all costs, assume that software might be compromised • Requires encryption and sandboxing techniques • If hardware (or virtual machine) has the right cryptographic keys, then data is released • All of the previous authorization and enforcement mechanisms reduce to key distribution and protection • Never let decrypted data or keys outside sandbox • Examples: Use of TPM, virtual machine mechanisms

  30. Enforcement • Enforcer checks passwords, ACLs, etc • Makes sure that only authorized actions take place • Bugs in enforcer → things for malicious users to exploit • Normally, in UNIX, superuser can do anything • Because of coarse-grained access control, lots of stuff has to run as superuser in order to work • If there is a bug in any one of these programs, you lose! • Paradox • Bullet-proof enforcer • Only known way is to make enforcer as small as possible • Easier to make correct, but simple-minded protection model • Fancy protection • Tries to adhere to principle of least privilege • Really hard to get right • Same argument for Java or C++: What do you make private vs public? • Hard to make sure that code is usable but only necessary modules are public • Pick something in middle? Get bugs and weak protection!

  31. Summary • Peer-to-Peer: • Use of 100s or 1000s of nodes to achieve higher performance or greater availability • May need to relax consistency for better performance • Application-Specific File Systems (e.g. Haystack): • Optimize system for particular usage pattern • Security: use of protection mechanisms to prevent misuse of resources • Represents Human-Centered Policy as opposed to mechanism • Three Pieces to Security • Authentication: who the user actually is • Authorization: who is allowed to do what • Enforcement: make sure people do only what they are supposed to do • Principle of least privilege: programs, users, and systems should get only enough privileges to perform their tasks
