430 likes | 600 Views
Making Cloud Storage Provenance-Aware. Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer. Harvard School of Engineering and Applied Sciences. The Cloud. Next generation computing environment Cheap: Pay as you go Provision resources (storage, CPU) on a need basis
E N D
Making Cloud Storage Provenance-Aware Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer Harvard School of Engineering and Applied Sciences
The Cloud • Next generation computing environment • Cheap: Pay as you go • Provision resources (storage, CPU) on a need basis • Provides illusion of infinite resources • Companies with large batch oriented tasks can get results quickly • Cloud providers • Amazon Web Services (AWS) • Google AppEngine • Microsoft Azure Making a cloud Provenance-Aware - TaPP'09
Provenance for the Cloud • As apps move to the cloud, so will the data • Amazon hosts scientific data for free • However, most cloud services are not designed to store provenance • Why Provenance? • Debug Application Results • Validate Data Sets • Improve Search Results • Regulatory Compliance Making a cloud Provenance-Aware - TaPP'09
Provenance Properties • We identified the following properties • Read Correctness • Causal Ancestry Ordering • Queryable Making a cloud Provenance-Aware - TaPP'09
Read Correctness • Data must be what is described by provenance • Provenance accurately describes the data object • Mechanisms • Atomicity: At storage time, both provenance and data should be stored or neither should be stored • Consistency: At retrieval time, data returned should be consistent with provenance Making a cloud Provenance-Aware - TaPP'09
Causal Ancestry Ordering • The provenance and data of an ancestor object must be recorded in the provenance system • No dangling references Making a cloud Provenance-Aware - TaPP'09
Efficient Query • Provenance must be accessible to users who want to verify properties of their data or simply be aware of its lineage • If provenance is not readily accessible, the provenance is of questionable value. Making a cloud Provenance-Aware - TaPP'09
Goal • How do we design protocols around current cloud services such that these properties are satisfied? • Setting • Provenance-Aware Storage system (PASS) tracks and collects provenance • Primarily considered AWS • Used 3 services: S3, SimpleDB, SQS Making a cloud Provenance-Aware - TaPP'09
Outline • Introduction • PASS Background • Protocol 1: Standalone S3 • Protocol 2: S3 + SimpleDB • Protocol 3: S3 + SimpleDB + SQS • Analysis • Conclusion and Status Making a cloud Provenance-Aware - TaPP'09
Provenance-Aware Storage System • Observes system calls that applications make and captures relationships between objects • P: read A • Generates record: P depends on A • Cache the record • P: write B • Generates record: B depends on P • Store both ‘B depends on P’ and ‘P depends on A’ • Mirrors data locally and caches provenance till we need to send it to AWS Making a cloud Provenance-Aware - TaPP'09
Outline • Introduction • PASS Background • Protocol 1: Standalone S3 • Protocol 2: S3 + SimpleDB • Protocol 3: S3 + SimpleDB + SQS • Analysis • Conclusion and Status Making a cloud Provenance-Aware - TaPP'09
Simple Storage Service (S3) • Object Store: sizes from 1byte to 5GB • Object’s identified by URI • SOAP or REST interface • Operations: • PUT, GET, HEAD, COPY, DELETE • PUT: store an object and its metadata (2KB limit) • HEAD: retrieves metadata of an object • Cost: data storage + bandwidth + num ops • Eventual consistency Making a cloud Provenance-Aware - TaPP'09
Application PASS S3 Architecture 1: Standalone S3 User System Prov+Data Making a cloud Provenance-Aware - TaPP'09
PUT:(Prov >1KB) OK S3 Protocol 1: Standalone S3 PASS Making a cloud Provenance-Aware - TaPP'09
PUT:Data OK S3 Protocol 1: Standalone S3 PASS Making a cloud Provenance-Aware - TaPP'09
Properties Making a cloud Provenance-Aware - TaPP'09
Outline • Introduction • PASS Background • Protocol 1: Standalone S3 • Protocol 2: S3 + SimpleDB • Protocol 3: S3 + SimpleDB + SQS • Analysis • Conclusion and Status Making a cloud Provenance-Aware - TaPP'09
SimpleDB • Service providing database functionality • Data model: items described by attribute-value pairs • 256 attrs maximum, name/value < 1KB • Operations: PutAttributes, Query, QueryWithAttributes, and SELECT • Query returns items • QueryWithAttributes returns both items and attributes • Cost: bandwidth + storage + num ops + machine hrs Making a cloud Provenance-Aware - TaPP'09
Application PASS Architecture 2: S3 + SimpleDB User System Data Prov SimpleDB S3 Making a cloud Provenance-Aware - TaPP'09
PUT:(rec > 1KB) OK PutAttrs+ OK SimpleDB S3 Protocol 2: S3 + SimpleDB PASS Making a cloud Provenance-Aware - TaPP'09
PUT:Data OK SimpleDB S3 Protocol 2: S3 + SimpleDB PASS Making a cloud Provenance-Aware - TaPP'09
Properties Making a cloud Provenance-Aware - TaPP'09
Outline • Introduction • PASS Background • Protocol 1: Standalone S3 • Protocol 2: S3 + SimpleDB • Protocol 3: S3 + SimpleDB + SQS • Analysis • Conclusion and Status Making a cloud Provenance-Aware - TaPP'09
Simple Queuing Service (SQS) • Distributed Messaging System • Queues are identified by URL • Operations: SendMessage, ReceiveMessage, DeleteMessage • VisibilityTimeout: • Message will not be available for x seconds after a ReceiveMessage • Limits: 8KB message size, max 10 msgs can be received Making a cloud Provenance-Aware - TaPP'09
Application PASS Architecture 3: S3 + SimpleDB + SQS User System Queue1 Prov Data SimpleDB S3 Making a cloud Provenance-Aware - TaPP'09
PUT: Temp copy OK COPY OK SndMsg+ OK PutAttrs+ OK RecvMsg+ SimpleDB S3 Protocol 3: S3 + SimpleDB + SQS PASS Commitd SQS Making a cloud Provenance-Aware - TaPP'09
DEL:CPY OK DelMsg+ SimpleDB S3 Protocol 3: S3 + SimpleDB + SQS PASS Commitd SQS Making a cloud Provenance-Aware - TaPP'09
Idempotency • SimpleDB, S3, and SQS are idempotent • If a commit daemon crashes, comes back up and processes a transaction again, there will not be errors Making a cloud Provenance-Aware - TaPP'09
Properties Making a cloud Provenance-Aware - TaPP'09
Outline • Introduction • PASS Background • Protocol 1: Standalone S3 • Protocol 2: S3 + SimpleDB • Protocol 3: S3 + SimpleDB + SQS • Analysis • Conclusion and Status Making a cloud Provenance-Aware - TaPP'09
Analysis • Extracted provenance by running three workloads • Linux compile • Blast • Provenance challenge • Compute cost to store and query provenance • Number of ops • Bandwidth Making a cloud Provenance-Aware - TaPP'09
Storage Cost P1 = S3 P2 = S3 + SimpleDB P3 = S3 + SimpleDB + SQS Making a cloud Provenance-Aware - TaPP'09
Query Cost • Dump the provenance of a given object • Ran it on all objects for statistical significance • Find all the files that were outputs of blast. • Find all the descendants of files derived from blast. Making a cloud Provenance-Aware - TaPP'09
Query results Making a cloud Provenance-Aware - TaPP'09
Outline • Introduction • PASS Background • Protocol 1: Standalone S3 • Protocol 2: S3 + SimpleDB • Protocol 3: S3 + SimpleDB + SQS • Analysis • Conclusion and Status Making a cloud Provenance-Aware - TaPP'09
Conclusions • Identified the properties that need to be satisfied for storing provenance in the cloud • Presented various protocols for storing provenance and data on the cloud • Costs of storing provenance is reasonable Making a cloud Provenance-Aware - TaPP'09
Status • System almost ready • Plan to submit it to Symposium on Operating Systems Principles (SOSP) • Really hard to drive up the cost • Jan Bill = $1.95 • Feb Bill = $9.38 Making a cloud Provenance-Aware - TaPP'09
Extra Making a cloud Provenance-Aware - TaPP'09
Protocol 1: Standalone S3 On file close: • Convert the provenance into attribute-value pairs as required by S3 • If (sizeof(record) > 1KB) • Store the record in a separate S3 object • Replace attribute-value pair with pointer to this object • Upload the file using PUT: • Arguments: object, attributes Making a cloud Provenance-Aware - TaPP'09
Protocol 2: S3 + SimpleDB On file close: • Convert the provenance into attribute-value pairs as required by SimpleDB • Additonal record: md5sum of (file contents + version) • If (sizeof(record) > 1KB) • Store the record in a separate S3 object • Replace attribute-value pair with pointer to this object • Issue PutAttributes: store the provenance • One item per version (= One PutAttributes) per version of the object • Upload the file to S3 using PUT Making a cloud Provenance-Aware - TaPP'09
Protocol 3: S3 + SimpleDB + SQS (I) • Log phase: Log data on a queue • Store a copy of the file in a temporary location on S3 • Allocate a transaction id (uuid) • Split provenance into chunks of 8KB and enqueue them on an SQS queue • Tag each message with the transaction ID • One additional record that has a pointer to the temp S3 object Making a cloud Provenance-Aware - TaPP'09
Protocol 3: S3 + SimpleDB + SQS (II) • Commit phase: move data from SQS to S3 and SimpleDB • ReceiveMessage: get messages from the queue and assemble the packets • Store the provenance in SimpleDB using PutAttributes call • Take care of overflows • Execute an S3 COPY and copy the object from its temporary location to permanent • Delete Messages from SQS • Delete temporary file copy Making a cloud Provenance-Aware - TaPP'09