ICS 214B: Transaction Processing and Distributed Data Management. Lecture 17: Providing Database as a Service. Professor Chen Li. Based on slides developed by Hakan Hacigumus, Bala Iyer, and Sharad Mehrotra, ICDE 2002, San Jose, CA, USA.
Talk Outline • Software as a Service • Database as a Service • NetDB2 System • Challenges for Database as a Service • User Interface Issues • Performance Issues • Data Privacy Issues • Data Encryption in DBMSs for Data Privacy • Conclusion
Software as a Service • Get … • what you need • when you need • Pay … • what you use • Don’t worry … • how to deploy, implement, maintain, upgrade
Software as a Service • Driving forces behind the paradigm shift • Faster, cheaper, more accessible networks • Rise of distributed architectures • Virtualization in server and storage technologies • Established e-business infrastructures • Hardware/software is not the largest component of total cost of ownership • User Operations 46% • Technical Support 24% • Capital Cost (HW/SW) 21% (Source: Gartner Group) • Hardware, software, and network costs have been decreasing more sharply than personnel cost
Software as a Service • Already in the market as • storage services, disaster recovery services, e-mail services, rent-a-spreadsheet services, etc. • Sun ONE, Oracle Online Services, Microsoft .NET My Services, etc. • Why not Database as a Service?
Database as a Service - Why? • Organizations need data management • DBMSs are complex systems to deploy, set up, and maintain • They require highly skilled people (DBAs etc.) at high cost
Database as a Service - Offerings • Inherits all advantages of software as a service, plus … • Service provider offers mechanisms to • create, store, access databases • DB management transferred to service provider for • backup, administration, restoration, space management, upgrades • Clients use the service provider's HW, SW, and personnel instead of their own
NetDB2 - Database Service Provision • Developed in collaboration between University of California, Irvine and IBM • Deployed on the Internet over a year ago • Used by 15 universities and more than 2500 students to help teach database classes • Currently offered through IBM Scholars Program
NetDB2 System Architecture • Three-tier architecture • Client: as thin as possible, just a browser • Java-based implementation • Backed by fail-over solutions • Allows expansions and user-driven integration for application development • [Figure: User (Web Browser), HTTP Server, Servlet Engine, Database (User Data), Warm Standby, Backup/Recovery, Standby System]
Database as a Service - Issues Issues to address: • User Interface • Performance • Data Privacy
User Interface • Simple yet powerful • supports SQL queries, scripts, UDFs, stored procedures, metadata, data upload • Consistent • Region-based composition • Expansion/Integration • User-defined interfaces
Performance • Interaction in a different medium: the network • Performance should at least match what we have already • Experimented with TPC-H database and queries
Data Privacy • Users give control of their data to the service provider • Attacks on stored data are a well-known problem • So, they need data security in place • Security of data over the network is well studied • SSL, TLS • Establish security for stored data • even if it is stolen, it should not make sense • Encryption!
Encryption Alternatives • Implementation Level • Software vs. hardware encryption • Granularity of Data • Field (Attribute) level • Row (Record) level • (Disk) Page level • [Figure: example table with columns ID, NAME, DEPTID, SALARY and rows (20, John White, 2, 40000), (41, Linda Cone, 3, 90000), (43, Bob Drake, 3, 85000), (50, Sarah Brown, 7, 95000), shown in plaintext and with ciphertext at different granularities]
Encryption Alternatives (2) • Field level encryption • Pros: • Easier to implement and integrate • Flexible • Allows selective encryption, reduces number of bytes to encrypt/decrypt • Cons: • Increases encryption overhead significantly due to invocation cost • Data size expansion (for block cipher algorithms) • Current optimization technologies do not handle foreign functions well
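The data size expansion noted above can be illustrated with a little arithmetic. The following is a minimal sketch, assuming an 8-byte block (Blowfish uses 64-bit blocks) and PKCS#7-style padding, which always adds between 1 and 8 bytes. The padding scheme and the field sizes are assumptions made for illustration; the slides do not say which padding was used.

```python
BLOCK_SIZE = 8  # bytes; Blowfish operates on 64-bit blocks


def padded_size(n_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Ciphertext size assuming PKCS#7-style padding (adds 1..block_size bytes)."""
    return (n_bytes // block_size + 1) * block_size


# Hypothetical plaintext sizes (in bytes) for the fields of one row.
field_sizes = {"id": 4, "name": 10, "deptid": 4, "salary": 4}

plain_total = sum(field_sizes.values())

# Field-level encryption pads every field separately.
field_level_total = sum(padded_size(n) for n in field_sizes.values())

# Encrypting the same bytes as a single unit pads only once.
row_level_total = padded_size(plain_total)

print(f"plaintext bytes:         {plain_total}")
print(f"encrypted field by field: {field_level_total}")
print(f"encrypted as one unit:    {row_level_total}")
```

Under these assumptions, small fields pay the padding penalty once each, which is why the expansion is most visible at field granularity.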
Encryption Alternatives (3) • Row level encryption • Pros: • Reduces the data size expansion problem • Reduces invocation cost • Better security because of total encryption • Cons: • Does not allow selective encryption, increases the number of bytes to encrypt/decrypt • Implementation and integration can be hard when row functions are not supported
Encryption Alternatives (4) • Page level encryption • Pros: • Significantly reduces encryption/decryption overhead due to reduced invocation cost • Eliminates data size expansion problem (for block ciphers) • Better security because of total encryption • Cons: • Implementation and integration are not straightforward • Increases the number of bytes to encrypt/decrypt each time • Higher update/delete cost, since all affected pages must be re-encrypted
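The trade-off between invocation cost and bytes processed across the three granularities can be made concrete with a toy cost model. This is only a sketch: the `CostModel` class, the `scan_costs` helper, and every number below are hypothetical and are not measurements from the experiments in these slides.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    setup_cost: float  # fixed cost per encrypt/decrypt call (hypothetical units)
    byte_cost: float   # cost per byte processed (hypothetical units)

    def cost(self, calls: int, total_bytes: int) -> float:
        return calls * self.setup_cost + total_bytes * self.byte_cost


def scan_costs(model, n_rows, fields_per_row, field_bytes,
               rows_per_page, encrypted_fields):
    """Cost of decrypting a full table scan at each granularity."""
    row_bytes = fields_per_row * field_bytes
    n_pages = -(-n_rows // rows_per_page)  # ceiling division
    return {
        # One call per encrypted field; only those fields are processed.
        "field": model.cost(n_rows * encrypted_fields,
                            n_rows * encrypted_fields * field_bytes),
        # One call per row; the whole row is processed.
        "row": model.cost(n_rows, n_rows * row_bytes),
        # One call per page; the whole page is processed.
        "page": model.cost(n_pages, n_pages * rows_per_page * row_bytes),
    }


model = CostModel(setup_cost=100.0, byte_cost=1.0)

# Everything encrypted: fewer, larger calls win.
all_fields = scan_costs(model, n_rows=1000, fields_per_row=4, field_bytes=8,
                        rows_per_page=50, encrypted_fields=4)

# Selective encryption: field level processes far fewer bytes.
one_field = scan_costs(model, n_rows=1000, fields_per_row=4, field_bytes=8,
                       rows_per_page=50, encrypted_fields=1)

print(all_fields)
print(one_field["field"])
```

When the per-call setup cost dominates, coarser granularity is cheaper for full scans, while field-level encryption keeps the byte count low if only a few attributes are sensitive.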
Encryption Alternatives Experiments • Experimented with TPC-H database and queries • Encryption scheme alternatives (V: evaluated, ×: not evaluated), by implementation and data granularity:
Implementation | Field Level | Row Level | Page Level
Software Encryption | V | × | ×
Hardware Encryption | × | V | V
Software - Field Level Encryption • Block Cipher Algorithm - Blowfish • Implemented as foreign function (UDF) • Sample insert insert into lineitem (discount) values (encrypt(10,key)); • Sample select select decrypt(discount,key) from lineitem where custid = 300;
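The UDF approach can be tried out with Python's standard `sqlite3` module, which lets user-defined functions be registered through `Connection.create_function`. The sketch below is an assumption-laden stand-in: SQLite replaces the DBMS used in the experiments, and a toy XOR keystream replaces Blowfish (it is not secure and is here only so the example runs with the standard library). The `custid` column follows the sample select above.

```python
import hashlib
import sqlite3


def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream derived from SHA-256; NOT Blowfish and NOT secure."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def encrypt(value, key):
    """UDF: serialize the value as text and XOR it with the keystream."""
    data = str(value).encode("utf-8")
    ks = _keystream(key.encode("utf-8"), len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def decrypt(blob, key):
    """UDF: invert encrypt(); returns the plaintext as text."""
    ks = _keystream(key.encode("utf-8"), len(blob))
    return bytes(a ^ b for a, b in zip(blob, ks)).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.create_function("encrypt", 2, encrypt)
conn.create_function("decrypt", 2, decrypt)
conn.execute("create table lineitem (custid integer, discount blob)")

# The key is supplied by the client as a query parameter.
conn.execute(
    "insert into lineitem (custid, discount) values (?, encrypt(?, ?))",
    (300, 10, "secret"),
)

stored, recovered = conn.execute(
    "select discount, decrypt(discount, ?) from lineitem where custid = ?",
    ("secret", 300),
).fetchone()

print("stored ciphertext:", stored)
print("decrypted value:  ", recovered)
```

Because the functions are invoked once per value, every encrypted field touched by a query pays the call overhead separately.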
Software - Field Level Encryption (2) • Creator supplies the key • Unauthorized persons cannot get hold of the key • protection even from the service provider at some level • User can easily implement a different encryption algorithm and check it into the system • a different encryption algorithm/key can be used for different fields
Software - Field Level Encryption (3) • Ran TPC-H queries, with Q#1 excluded • Only one field (l_discount of lineitem table) encrypted • Introduced very large overhead
TPC-H Query #1 • Problem: Multiple decryption on same field
select l_returnflag, l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
from tpcd.lineitem
where l_shipdate <= date ('1998-12-01') - 90 day
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
Query Rewrite to Improve Performance • Problem: Multiple decryption on same field (e.g., TPC-H Q#1) • CSE based algorithm to eliminate redundant decryptions • Use temporary view
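The effect of eliminating redundant decryptions can be observed by counting UDF calls. The sketch below makes several assumptions: it uses SQLite through Python's `sqlite3` module rather than the DBMS from the experiments, the `decrypt` function is an identity stand-in that only counts invocations, and a materialized temporary table takes the place of the temporary view, since whether a view avoids re-evaluating the function depends on the engine. The table contents are made up.

```python
import sqlite3

calls = {"decrypt": 0}


def decrypt(value, key):
    """Stand-in for a real decryption UDF: returns its input and counts calls."""
    calls["decrypt"] += 1
    return value


conn = sqlite3.connect(":memory:")
conn.create_function("decrypt", 2, decrypt)
conn.execute(
    "create table lineitem (l_extendedprice real, l_discount real, l_tax real)"
)
rows = [(100.0, 0.05, 0.02), (200.0, 0.10, 0.04), (50.0, 0.0, 0.08)]
conn.executemany("insert into lineitem values (?, ?, ?)", rows)

# Naive form: the same decryption appears in three expressions.
naive_result = conn.execute("""
    select sum(l_extendedprice * (1 - decrypt(l_discount, 'k'))),
           sum(l_extendedprice * (1 - decrypt(l_discount, 'k')) * (1 + l_tax)),
           avg(decrypt(l_discount, 'k'))
    from lineitem
""").fetchone()
naive_calls = calls["decrypt"]

# Rewritten form: decrypt once per row, then reuse the plaintext column.
calls["decrypt"] = 0
conn.execute("""
    create temp table lineitem_plain as
    select l_extendedprice, decrypt(l_discount, 'k') as l_discount, l_tax
    from lineitem
""")
rewritten_result = conn.execute("""
    select sum(l_extendedprice * (1 - l_discount)),
           sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
           avg(l_discount)
    from lineitem_plain
""").fetchone()
rewritten_calls = calls["decrypt"]

print("decrypt calls, naive query:    ", naive_calls)
print("decrypt calls, rewritten query:", rewritten_calls)
```

The rewritten form decrypts each row's discount exactly once, and both queries return the same aggregates.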
Hardware - Row Level Encryption • Specialized hardware IBM S/390 Cryptographic Coprocessor under IBM OS/390 • “editproc” facility • invoked for “whole row” • upon read/write request, encrypt/decrypt is invoked from hardware for the row
SW Field Level vs. HW Row Level • Experimented on TPC-H Q#1 • Software Field Level: Only one field is encrypted • Hardware Row Level: All fields are encrypted
Hardware - Page Level Encryption • Page level encryption is simulated • It gives significant improvement due to reduction in start-up cost
Conclusion • Database as a Service is a new model that alleviates the need to • hire professionals • purchase expensive hardware/software • deal with administrative and maintenance tasks • It is a viable model and can emerge as a successful offering • Encryption is a solution for privacy, the most important issue • Hardware encryption has a clear superiority over software • Hardware makes encryption practical for databases • There are trade-offs for granularity of data