1 / 42

Data Perturbation An Inference Control Method for Database Security

Data Perturbation An Inference Control Method for Database Security. Dissertation Defense Bob Nielson Oct 23, 2009. I. Introduction. Most security concerns can be handled with the grant command. Others require a view approach

shalin
Download Presentation

Data Perturbation An Inference Control Method for Database Security

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data PerturbationAn Inference Control Method for Database Security Dissertation Defense Bob Nielson Oct 23, 2009

  2. I. Introduction • Most security concerns can be handled with the grant command. • Others require a view approach • But what happens if we wish to disclose partial information in a table field but not the individual records?

  3. I. Introduction – The Problem • The problem is to allow for statistical analysis of data but still protecting individual records. • Example: Given a database of cancer patients. Allow for a researcher to know what the cancer rate is, but not that patient X has cancer.

  4. I. Introduction – The Problem

  5. II. Related Work • Suppression • Anonymization • Partitioning • Data Logging • Conceptual • Hybrid • Perturbation

  6. II. Related Work-Suppression • Must access n records • Only n queries per day • There are known methods to get around these protections.

  7. II. Related Work-Anonymization • Replace the identifying fields with special characters. • This method can still be compromised.

  8. II. Related Work- Anonymization

  9. II. Related Work-Partitioning • All queries must access more than one band of records.

  10. II. Related Work-Partitioning

  11. II. Related Work –Logging • A log of every query ran is kept. • Before a query is allowed all possible inferences are checked. If it releases one record, then that query is not permitted. • Soon there are no queries allowed.

  12. II. Related work – Conceptual • Design the database so that no confidential information is stored.

  13. II. Related Work –Hybrid • Try using a combination of several of these methods.

  14. II. Related Work - Perturbation • Output Perturbation • Data Perturbation • Liew Perturbation • Nielson Perturbation • Note: Perturbation means data changing

  15. II. Related Work –Output Perturbation • Output perturbation works by changing the output of the query not the physical data.

  16. II. Related Work – Output Perturbation

  17. II. Related Work –Data Perturbation • Data perturbation works by changing the physical data. • Two common methods: • To add a random value to each value • To multiple each value by a random value

  18. II. Related Work – Data Perturbation

  19. II. Related Work –Data Perturbation

  20. II. Related Work –Liew Perturbation • Liew perturbation steps: • Calculate the average, standard deviation, and count of the data • Generate a new data set with the same average, standard deviation and count • Sort both data sets in ascending order • Swap the perturbed values with each other.

  21. II. Related Work–Liew Perturbation

  22. II. Related Work – Liew Perturbation

  23. III Hypothesis and Proof • Prove: • H1: Nielson perturbation is better than No Perturbation • H2: Nielson perturbation is better than data perturbation (20%) • H3: Nielson perturbation is better than Liew perturbation (20%)

  24. III Hypothesis and Proof • Disprove: • H1: Nielson perturbation is not better than No Perturbation • H2: Nielson perturbation is not better than data perturbation (20%) • H3: Nielson perturbation is not better than Liew perturbation (20%)

  25. IV. Methodology • What is Nielson Perturbation? • Calculating the absolute error . . . • Finding optimal values for Nielson perturbation . . . • Experimental design . . . • Conducting the experiment . . .

  26. IV. Methodology-Nielson Perturbation • Nielson Perturbation is a form of data perturbation. • Each value is multiplied by a random value between alpha and beta for the first gamma records in the data set. • This value is randomly negated.

  27. IV Methodology- Nielson Perturbation

  28. IV. Methodology - Nielson

  29. IV. Methodology-Alpha/Beta/Gamma • What are the best values? • An evolutionary algorithm was deployed. • The results after several days of computation were: • Alpha = 2.09 • Beta = 1.18 • Gamma = 66.87

  30. IV. Methodology- Evolutionary Results

  31. IV. Methodology-Nielson Perturbation

  32. IV. Methodology-Nielson Perturbation

  33. IV. Methodology- The Method • Calculate the average error of each method. • Use the law of large numbers: An average of averages approaches a normal distribution as the sample size grows.

  34. IV. Methodology- The Method • Use a t-test to calculate whether two sample means are statistically different from each other with a significance of 95%

  35. IV. Methodology- Monte Carlo Simulation • Randomly generate 100,000 databases and execute 100’s of queries. • I will use arrays to test the accuracy. Speed is of major importance here. • Arrays vs. databases do not matter for calculating the accuracy of query outputs

  36. IV. Methodology- Calculating the average error • The error should be bigger with smaller query sizes. • The error should be smaller with larger query sizes.

  37. Methodology-The Fitness Function e=|x-x’| If q < n/2 fitness=100-e Else fitness=e Smaller fitness scores are better

  38. V. Results and Conclusions

  39. V. Results and Conclusions

  40. V. Results and Conclusions

  41. V. Results and ConclusionsSignificance • There is a real need for partial disclosure of a field in a table. • My method insures a higher degree of security. • My method still allows for release of averages and totals.

  42. VI. Further Studies • Transformation Times • On the fly perturbing

More Related