
Revealing Information while Preserving Privacy



Presentation Transcript


  1. Revealing Information while Preserving Privacy
   Kobbi Nissim, NEC Labs, DIMACS
   Based on work with Irit Dinur, Cynthia Dwork and Joe Kilian

  2. The Hospital Story
   (Figure: a user poses a query q to a medical DB of patient data and receives an answer a.)

  3. A Bad Solution (the easy, tempting solution)
   Idea: (a) remove identifying information (name, SSN, …); (b) publish the data.
   • Observation: ‘harmless’ attributes uniquely identify many patients (gender, approximate age, approximate weight, ethnicity, marital status, …)
   • Worse: a ‘rare’ attribute identifies a patient almost uniquely (e.g. CF ≈ 1/3000)

  4. d {0,1}n Mr. Smith aq=iqdi Ms. John  q  [n] Mr. Doe Our Model: Statistical Database (SDB)

  5. The Privacy Game: Information-Privacy Tradeoff
   • Private functions: want to hide π_i(d_1, …, d_n) = d_i
   • Information functions: want to reveal f_q(d_1, …, d_n) = Σ_{i∈q} d_i
   • Explicit definition of private functions
   Compare crypto (secure function evaluation):
   • want to reveal f(d)
   • want to hide all functions π(d) not computable from f(d)
   • Implicit definition of private functions

  6. Approaches to SDB Privacy [AW 89]
   • Query restriction: require queries to obey some structure
   • Perturbation: give ‘noisy’ or ‘approximate’ answers (this talk)

  7. Perturbation
   • Database: d = d_1, …, d_n
   • Query: q ⊆ [n]
   • Exact answer: a_q = Σ_{i∈q} d_i
   • Perturbed answer: â_q
   • Perturbation E: for all q, |â_q − a_q| ≤ E
   • General perturbation: Pr_q[|â_q − a_q| ≤ E] = 1 − neg(n) (e.g. 99%, or even 51%)
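A runnable sketch of within-E perturbation as defined above; the sizes and the uniform noise model are illustrative assumptions, not the talk's mechanism:

```python
import random

random.seed(0)

n, E = 64, 4                                   # illustrative sizes
d = [random.randint(0, 1) for _ in range(n)]

def exact_answer(q):
    # Exact subset-sum answer a_q.
    return sum(d[i] for i in q)

def perturbed_answer(q):
    # Within-E perturbation: the response deviates from the exact
    # answer by at most E (uniform noise chosen for illustration).
    return exact_answer(q) + random.randint(-E, E)

q = random.sample(range(n), 16)
assert abs(perturbed_answer(q) - exact_answer(q)) <= E
```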

  8. Perturbation Techniques [AW89]
   Data perturbation:
   • Swapping [Reiss 84] [Liew, Choi, Liew 85]
   • Fixed perturbations [Traub, Yemini, Wozniakowski 84] [Agrawal, Srikant 00] [Agrawal, Aggarwal 01]
     • Additive perturbation: d′_i = d_i + E_i
   Output perturbation:
   • Random sample queries [Denning 80]: sample drawn from the query set
   • Varying perturbations [Beck 80]: perturbation variance grows with the number of queries
   • Rounding [Achugbue, Chin 79], randomized [Fellegi, Phillips 74], …

  9. Main Question: How much perturbation is needed to achieve privacy?

  10. Privacy from √n Perturbation (an example of a useless database)
   Database: d ∈_R {0,1}^n
   On query q:
   1. Let a_q = Σ_{i∈q} d_i
   2. If |a_q − |q|/2| > E, return â_q = a_q
   3. Otherwise return â_q = |q|/2
   • Privacy is preserved: if E ≈ √n (lg n)^2, whp rule 3 is always used
   • No information about d is given: no usability!
   Can we do better? Smaller E? Usability?
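A runnable sketch of the response rule on slide 10; n is an illustrative choice, and the point is that with E ≈ √n (lg n)^2 the "otherwise" rule fires on every query:

```python
import math
import random

random.seed(1)

n = 256
E = int(math.sqrt(n) * math.log2(n) ** 2)      # E ~ sqrt(n) * (lg n)^2
d = [random.randint(0, 1) for _ in range(n)]

def answer(q):
    a_q = sum(d[i] for i in q)
    if abs(a_q - len(q) / 2) > E:              # rule 2: answer is far from |q|/2
        return a_q
    return len(q) / 2                          # rule 3: reveal nothing about d

# Here |a_q - |q|/2| <= |q|/2 <= n/2 < E, so rule 3 always fires:
# perfect privacy, zero usability.
q = random.sample(range(n), 100)
assert answer(q) == 50.0
```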

  11. Defining Privacy
   • Elusive definition
   • Application dependent
   • Partial vs. exact compromise
   • Prior knowledge: how to model it?
   • Other issues …
   Instead of defining privacy, define what is surely non-private: strong breaking of privacy.

  12. Strong Breaking of Privacy: Perturbation << √n Implies No Privacy!
   (The useless database achieves the best possible perturbation.)
   • Main Theorem: Given a DB response algorithm with perturbation E << √n, there is a poly-time reconstruction algorithm that outputs a database d′ s.t. dist(d, d′) = o(n).

  13. The Adversary as a Decoding Algorithm
   The 2^n subsets of [n] encode n bits (recall â_q = Σ_{i∈q} d_i + pert_q).
   Decoding problem: given access to â_{q_1}, …, â_{q_{2^n}}, reconstruct d in time poly(n).

  14. Side remark: the Goldreich-Levin Hardcore Bit
   The 2^n subsets of [n] encode n bits, where â_q = Σ_{i∈q} d_i mod 2 on 51% of the subsets.
   The GL algorithm finds, in time poly(n), a small list of candidates containing d.

  15. Side remark: Comparing the Tasks
               Goldreich-Levin                    Here
   Encoding:   a_q = Σ_{i∈q} d_i (mod 2)         a_q = Σ_{i∈q} d_i
   Noise:      corrupt ½ − ε of the queries      additive perturbation; ε fraction of the queries deviate from it
   Queries:    dependent                         random
   Decoding:   list decoding                     d′ s.t. dist(d, d′) < εn (list decoding impossible)

  16. Recall Our Goal: Perturbation << √n Implies No Privacy!
   • Main Theorem: Given a DB response algorithm with perturbation E << √n, there is a poly-time reconstruction algorithm that outputs a database d′ s.t. dist(d, d′) = o(n).

  17. Proof of Main Theorem: the Adversary's Reconstruction Algorithm
   • Query phase: get â_{q_j} for t random subsets q_1, …, q_t of [n]
   • Weeding phase: solve the linear program:
     0 ≤ x_i ≤ 1
     |Σ_{i∈q_j} x_i − â_{q_j}| ≤ E
   • Rounding: let c_i = round(x_i); output c
   Observation: an LP solution always exists, e.g. x = d.
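A sketch of the weeding idea: the actual algorithm solves the LP above, but for a tiny n the feasible 0/1 points can simply be enumerated (an exhaustive-search stand-in, with illustrative parameters; the true database always survives the weeding because it satisfies every constraint):

```python
import itertools
import random

random.seed(2)

# Exhaustive-search stand-in for the LP weeding phase; enumeration
# is only feasible because n is tiny.
n, E, t = 10, 1, 40
d = tuple(random.randint(0, 1) for _ in range(n))
queries = [frozenset(random.sample(range(n), 5)) for _ in range(t)]
answers = {q: sum(d[i] for i in q) + random.choice((-1, 0, 1))
           for q in queries}                   # perturbation bounded by E = 1

def consistent(c):
    # Keep candidates whose subset sums match every answer up to E.
    return all(abs(sum(c[i] for i in q) - answers[q]) <= E for q in queries)

candidates = [c for c in itertools.product((0, 1), repeat=n) if consistent(c)]
assert d in candidates                         # d always survives the weeding
```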

  18. Proof of Main Theorem: Correctness of the Algorithm
   Consider x = (0.5, …, 0.5) as a solution for the LP.
   • Observation: a random q often shows a √n advantage either to 0’s or to 1’s; such a q disqualifies x as a solution for the LP.
   • We prove that if dist(x, d) > εn, then whp there is a q among q_1, …, q_t that disqualifies x.
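An empirical sketch of the observation above (sizes are illustrative): against x = (0.5, …, 0.5), whose answer to any q is |q|/2, a random query typically shows a gap |a_q − |q|/2| on the order of √n, so with E well below √n such a q disqualifies x:

```python
import random
import statistics

random.seed(3)

n = 400
d = [random.randint(0, 1) for _ in range(n)]

gaps = []
for _ in range(200):
    q = random.sample(range(n), n // 2)        # random query of size n/2
    a_q = sum(d[i] for i in q)
    gaps.append(abs(a_q - len(q) / 2))         # gap vs. the answer of x

# Typical gap is around sqrt(|q|)/2, i.e. a few units here, not zero.
assert statistics.median(gaps) >= 1
```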

  19. Extensions of the Main Theorem
   • ‘Imperfect’ perturbation: we can approximate the original bit string even if the database answer is within the perturbation only for 99% of the queries
   • Other information functions: given access to a “noisy majority” of subsets, we can approximate the original bit string

  20. Notes on Impossibility Results
   • Exponential adversary: strong breaking of privacy if E << n
   • Polynomial adversary:
     • Non-adaptive queries
     • Oblivious of perturbation method and database distribution
     • Tight threshold E ≈ √n
   • What if the adversary is more restricted?

  21. Bounded Adversary Model
   • Database: d ∈_R {0,1}^n
   • Theorem: If the number of queries is bounded by T, then there is a DB response algorithm with perturbation ≈ √T that maintains privacy (with a reasonable definition of privacy).

  22. Summary and Open Questions
   • Very high perturbation is needed for privacy
     • Threshold phenomenon: above √n, total privacy; below √n, none (poly-time adversary)
     • Rules out many currently proposed solutions for SDB privacy
     • Q: what’s on the threshold? Usability?
   • Main tool: a reconstruction algorithm, recovering an n-bit string from perturbed partial sums/thresholds
   • Privacy for a T-bounded adversary with a random database: √T perturbation
     • Q: other database distributions?
   • Q: crypto and SDB privacy?

  23. dR{0,1}n d d-i Our Privacy Definition (bounded adversary model) i … (transcript, i) di Fails w.p. > ½-
