Explore the Private Inference Control (PIC) model in data access protocols, enhancing privacy and inference control. Our work addresses efficient information retrieval while protecting user privacy. Government officials and analysts can prevent unauthorized access to sensitive information, ensuring data security. Related works focus on data perturbation, oblivious transfer, and inference channel restrictions. The model features an offline stage for preprocessing data and an online stage for interaction in a multiround protocol, ensuring correctness, user privacy, and inference control. Our result showcases a secure and efficient PIC scheme for data access.
Private Inference Control David Woodruff MIT dpwood@mit.edu Joint work with Jessica Staddon (PARC)
Contents
• Background
• Access Control and Inference Control
• Our contribution: Private Inference Control (PIC)
• Related Work
• PIC model & definitions
• Our Results
• Conclusions
Access Control
• A user queries a database; some info in the DB is sensitive.
• (Slide graphic: the user asks the server, which holds a DB of n records, "What's Bob's salary?"; the attribute is sensitive, so access is denied.)
• Access control prevents the user from learning individual sensitive relations/attributes.
• But does access control prevent the user from learning sensitive info?
Inference Control
• Combining non-sensitive info may yield something sensitive.
• Inference channel: {(name, job), (job, salary)}
• Inference control: block all inference channels.
Inference Control
• Inference engine: generates the collection C of subsets of [m] denoting all the inference channels.
• We assume we have such an engine [QSKLG93] (exhaustive search).
• Database x ∈ ({0,1}^m)^n: a DB of n records with m attributes 1, …, m per record; n tending to infinity, m = O(1).
• F ∈ C means that for every record i, the user shouldn't learn x_{i,j} for all j ∈ F.
• Assume C is monotone.
• Assume C is input to both user and server: the user learns C anyway when his queries are blocked, and C is data-independent, revealing info only about attributes.
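The exhaustive-search engine can be sketched as follows. Here `is_sensitive` is a hypothetical oracle deciding whether a set of attributes is jointly sensitive; by monotonicity of C it suffices to record the minimal channels.

```python
from itertools import combinations

def inference_engine(m, is_sensitive):
    """Exhaustively enumerate the minimal inference channels among
    attributes 1..m.  `is_sensitive(S)` is an assumed oracle saying
    whether learning the attribute subset S yields sensitive info."""
    channels = []
    # enumerate subsets by increasing size, so minimal channels come first
    for size in range(1, m + 1):
        for subset in combinations(range(1, m + 1), size):
            s = frozenset(subset)
            if any(c <= s for c in channels):
                continue            # superset of a known channel: implied by monotonicity
            if is_sensitive(s):
                channels.append(s)  # minimal channel found
    return channels

# toy example: attributes 1 = name, 2 = job, 3 = salary;
# learning {name, job} or {job, salary} together is sensitive
sensitive = lambda s: {1, 2} <= s or {2, 3} <= s
C = inference_engine(3, sensitive)
```

The returned minimal channels here are {1, 2} and {2, 3}, matching the name/job/salary example on the previous slide.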
Our contribution: Private Inference Control
• Existing inference control schemes require the server to learn user queries in order to check whether they form an inference.
• Our goal: User Privacy + Inference Control = PIC.
• Privacy: S learns nothing about an honest user's queries except the number made so far. (The number of queries made so far is what enables S to do inference control.)
• Private and symmetrically-private information retrieval: not sufficient, since stateless – the user's permissions change over time.
• Generic secure function evaluation: not efficient – our communication is exponentially smaller.
• This talk: arbitrary malicious users U*, semi-honest S. Can apply [NN] to handle a malicious S.
Application
• Government analysts inspect DB repositories for terrorist patterns.
• Inference control: prevent analysts from learning sensitive info about non-terrorists.
• User privacy: prevent the server from learning what analysts are tracking – if discovered, this info could go to the terrorists!
Related Work
• Data perturbation [AS00, B80, TYW84]: so much noise is required that the data is not as useful [DN03].
• Adaptive oblivious transfer [NP99]: one record can be queried adaptively at most k times.
• Priced oblivious transfer [AIR01]: one record; supports more inference channels than the threshold version considered in [NP99].
• We generalize [NP99] and [AIR01]: arbitrary inference channels and multiple records; more efficient/private than parallelizing [NP99] and [AIR01] on each record.
The Model
• Offline stage: S is given x, C, 1^k, and can preprocess x.
• Online stage: at time t, the honest U generates query (i_t, j_t), which can depend on all prior info/transactions with S.
• Let T denote all queries U makes: (i_1, j_1), …, (i_|T|, j_|T|). T is a random variable depending on U's code, x, and randomness.
• T is permissible if there is no i such that (i, j) ∈ T for all j ∈ F for some F ∈ C. We require the honest U to generate a permissible T.
• U and S interact in a multiround protocol, then U outputs out_t.
• View_U consists of C, n, m, 1^k, all messages from S, and randomness.
• View_S consists of C, n, m, 1^k, x, all messages from U, and randomness.
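The permissibility condition above can be sketched directly: a transcript T is permissible iff for no record i and channel F ∈ C has the user queried (i, j) for every j ∈ F. The channel collection below is the toy name/job/salary example (attributes 1, 2, 3) from the earlier slide.

```python
def permissible(T, C):
    """T is a list of (record, attribute) queries; C a collection of
    attribute sets (inference channels).  Permissible iff no record
    has all attributes of some channel queried."""
    queried = {}                                   # record -> set of queried attributes
    for i, j in T:
        queried.setdefault(i, set()).add(j)
    return not any(F <= attrs for attrs in queried.values() for F in C)

# toy channels {name, job} and {job, salary} as attribute sets
C = [frozenset({1, 2}), frozenset({2, 3})]
assert permissible([(5, 1), (5, 3), (7, 2)], C)    # no channel completed
assert not permissible([(5, 1), (5, 2)], C)        # record 5 completes channel {1, 2}
```

Note the check is per record: querying name on record 5 and salary on record 7 completes no channel, exactly as the definition requires.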
Security Definitions
• Correctness: for all x, C, for all honest users U, for all t ∈ [|T(U, x)|], if T is permissible, then out_t = x_{i_t, j_t}.
• User privacy: for all x, C, for all honest U, for any two query sequences T_1, T_2 with |T_1| = |T_2|, for all semi-honest servers S* and all random coin tosses of S*: (View_{S*} | T(U, x) = T_1) ≈ (View_{S*} | T(U, x) = T_2).
• Inference control: comparison with the ideal model – for every U*, every x, any random coins of U*, and every C, there exists a simulator U' interacting with a trusted party Ch such that View_{U*} ≈ View_{<U', Ch>}, where U' only asks Ch for tuples (i_t, j_t) that are permissible.
Efficiency
• Efficiency measures are per query.
• Minimize communication & round complexity: ideally O(polylog(n)) bits and 1 round.
• Minimize the server's time complexity: ideally O(n) without preprocessing.
• With preprocessing, potentially better, but O(n) is optimal w.r.t. known single-server PIR schemes.
Our Result
• Using the best-known PIR schemes [CMS99], [L04], we get a PIC scheme with (O~ hides polylog(n), poly(k) terms):
• Communication O~(1)
• Work O~(n)
• 1 round
A Generic Reduction
• A protocol is a threshold PIC (TPIC) if it satisfies the definitions of a PIC scheme assuming C = {[m]}.
• Theorem (roughly speaking): if there exists a TPIC with communication C(n), work W(n), and round complexity R(n), then there exists a PIC with communication O(C(n)), work O(W(n)), and round complexity O(R(n)).
PIC Ideas
• User/server do SPIR on a table of encryptions.
• Idea: encryptions of both the data and of keys that will help the user decrypt encryptions on future queries.
• The user can only decrypt if he has the appropriate keys – only possible if not in danger of making an inference.
Stateless PIC
• The efficiency of PIC is a data structures problem.
• Which keys are most efficient for the user to: (a) update as the user makes new queries? (b) use to prove the user is not in danger of making an inference on current/future queries?
• Keys must prevent replay attacks: the user can't use "old" keys to pretend he has made fewer queries to records than he actually has.
PIC Scheme #1 – Stage 1
• Let E be a homomorphic, semantically secure encryption scheme (e.g., Paillier).
• Suppose we allow accessing each record at most once.
• (Protocol sketch: the user, holding PK, SK and query (i_3, j_3), sends E(i_3), E(j_3) with a ZKPOK; the server homomorphically transforms the past queries E(i_1) -> E(r_1(i_1 – i_3)) and E(i_2) -> E(r_2(i_2 – i_3)); the user recovers r_1, r_2 iff he hasn't previously accessed record i_3.)
• From r_1 and r_2 the user can reconstruct a secret S.
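The server's transform E(i_1) -> E(r_1(i_1 – i_3)) uses only the additive homomorphism. A minimal textbook Paillier sketch (toy 11-bit primes, illustration only, not secure; all names here are ours, not from the talk):

```python
import math
import random

def L_fn(x, n):
    return (x - 1) // n

def keygen(p=1789, q=1907):
    n = p * q
    lam = (p - 1) * (q - 1)          # Euler's phi works in place of lcm here
    mu = pow(lam % n, -1, n)         # since g = n+1, L(g^lam mod n^2) = lam mod n
    return (n,), (n, lam, mu)        # public key, secret key

def enc(pk, m):
    (n,) = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n * n) * pow(r, n, n * n)) % (n * n)

def dec(sk, c):
    n, lam, mu = sk
    return (L_fn(pow(c, lam, n * n), n) * mu) % n

pk, sk = keygen()
n = pk[0]

def blind_diff(c_old, c_new, r):
    """Server-side step: from E(i_old) and E(i_new), compute E(r*(i_old - i_new))
    without decrypting -- multiply by the ciphertext inverse, then exponentiate."""
    inv_new = pow(c_new, -1, n * n)                   # an encryption of -i_new
    return pow(c_old * inv_new % (n * n), r, n * n)   # E(r * (i_old - i_new))

c_repeat = blind_diff(enc(pk, 7), enc(pk, 7), r=12345)
assert dec(sk, c_repeat) == 0                  # same record: the mask vanishes
c_fresh = blind_diff(enc(pk, 7), enc(pk, 5), r=12345)
assert dec(sk, c_fresh) == (12345 * 2) % n     # distinct records: blinded nonzero value
```

The point is that the server never sees i_1 or i_3 in the clear, yet the decrypted result is 0 exactly when the two encrypted indices coincide.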
PIC Scheme #1 – Stage 2
• (Protocol sketch: the user, holding PK, SK and query (i_3, j_3), sends E(i_3), E(j_3), a commitment, and a ZKPOK; he recovers S.)
• The user does "SPIR on records" on the table of encryptions.
PIC Scheme #1 – Wrapup
• To extend to querying a record < m times: on the t-th query, let r_1, …, r_{t-1} be a (t-m+1)-out-of-(t-1) secret sharing of S.
• This scheme can be proven to be a TPIC – use the generic reduction to get a PIC.
• User privacy: semantic security of E, zero-knowledge of the proof, privacy of SPIR.
• Inference control: the user can recover at most t-m of the r_i if he has already queried the record m-1 times – one can build a simulator using SPIR with a knowledge extractor [NP99].
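The (t-m+1)-out-of-(t-1) sharing above is a standard Shamir threshold sharing; a self-contained sketch over a prime field (the field choice and parameter values are ours, for illustration):

```python
import random

P = 2**127 - 1   # a Mersenne prime; shares live in the field Z_P

def share(secret, k, num):
    """Split `secret` into `num` Shamir shares; any k of them reconstruct it.
    Evaluates a random degree-(k-1) polynomial with constant term `secret`."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P)
            for x in range(1, num + 1)]

def reconstruct(shares):
    """Lagrange interpolation at 0 over Z_P."""
    total = 0
    for x_i, y_i in shares:
        num = den = 1
        for x_j, _ in shares:
            if x_j != x_i:
                num = num * (-x_j) % P
                den = den * (x_i - x_j) % P
        total = (total + y_i * num * pow(den, -1, P)) % P
    return total

# t-th query with per-record threshold m: any (t-m+1) of the t-1 shares recover S
S, t, m = 424242, 6, 3
shares = share(S, t - m + 1, t - 1)
assert reconstruct(random.sample(shares, t - m + 1)) == S
```

With t = 6 and m = 3 this is a 4-out-of-5 sharing: a user holding only t-m = 3 shares (having already queried the record m-1 times) learns nothing about S, which is exactly the inference-control argument on this slide.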
PIC Scheme #2 – Glimpse
• An O~(1)-communication, O~(n)-work PIC.
• Balanced binary tree B: leaves are attributes, parents of leaves are records.
• An internal node v is accessed when record r is queried and v is on the path from r to the root.
• Keys encode the number of times nodes in B have been accessed.
• (Slide figure: a tree with keys K_{u,a}, K_{v,b}, K_{w,c}, K_{x,d}, K_{y,e}, K_{z,f} at its nodes, with the invariant a + b = t at the root's two children after t queries.)
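Assuming a heap-style layout for B (node v has children 2v and 2v+1; the layout is our illustrative choice, not specified in the talk), the per-query count update and the counting invariant can be sketched as:

```python
def query(counts, leaf):
    """Increment the access count of every node on the path from `leaf`
    (a 1-indexed heap position) up to the root."""
    node = leaf
    while node >= 1:
        counts[node] = counts.get(node, 0) + 1
        node //= 2          # move to the parent

counts = {}
for record_node in [4, 5, 4, 7]:    # four queries hitting record nodes 4, 5, 4, 7
    query(counts, record_node)

# invariants the keys encode: the root count is the total number of queries t,
# and each node's count splits between its two children (the a + b = t picture)
assert counts[1] == 4
assert counts.get(2, 0) + counts.get(3, 0) == counts[1]
assert counts.get(4, 0) + counts.get(5, 0) == counts.get(2, 0)
```

Because every query touches exactly one root-to-record path of O(log n) nodes, keys tied to these counts can certify "how often was this record touched" without enumerating all past queries.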
Conclusions
• Extensions not in this talk:
• Multiple users (pseudonyms)
• Collusion resistance: c-resistance => an m-channel becomes a collection of (m-1)/c-channels.
• Summary:
• A new primitive – PIC
• An essentially optimal construction w.r.t. known PIR schemes