
Mário S. Alvim Ph.D. Thesis Defense École Polytechnique – LIX Supervised by Catuscia Palamidessi

Formal approaches to information hiding: an analysis of interactive systems, statistical disclosure control, and refinement of specifications.


Presentation Transcript


  1. Formal approaches to information hiding: an analysis of interactive systems, statistical disclosure control, and refinement of specifications. Mário S. Alvim, Ph.D. Thesis Defense, École Polytechnique – LIX. Supervised by Catuscia Palamidessi.

  2. Part I Introduction Ph.D. Defense - Mário S. Alvim

  3. Information hiding. In many cases the broad and efficient dissemination of information is desirable. But in several situations it is undesirable, or even unacceptable, that part of the information be leaked. Information hiding deals with the problem of keeping secret part of the information processed by a computational system.

  4. Subfields of information hiding • Subfields of information hiding vary depending on: • What one wants to keep secret (an individual’s identity? a message’s contents? the link between an individual and an action?); • From which adversary or attacker (an external entity? a user of the system?); • How powerful the adversary is (can he only observe the system? can he interact with the system?). • The subfields are not mutually exclusive. • We observe an increasing convergence in the research.

  5. Our focus • Information flow: protecting the secret information w.r.t. what can be deduced from the observable behavior of the system. • Ex: Election system. Secrets: Alice -> X, Bob -> X, Cindy -> Y. Observables: X=2, Y=1, time, heating. • Statistical disclosure control: protecting individual information within a statistical sample.

  6. The qualitative approach • By observing the system’s behavior, the adversary cannot be sure of what the secret is. • The principle of confusion: “For every observable output generated by a secret input value, there is another secret value that could also have generated the same output.” • Does not take into consideration the adversary’s level of (un)certainty about the secret. • Noninterference: the secrets do not alter the observable behavior of the system. • Unachievable in practice. [Figure: partitioning.]

  7. The quantitative approach • Takes into consideration the level of (un)certainty of the adversary. • Allows us to compare two systems w.r.t. the level of security they provide. • Makes use of probabilities. • Main approaches: • Bayes risk • Information theory (our focus in this thesis)

  8. Plan of the presentation • Part II: Information theory as a framework for information leakage • Part III: Information flow in interactive systems • Part IV: Differential privacy: the trade-off between privacy and utility • Part V: Safe equivalences for security properties • Part VI: Conclusion

  9. Part II Information theory as a framework for information leakage

  10. Information theory and communication • Information theory originally focused on how to transmit information through unreliable (or noisy) channels. • It allows us to reason about: • the degree of uncertainty of a random variable; • the amount of information one random variable carries about another random variable.

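  11. Noisy channels • $\mathcal{X}$ is a finite input alphabet • $\mathcal{Y}$ is a finite output alphabet • $p(y \mid x)$ is the probability of output $y$ given input $x$ • The channel matrix has one row per input and one column per output, where the entry for $(x, y)$ is $p(y \mid x)$ • In security terms: the inputs are the secrets, the outputs are the observables, and the channel matrix describes the system’s behavior.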
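
As a minimal sketch of the channel-matrix view on this slide, the snippet below stores a small matrix of conditional probabilities and derives the distribution on observables from a prior on secrets. The alphabets, the numbers, and the helper names (`is_channel_matrix`, `output_distribution`) are illustrative assumptions, not taken from the presentation.

```python
# Hypothetical channel with 2 secrets and 3 observables (illustrative numbers).
secrets = ["x1", "x2"]
observables = ["y1", "y2", "y3"]

# channel[i][j] = p(observables[j] | secrets[i]); each row is a distribution.
channel = [
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
]


def is_channel_matrix(matrix, tol=1e-9):
    """Check that every entry is a probability and every row sums to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def output_distribution(prior, matrix):
    """Compute p(y) = sum over x of p(x) * p(y | x)."""
    n_outputs = len(matrix[0])
    return [
        sum(prior[i] * matrix[i][j] for i in range(len(matrix)))
        for j in range(n_outputs)
    ]


prior = [0.5, 0.5]  # assumed a priori distribution on the secrets
p_y = output_distribution(prior, channel)
print(is_channel_matrix(channel), p_y)
```

Keeping the prior separate from the matrix mirrors the classic setting, where the channel is fixed and only the input distribution varies.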

  12. Information leakage • General principle: leakage = initial uncertainty - remaining uncertainty. • The uncertainty can be measured in different ways, corresponding to different models of attack. • Models of guessing attacks (Köpf and Basin): • The adversary wants to determine the value of a random variable $X$. • He can ask (adaptively) several yes/no questions to an oracle; a subsequent question may depend on the answer to a previous question. • The attacker knows the a priori distribution $p(x)$.

  13. Shannon entropy • Leakage as mutual information: $I(X;Y) = H(X) - H(X \mid Y)$, where $I(X;Y)$ is the leakage, $H(X)$ the initial uncertainty, and $H(X \mid Y)$ the remaining uncertainty. • Meaning in security: • The adversary can ask questions of the type “Does $X$ belong to $S$?”, for a set $S$ of candidate values. • $H(X)$ is the lower bound to the expected number of questions necessary to determine the value of $X$.
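
The Shannon view of leakage (initial uncertainty minus remaining uncertainty) can be sketched numerically as follows. This is an illustration under assumed inputs: the two toy channels and the function names are hypothetical, and entropies are measured in bits.

```python
from math import log2


def shannon_entropy(dist):
    """H(X) = -sum p(x) log2 p(x), skipping zero-probability entries."""
    return -sum(p * log2(p) for p in dist if p > 0)


def conditional_entropy(prior, channel):
    """H(X | Y) = sum over y of p(y) * H(X | Y = y)."""
    total = 0.0
    for j in range(len(channel[0])):
        joint = [prior[i] * channel[i][j] for i in range(len(prior))]
        p_y = sum(joint)
        if p_y > 0:
            posterior = [p / p_y for p in joint]
            total += p_y * shannon_entropy(posterior)
    return total


def shannon_leakage(prior, channel):
    """Mutual information I(X;Y) = H(X) - H(X | Y)."""
    return shannon_entropy(prior) - conditional_entropy(prior, channel)


uniform = [0.5, 0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]   # output reveals the secret
constant = [[0.5, 0.5], [0.5, 0.5]]   # output is independent of the secret

print(shannon_leakage(uniform, identity))  # leaks the whole bit
print(shannon_leakage(uniform, constant))  # leaks nothing
```

The two extremes bracket every other channel on a one-bit secret: leakage ranges from zero (output independent of the secret) to the full initial uncertainty.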

  14. Rényi min-entropy • Leakage as min-entropy leakage (Smith): $I_\infty(X;Y) = H_\infty(X) - H_\infty(X \mid Y)$, where $I_\infty(X;Y)$ is the leakage, $H_\infty(X)$ the initial uncertainty, and $H_\infty(X \mid Y)$ the remaining uncertainty. • Meaning in security: • One-try attack: “Is $X = x$?” • Closely related to the Bayes risk.
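
A minimal sketch of min-entropy leakage, assuming the usual definitions $H_\infty(X) = -\log_2 \max_x p(x)$ and $H_\infty(X \mid Y) = -\log_2 \sum_y \max_x p(x)\,p(y \mid x)$. The function names and the toy channels are illustrative assumptions. The quantity inside the second logarithm is the probability that a one-try guess succeeds after observing the output, i.e. one minus the Bayes risk.

```python
from math import log2


def min_entropy(prior):
    """H_inf(X) = -log2 of the probability of the best a priori guess."""
    return -log2(max(prior))


def posterior_vulnerability(prior, channel):
    """Probability of guessing the secret in one try after seeing the output."""
    return sum(
        max(prior[i] * channel[i][j] for i in range(len(prior)))
        for j in range(len(channel[0]))
    )


def min_entropy_leakage(prior, channel):
    """I_inf(X;Y) = H_inf(X) - H_inf(X | Y)."""
    remaining = -log2(posterior_vulnerability(prior, channel))
    return min_entropy(prior) - remaining


uniform = [0.5, 0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]
constant = [[0.5, 0.5], [0.5, 0.5]]

bayes_risk_constant = 1.0 - posterior_vulnerability(uniform, constant)
print(min_entropy_leakage(uniform, identity))
print(min_entropy_leakage(uniform, constant), bayes_risk_constant)
```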

  15. Part III Information flow in interactive systems

  16. The problem of interactivity • So far the information-theoretic approach has been applied only to systems where secrets do not depend on observables. • In interactive systems secrets and observables can interleave and influence each other: • Auction protocols, web applications, command line programs, etc. • In such systems the classic information-theoretic approach fails.

  17. The problem of interactivity: an example • Web-based application. • A seller can offer a cheap or an expensive product (observables). • Two possible buyers: rich or poor (secrets). • Channel matrix: ? [Figure: interaction tree with branches expensive/cheap (0.5 each) followed by poor/rich, with parameters s, t, s’, t’; the values shown are s=0.4, t=0.6 and s=0.1, t=0.3.] • Channel matrix is not invariant w.r.t. input distribution. • Capacity can no longer be calculated.

  18. Our contribution • Extend the classic information-theoretic approach to interactive systems: • Modelling systems as Interactive Information-Hiding Systems (IIHSs); • Using channels with memory and feedback; • Re-interpreting the leakage in this more general scenario, finding a more adequate definition of leakage. • Show that the capacity of the channels associated to IIHSs is a continuous function of the Kantorovich metric.

  19. Some necessary technicalities • is a set of symbols • In a sequence of symbols, represents the symbol at time • Example: In we have and • contains all the information about the joint behavior of the sequences of inputs and outputs up to time • By probability laws: feedback memory

  20. Channels with memory and feedback [Figure: block diagram with code-functions, “interactor”, stochastic kernels, and delay.] • Mutual information can be split into its components: • directed information from input to output • directed information from output to input • It can be shown that

  21. Modelling IIHSs as channels with memory and feedback • Theorem: Given a fully probabilistic IIHS, it is always possible to construct a joint prob. dist. s.t. it always holds (): • And a corollary shows how to construct . [Figure labels: code-functions; “interactor”; stochastic kernels; delay; “Combine altogether in a new joint probability distribution”; “Comes directly from the IIHS”; “Deterministic: how to embed into it?”; behavior of the channel; behavior of the IIHS.]

  22. Leakage • In the classical information theoretic approach: (annotated terms: leakage, a priori uncertainty of the input distribution, a posteriori uncertainty) • In channels with memory and feedback: (annotated terms: leakage, a priori uncertainty of the “reactor”, a posteriori uncertainty) • The worst case leakage is the capacity of the channel: • where is the set of all possible input distributions [Slide note: 3 examples of info. leakage.]

  23. Part IV Differential privacy: the trade-off between privacy and utility

  24. Statistical databases • A statistical database is a collection of data of several participants. • Users of the database can ask statistical queries, such as: • Average height, maximum salary, most common disease. • Usually we consider the global information relative to the database as public, while the individual information about a participant is private.

  25. An example • A statistical database contains the salary of several employees. • A user has some side information (previous knowledge from newspapers, common sense, previous queries, etc.): • There are 100 people in the database (counting query) • The average salary is 3.000 € (average query) • Then Robert is included in the database. The user repeats the queries and finds out that the average salary is now 3.050 €. • And she can conclude that Robert earns 8.050 €: privacy breach!
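
The inference in this example is plain arithmetic on the two average queries, sketched below. The function name `infer_new_value` is a hypothetical helper; the figures are the ones on the slide, written without the thousands separator.

```python
def infer_new_value(count_before, avg_before, avg_after):
    """Recover the value of the newly added record from two exact averages."""
    total_before = count_before * avg_before        # 100 * 3000
    total_after = (count_before + 1) * avg_after    # 101 * 3050
    return total_after - total_before


robert_salary = infer_new_value(count_before=100, avg_before=3000, avg_after=3050)
print(robert_salary)
```

The attack needs only exact answers to a counting query and two average queries, which is why the following slides turn to noisy answers.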

  26. General problem • How to ensure that the queries provide statistical information about the whole sample without harming the privacy of the participants? • Usually it is done by adding randomization: instead of reporting the real answer for the query, a noisy answer is reported to the user. • The noise is carefully added to obfuscate the link between the values of participants in the database and the reported answer to the query. • Yet the noise should avoid reporting answers that are “too far away” from the real answers.

  27. A model of utility and privacy • Participants: • Values: (absence is included as a special symbol, e.g. null) • Universe of databases: • Randomized function: • where [Figure: dataset -> $\epsilon$-d.p. randomized function -> reported answer, viewed as a channel; label: channel ratio.]

  28. Differential Privacy • Differential privacy [Dwork]: the effect of the presence of any individual in a database will be negligible, even when an adversary has auxiliary knowledge. • We can also consider presence/absence of any individual, or his value. • It is a strong statistical guarantee. • Formally (discrete case): • Two databases $x$ and $x'$ differing on the presence/value of at most one row are called neighbors or adjacent. We write $x \sim x'$. • A function $\mathcal{K}$ provides $\epsilon$-differential privacy if, for every $x \sim x'$ and for every possible answer $z$ to the query: $\Pr[\mathcal{K}(x) = z] \le e^{\epsilon} \, \Pr[\mathcal{K}(x') = z]$.
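
In the discrete case, the differential privacy condition can be checked directly on a matrix of answer probabilities. The sketch below assumes the standard formulation (for adjacent databases, the probabilities of any answer differ by a factor of at most $e^{\epsilon}$); the function name `satisfies_dp`, the toy mechanism, and the adjacency list are illustrative assumptions.

```python
from math import exp, log


def satisfies_dp(matrix, neighbors, epsilon, tol=1e-12):
    """Return True if, for every adjacent pair (i, j) and every answer z,
    matrix[i][z] <= e^epsilon * matrix[j][z], and symmetrically."""
    bound = exp(epsilon)
    for i, j in neighbors:
        for p, q in zip(matrix[i], matrix[j]):
            if p > bound * q + tol or q > bound * p + tol:
                return False
    return True


# Hypothetical mechanism: randomized response on a single bit.
# Row = database, column = reported answer.
randomized_response = [
    [0.75, 0.25],
    [0.25, 0.75],
]
adjacent = [(0, 1)]  # the two one-row databases are neighbors

ok_at_ln3 = satisfies_dp(randomized_response, adjacent, log(3))
ok_at_ln2 = satisfies_dp(randomized_response, adjacent, log(2))
print(ok_at_ln3, ok_at_ln2)
```

Here the largest probability ratio between the two rows is 3, so the mechanism passes for $\epsilon = \ln 3$ and fails for the stricter $\epsilon = \ln 2$.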

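  29. A model of utility and privacy • Oblivious mechanisms: the reported answer depends only on the real answer, and not on the database. [Figure: dataset -> query -> real answer -> randomization mechanism -> reported answer. Query and randomization mechanism together form the $\epsilon$-diff. priv. randomized function. Leakage relates the dataset to the reported answer; utility relates the real answer to the reported answer.]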

  30. Our contribution (1) Does $\epsilon$-d.p. induce a bound on the information leakage of the randomized function? (2) Does $\epsilon$-d.p. induce a bound on the information leakage relative to an individual, in the worst-case scenario where the attacker knows the values of all other participants? (3) Does $\epsilon$-d.p. induce a bound on the utility? (4) Given a query and a value $\epsilon$, can we construct a randomized function satisfying $\epsilon$-d.p. and also presenting maximum utility?

  31. The adopted measures of utility and leakage • Leakage is modeled as min-entropy leakage: • Utility is modeled with gain functions: • Binary gain function: 1 if the guess equals the real answer and 0 otherwise. • In the binary case is the Bayes risk.

  32. Methodology The adjacency relation on the database domain induces a graph . The relation can be extended to the real answers domain : if and then is also a graph. We consider two special types of graphs: • Distance-regular

  33. Some theorems • Given a channel from to , we perform transformations which: • Are valid for the uniform input distribution; • Preserve the a posteriori min-entropy; • Provide $\epsilon$-d.p. • This allows us to find very regular matrices. • And therefore a bound on any graph Corresponds to the maximum value of . dist-regular

  34. The proof technique • The previous theorems can be applied to any channel from to . • Leakage: we apply the theorems to the channel from to • Utility: we apply the theorems to the channel from to [Figure: dataset -> query -> real answer -> randomization mechanism -> reported answer, with leakage, utility, and the $\epsilon$-diff. priv. randomized function marked.]

  35. The bounds • Leakage: we apply the theorems to the channel from databases to reported answers • Proposition: is both distance-regular and • Utility: we apply the theorems to the channel from real answers to reported answers • when the graph is distance-regular or

  36. Our contribution (1) Does $\epsilon$-d.p. induce a bound on the information leakage of the randomized function? Yes: (2) Does $\epsilon$-d.p. induce a bound on the information leakage relative to an individual? Yes: It works in every case, as is always dist-reg. and

  37. Our contribution (3) Does $\epsilon$-d.p. induce a bound on the utility? Yes: Only when is also dist.-reg. or (4) Given a query and a value $\epsilon$, can we construct a randomized function satisfying $\epsilon$-d.p. and also presenting maximum utility? Yes: , where Only when is also dist.-reg. or

  38. An example • A database with tuples: • voter id, voter city, candidate • There are 6 cities: A, B, C, D, E, F • Query: Which city had more votes for a given candidate? • Clearly the gain is binary • is a clique Optimal mechanism:
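
The matrix shown on this slide is not preserved in the transcript. As an illustration only, the sketch below builds a randomized-response-style matrix that satisfies $\epsilon$-d.p. when every pair of answers is adjacent (a clique): the true answer is reported with probability $e^{\epsilon}/(n-1+e^{\epsilon})$ and each other answer with probability $1/(n-1+e^{\epsilon})$. The construction, the function name `clique_mechanism`, and the choice $\epsilon = \ln 2$ are assumptions, not necessarily the mechanism on the slide.

```python
from math import exp, log


def clique_mechanism(n, epsilon):
    """Build an n x n matrix whose diagonal is e^eps times its off-diagonal.

    Row = real answer, column = reported answer. Every pair of rows is
    treated as adjacent, as in a clique.
    """
    alpha = exp(epsilon)
    diag = alpha / (n - 1 + alpha)
    off = 1.0 / (n - 1 + alpha)
    return [[diag if i == j else off for j in range(n)] for i in range(n)]


cities = ["A", "B", "C", "D", "E", "F"]
mechanism = clique_mechanism(len(cities), log(2))

# Largest ratio between the probabilities two rows assign to the same answer.
worst_ratio = max(
    mechanism[i][z] / mechanism[j][z]
    for i in range(len(cities))
    for j in range(len(cities))
    for z in range(len(cities))
)
print(mechanism[0][0], mechanism[0][1], worst_ratio)
```

With six cities and $\epsilon = \ln 2$, the true city is reported with probability 2/7 and each other city with probability 1/7, so under a uniform prior and a binary gain the expected gain of taking the reported answer at face value is 2/7.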

  39. Part V Safe equivalences for security properties

  40. Equivalences in security • Equivalence relations are often used to formalize information hiding properties. • Examples: • A system guarantees anonymity for users and if: (trace equivalence) • Votes of users and for candidates and are confidential in a system if: (bisimulation)

  41. The role of nondeterminism • In the presence of nondeterminism, there is a (dangerous) implicit assumption: • all the nondeterministic possibilities of the specification will be possible under every implementation of (or at least that the adversary will believe so). • Nondeterminism can have different natures: • Nondeterminism by design: preserved under refinement; • Underspecification: not necessarily preserved under refinement.

  42. Nondeterminism by design: is secure. Should be preserved in the implementation. [Figure: Mix.]

  43. Underspecification: But is not secure. May be eliminated in the implementation. [Figure: User.]

  44. Motivation • Two types of nondeterminism: • Angelic: inherent to the system, like in . The scheduler has freedom to help the system. • Demonic: underspecification, like in . The design should guarantee that even in the worst case choice (by the scheduler), the security is still preserved. • Problem: in the equivalence approach the nondeterminism is considered only as angelic.

  45. Contribution • A formalism to handle both angelic and demonic nondeterminism. • Notions of safe equivalences: safe trace-equivalence and safe bisimulation. • We show that these notions of safe equivalences imply “no leakage”.

  46. Admissible schedulers • Global schedulers: global nondeterminism (implementation freedom) • Communication, interleaving • Cannot see the internal choices of the components • Local schedulers: local nondeterminism (inherent to the system) • Randomness, noise • One for each component • Cannot see internal choices of the other components.

  47. Safe bisimulation • Safe bisimulation • such that, whenever , then for all admissible global schedulers :

  48. Safe trace-equivalence • Safe trace-equivalence • such that, whenever : • is but not • Theorem: safe bisimulation implies safe trace-equivalence

  49. Safe nondeterministic information hiding • Definition: A system is leakage-free if for all observables and secrets we have • Example: (Binary secret) • is but not • Now is also

  50. Safe nondeterministic information hiding • Definition: A system is leakage-free if for all observables and secrets we have • Theorem: If then is leakage-free. • Corollary: If then is leakage-free.
