
Answering Multi-Dimensional Analytical Queries under Local Differential Privacy

Tianhao Wang*, Bolin Ding, Jingren Zhou, Cheng Hong, Zhicong Huang, Ninghui Li, Somesh Jha. * Work done at Alibaba.


Presentation Transcript


  1. Answering Multi-Dimensional Analytical Queries under Local Differential Privacy • Tianhao Wang*, Bolin Ding, Jingren Zhou, Cheng Hong, Zhicong Huang, Ninghui Li, Somesh Jha • * Work done at Alibaba

  2. Local Differential Privacy (LDP) • A randomized algorithm $A$ takes an input value $v$ from domain $D$ and outputs $A(v)$ • $A$ is $\varepsilon$-LDP iff for any $v$ and $v'$ from $D$, and any valid output $y$, $\Pr[A(v) = y] \le e^{\varepsilon} \Pr[A(v') = y]$ • Smaller $\varepsilon$ -> stronger privacy • LDP frequency oracle: takes reports from all users and outputs estimations for any value $v$ in the domain, i.e., it counts how many times $v$ appears, e.g., [Wang et al. USENIX’17] • Active line of research since [Duchi, Jordan, and Wainwright 2013] [Figure: each user's data is perturbed into noisy data before it crosses the trust boundary]
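
The definition on this slide can be checked mechanically for a small mechanism. The following is a minimal Python sketch, assuming the mechanism is given as a table of output probabilities; the function name `ldp_epsilon` and the dictionary layout are illustrative choices, not something from the talk.

```python
import math

def ldp_epsilon(mechanism):
    """Smallest epsilon for which the mechanism is epsilon-LDP.

    mechanism[v][y] is Pr[A(v) = y]; this layout is an assumption
    made for illustration only.
    """
    inputs = list(mechanism)
    outputs = set().union(*(mechanism[v].keys() for v in inputs))
    worst = 1.0
    for y in outputs:
        probs = [mechanism[v].get(y, 0.0) for v in inputs]
        if max(probs) == 0.0:
            continue  # this output is never produced
        if min(probs) == 0.0:
            return math.inf  # some input can never yield y: no finite epsilon
        worst = max(worst, max(probs) / min(probs))
    return math.log(worst)

# A binary mechanism that reports the true value w.p. 0.75.
rr = {"yes": {"yes": 0.75, "no": 0.25},
      "no": {"yes": 0.25, "no": 0.75}}
eps = ldp_epsilon(rr)  # ln(0.75 / 0.25) = ln 3
```

A mechanism that reveals its input exactly has no finite epsilon, which the sketch reports as `math.inf`.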

  3. Frequency Oracle (FO): Random Response • Survey technique for private questions [1] • Survey people: • “Do you have disease X?” • Each person: • Flip a secret coin • Answer truth if head (w.p. $p$) • Answer randomly if tail (w.p. $1-p$): • reply “yes”/“no” w.p. 0.5 each • A person with the disease answers “yes” w.p. $p + (1-p)/2$; similarly, a person without it answers “yes” w.p. $(1-p)/2$ [1] Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 1965.

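  4. Frequency Oracle (FO): Random Response • To get an unbiased estimation of the histogram: • If $n_v$ out of $n$ people have the disease, we expect to see $E[I] = n_v \left(p + \frac{1-p}{2}\right) + (n - n_v)\frac{1-p}{2}$ “yes” answers, where $p$ is the probability of answering truthfully (the coin showing head) and $I$ is the number of “yes” answers received • Solving the above equation: $\hat{n}_v = \frac{I - n(1-p)/2}{p}$ is an unbiased estimation of $n_v$ • Privacy guarantee: • For any $v$ and $v'$ from “yes” and “no”, and any output $y$, $\frac{\Pr[A(v) = y]}{\Pr[A(v') = y]} \le \frac{p + (1-p)/2}{(1-p)/2} = \frac{1+p}{1-p} = e^{\varepsilon}$ • When the domain is large, one can achieve better utility using, e.g., [2-4] [2] Wang, et al. Locally Differentially Private Protocols for Frequency Estimation. In USENIX Security 2017. [3] Bassily, et al. Practical Locally Private Heavy Hitters. In NIPS 2017. [4] Ding, et al. Collecting Telemetry Data Privately. In NIPS 2017 and PPML@NIPS 2018.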
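
The coin-flip protocol and its debiasing step can be simulated in a few lines. This is a minimal sketch, assuming the probability of answering truthfully is a parameter named `p` (a name chosen here for illustration); the population sizes and the seed are made up.

```python
import random

def perturb(has_disease, p, rng):
    """Randomized response: truth w.p. p, otherwise a fair random answer."""
    if rng.random() < p:
        return has_disease
    return rng.random() < 0.5

def estimate_count(reports, p):
    """Unbiased estimate of how many users truly answer "yes"."""
    n = len(reports)
    observed_yes = sum(reports)
    # E[observed_yes] = p * n_yes + (1 - p) / 2 * n, solved for n_yes.
    return (observed_yes - n * (1 - p) / 2) / p

rng = random.Random(0)  # fixed seed so the run is repeatable
truth = [True] * 3000 + [False] * 7000  # hypothetical population
reports = [perturb(t, p=0.5, rng=rng) for t in truth]
estimate = estimate_count(reports, p=0.5)  # close to 3000, not exact
```

The estimate is noisy for any single run; its variance shrinks relative to the population as the number of users grows.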

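  5. Problem Setting: Multi-Dimensional Analytics (MDA) under LDP • A randomized algorithm $A$ is $\varepsilon$-LDP iff for any $v$ and $v'$ from the domain $D$, and any valid output $y$, $\Pr[A(v) = y] \le e^{\varepsilon} \Pr[A(v') = y]$ • LDP protects users’ data, which is sensitive and generated on devices • Server gets users’ non-sensitive profile and log data • We can answer queries in this (joined) fact table [Figure: fact table joining non-sensitive attributes with attributes perturbed by LDP random response]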

  6. Answering MDA Queries: Key Contributions • Challenges • Handle aggregation • Range predicates • Multiple dimensions [Figure: fact table with non-sensitive attributes and perturbed attributes contributed by users, separated by the trust boundary between the server and the users]

  7. Challenge 1: How to Aggregate • Strawman method • Evaluate row by row: $120+$100=$220 • Bias due to randomization [Figure: table of non-sensitive attributes and perturbed attributes]

  8. Challenge 1: How to Aggregate • Strawman method • Evaluate row by row: $120+$100=$220 • Bias due to randomization • Group users by aggregating attributes • Group $100: 2 users satisfy the predicate • Group $120: 1 user satisfies the predicate • Weighted sum of estimates • For the group Purchase = $100, estimate how many users satisfy the predicate • If the estimates of group sizes are unbiased, the weighted sum is unbiased • What if the aggregating attributes are sensitive? (randomized rounding!) • Welcome to our VLDB 2019 System Demo [Figure: table of non-sensitive attributes and sensitive attributes]
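
The grouping idea can be sketched as follows. This is an illustration only: it assumes the predicate is a single sensitive bit reported through binary randomized response that keeps the truth w.p. `p`, and the function name `estimate_sum` and the row layout are hypothetical.

```python
from collections import defaultdict

def estimate_sum(rows, p):
    """Estimate SUM(amount) over users whose true sensitive bit is set.

    rows: (amount, reported_bit) pairs; reported_bit is assumed to come
    from randomized response that keeps the truth w.p. p.
    """
    groups = defaultdict(list)
    for amount, reported_bit in rows:
        groups[amount].append(reported_bit)  # group by the aggregated value
    total = 0.0
    for amount, bits in groups.items():
        n = len(bits)
        # Debiased count of users in this group that satisfy the predicate.
        count = (sum(bits) - n * (1 - p) / 2) / p
        total += amount * count  # weight the group estimate by its value
    return total

# With p = 1 there is no noise, so the estimate equals the exact sum.
rows = [(100, True), (100, True), (100, False), (120, True), (120, False)]
exact = estimate_sum(rows, p=1.0)  # 2 * 100 + 1 * 120
```

Because each per-group count is unbiased and the sum is linear in those counts, the weighted sum is unbiased as well.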

  9. Challenge 2: Range Predicates (1-dim) • Solution: Hierarchical intervals • Domain size (= 8 in the example) • A range predicate is decomposed into intervals • Partition users across the layers • Each user reports the histogram on her/his layer using FO • Baseline: Histogram • Each bar has noise • Bad when query range is large • MSE = … [5] Hay, et al. Boosting the Accuracy of Differentially Private Histograms Through Consistency. VLDB 2010.
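
Decomposing a range into hierarchy intervals can be sketched with a short recursion. The sketch assumes a binary hierarchy (branching factor 2) over a domain of size 8; the branching factor and the function name `decompose` are assumptions for illustration, not details taken from the slide.

```python
def decompose(lo, hi, node_lo=0, node_hi=7):
    """Split the inclusive range [lo, hi] into hierarchy intervals.

    The hierarchy is a binary tree over [node_lo, node_hi], so the
    domain size must be a power of two (assumed here).
    """
    if hi < node_lo or node_hi < lo:
        return []  # this node does not overlap the range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]  # node lies fully inside the range
    mid = (node_lo + node_hi) // 2
    return (decompose(lo, hi, node_lo, mid)
            + decompose(lo, hi, mid + 1, node_hi))

pieces = decompose(1, 6)  # [(1, 1), (2, 3), (4, 5), (6, 6)]
```

A range of width 6 is answered from 4 noisy interval estimates instead of 6 noisy histogram bars, and the gap widens as the domain grows.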

  10. Challenge 3: $d$-Dimensional Queries • HIO (Hierarchical Interval Optimized) • Product of hierarchies • Partition users into groups • Decompose a multi-dim range predicate into sub-queries • MSE(HIO) = … • What if the number of dimensions $d$ is large but the number of dimensions in the query is small?
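
The "product of hierarchies" step can be illustrated by combining per-dimension decompositions. This is a minimal sketch assuming a binary hierarchy and the same domain size on every dimension; `decompose_1d` and `decompose_nd` are hypothetical names.

```python
from itertools import product

def decompose_1d(lo, hi, node_lo, node_hi):
    """Binary-hierarchy decomposition of the inclusive range [lo, hi]."""
    if hi < node_lo or node_hi < lo:
        return []
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]
    mid = (node_lo + node_hi) // 2
    return (decompose_1d(lo, hi, node_lo, mid)
            + decompose_1d(lo, hi, mid + 1, node_hi))

def decompose_nd(ranges, domain_size=8):
    """Sub-queries of a multi-dim range: one interval per dimension."""
    per_dim = [decompose_1d(lo, hi, 0, domain_size - 1) for lo, hi in ranges]
    # Every combination of per-dimension intervals is one sub-query.
    return list(product(*per_dim))

sub_queries = decompose_nd([(1, 6), (2, 5)])  # 4 * 2 = 8 sub-queries
```

The number of sub-queries is the product of the per-dimension counts, which is one way to see why the cost grows quickly with the number of dimensions.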

  11. Challenge 3-2: $d$-Dimensional Queries • The number of dimensions $d$ is large but the number of dimensions in the query is small • SC (Split and Conjunction) • Split: each user divides the privacy budget by $d$, reporting every 1-dim marginal independently • Conjunction: estimate the joint distribution from the 1-dim marginals • Decompose a multi-dim range predicate into sub-queries • MSE(SC) = … • MSE(HIO) = …, if …
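
One way to picture split-and-conjunction is with two binary attributes. The sketch below swaps in binary randomized response for the frequency oracle and inverts the per-attribute perturbation to recover joint counts; it is an illustration under these assumptions, not the authors' implementation, and all names and counts are hypothetical.

```python
import math

def keep_prob(eps):
    """Probability that binary randomized response keeps the true bit."""
    return math.exp(eps) / (1 + math.exp(eps))

def forward(joint, q):
    """Expected counts of reported pairs, given true joint counts."""
    m = [[q, 1 - q], [1 - q, q]]  # m[true][reported]
    return [[sum(joint[a][b] * m[a][x] * m[b][y]
                 for a in (0, 1) for b in (0, 1))
             for y in (0, 1)] for x in (0, 1)]

def estimate_joint(observed, q):
    """Invert the per-attribute perturbation to debias pair counts."""
    s = 2 * q - 1
    inv = [[q / s, -(1 - q) / s], [-(1 - q) / s, q / s]]  # inverse of m
    return [[sum(observed[x][y] * inv[x][a] * inv[y][b]
                 for x in (0, 1) for y in (0, 1))
             for b in (0, 1)] for a in (0, 1)]

total_eps = 1.0
q = keep_prob(total_eps / 2)    # split: each of the 2 attributes gets eps / 2
joint = [[40, 10], [20, 30]]    # hypothetical true counts, joint[a][b]
observed = forward(joint, q)    # what the server sees in expectation
recovered = estimate_joint(observed, q)
```

The example feeds expected counts through the inversion, so it recovers the true joint up to rounding; with real reports the result is an unbiased but noisy estimate, and the noise grows as the per-attribute budget shrinks.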

  12. Experiments • Datasets: IPUMS and TRANS • (also on the Adult and Bank datasets) • Results of a single run

  13. Experiments: HIO vs. LDP Marginals • Normalized absolute error is plotted • Predicate is the conjunction of 3 range constraints • MG stands for the state-of-the-art LDP marginal-releasing technique [6] [Figure: normalized absolute error plotted against query range, with an arrow marking the direction of better accuracy] [6] Zhikun Zhang, Tianhao Wang, Ninghui Li, Shibo He, and Jiming Chen. CALM: Consistent Adaptive Local Marginal for Marginal Release under Local Differential Privacy. In CCS 2018.

  14. Experiments: More Dimensions • x+y means the query is the conjunction of x point queries and y range queries • Data contains 4 categorical attributes and 4 numerical attributes • SC performs better when x+y is smaller [Figure: normalized relative error for each x+y setting, with an arrow marking the direction of better accuracy]

  15. Conclusion • Enabling multi-dimensional analytics (MDA) under LDP • LDP protects users’ sensitive data while the server can utilize other profiles • We can answer MDA queries in the (joined) fact table • Come to our poster for more details and discussion • The solution has been built as a service in a data platform in Alibaba • Advertisement: demo at VLDB 2019 • LDP data sharing/analytics services – a middleware solution • DPSAaS@ : Private Multi-Dimensional Data Sharing and Analytics as Services
