
Daniela Ichim

Community Innovation Survey: a Flexible Approach to the Dissemination of Microdata Files for Research

Outline: Dissemination of Microdata Files for Research; Risk assessment; Disclosure limitation; Data quality; Record linkage; Data utility.


Presentation Transcript


  1. Community Innovation Survey: a Flexible Approach to the Dissemination of Microdata Files for Research. Daniela Ichim

  2. Outline: Dissemination of Microdata Files for Research; Risk assessment; Disclosure limitation; Data quality; Record linkage; Data utility

  3. Confidentiality against dissemination. Disclosure scenarios. Find the right balance!

  4. Community Innovation Survey. Identifying variables (structural variables): Nace, Nuts, Size, Turnover (TURN). Confidential variables (variables involved in analyses): expenditures in innovation (RTOT, …), number of patents, …

  5. Confounding Numerical Categorical A A … A k-anonymity safe unsafe
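A minimal sketch of the safe/unsafe labelling under k-anonymity for the categorical identifying variables, assuming a record counts as safe when at least k records share its combination of key values. The function name, the toy records and the choice k=3 are illustrative assumptions, not taken from the presentation.

```python
from collections import Counter

def k_anonymity_flags(records, keys, k=3):
    """Label each record 'safe' or 'unsafe' under k-anonymity.

    A record is 'safe' when at least k records (itself included) share
    its combination of values on the identifying variables in `keys`.
    """
    combos = [tuple(record[key] for key in keys) for record in records]
    counts = Counter(combos)
    return ["safe" if counts[combo] >= k else "unsafe" for combo in combos]

# Hypothetical toy records: variable names echo the slides, values are made up.
records = [
    {"nace": "A", "nuts": "N1", "size": "small"},
    {"nace": "A", "nuts": "N1", "size": "small"},
    {"nace": "A", "nuts": "N1", "size": "small"},
    {"nace": "B", "nuts": "N2", "size": "large"},
]
flags = k_anonymity_flags(records, ["nace", "nuts", "size"], k=3)
```

The last record is unique on its key combination, so it is the only one labelled unsafe.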

  6. General risk function. Distance between and Density around : • Given a threshold (on units) • Local Outlier Factor as a measure of difference in density between a unit and its nearest neighbours
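The slide's own risk formula did not survive in the transcript, so the sketch below only illustrates the standard Local Outlier Factor it refers to: the ratio between the average local density of a unit's nearest neighbours and the unit's own local density. The function name, the toy turnover values, the choice k=2 and the cut-off of 2.0 are assumptions for illustration only.

```python
import math

def local_outlier_factor(points, k=2):
    """Standard LOF for a small list of numeric tuples.

    Assumes fewer than k exact duplicates of any point, so that every
    local reachability density is finite.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist, neighbours = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]          # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neighbours.append([j for j in order if dist[i][j] <= kd])
    lrd = []                                # local reachability density per unit
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical one-dimensional example: four close units and one isolated unit.
turnover = [(1.0,), (1.1,), (1.2,), (1.3,), (10.0,)]
lof = local_outlier_factor(turnover, k=2)
cutoff = 2.0                                # assumed cut-off, not from the slides
at_risk = [score > cutoff for score in lof]
```

Values near 1 indicate a unit in a region as dense as its neighbours; the isolated unit gets a much larger score and is the only one flagged.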

  7. Parameters • Cut-off point for density (LOF): quantiles or automatic • Threshold: dissemination policy

  8. Stratification variables Analysis by Nace Nace A all Nace

  9. MFR Selective masking Disclosure limitation • k-anonymity • Nearest neighbour • Micro-aggregation on tails

  10. Quality assessment Dissemination Confidentiality

  11. Quality of the external database E D Risk measure assessment Record linkage Chambers of Commerce database

  12. Record linkage a) 100% for enterprises with more than 250 employees

  13. Data utility: information preservation (selective masking). Only identifying and confidential variables were modified. Only records at risk were modified. The weights were not modified. Weighted totals: coherence with the already published information. Information content analysis • Some statistical indicators were slightly modified: • variances

  14. Data utility: information content analysis (original, selective masking, individual ranking). Assessment of the perturbation impact on ratios like RTOT/TURN
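One way such an assessment could look, assuming the ratio of interest is the weighted total of RTOT over the weighted total of TURN and the impact is measured as a relative change. The function names and all figures are made up for illustration; the presentation does not state which formula was used.

```python
def weighted_ratio(numerator, denominator, weights):
    """Ratio of weighted totals, e.g. sum(w * RTOT) / sum(w * TURN)."""
    total_num = sum(w * x for w, x in zip(weights, numerator))
    total_den = sum(w * x for w, x in zip(weights, denominator))
    return total_num / total_den

def relative_change(original, masked):
    """Relative change of an indicator after masking."""
    return (masked - original) / original

# Hypothetical data: only RTOT of the last record is perturbed, weights are kept.
weights = [1, 2, 1]
turn = [100, 200, 300]
rtot_original = [10, 20, 30]
rtot_masked = [10, 20, 34]

ratio_original = weighted_ratio(rtot_original, turn, weights)
ratio_masked = weighted_ratio(rtot_masked, turn, weights)
impact = relative_change(ratio_original, ratio_masked)
```

The same two functions can be applied to each masking method in turn, so that the impacts are compared on the same indicator.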

  15. Conclusions (quality dimensions). Confidentiality: risk measure based on the k-anonymity principle. Flexible: a) continuous and categorical variables b) easy to implement c) consistent for extreme choices. Data utility: selective protection to achieve the k-anonymity. Comparable dissemination: control both risk of re-identification and information loss
