Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Presentation Transcript


  1. Curating and Managing Research Data for Re-Use: Review & Processing. Jared Lyle

  2. We Are Here Today: Review & Processing

  3. http://weknowmemes.com/2011/12/this-is-my-room-what-i-think-it-looks-like-what-my-mom-thinks-it-looks-like/

  4. A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users. Do no harm.

  5. http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf

  6. Review • Documentation • Data • [Disclosure Review]

  7. Is the data collection complete, accurate, and well-documented?

  8. Documentation http://dx.doi.org/10.3886/ICPSR31521.v1

  9. Essential Descriptive Elements • Basic front matter • Variable level details • Methodology

  10. Documentation: Front Matter Title http://dx.doi.org/10.3886/ICPSR31521.v1 Principal Investigator(s)

  11. Documentation: Front Matter Description Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009. Johnston, Lloyd D., Jerald G. Bachman, Patrick M. O'Malley, and John E. Schulenberg. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009 [Computer file]. ICPSR28401-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-10-27. doi:10.3886/ICPSR28401.v1

  12. Documentation: Variable-level Details National Longitudinal Study of Adolescent Health (Add Health), 1994-1995. Wave I School Administrator Codebook. http://www.cpc.unc.edu/projects/addhealth/codebooks/wave1/index.html

  13. Documentation: Variable-level Details Variable Name

  14. Documentation: Variable-level Details Variable Label

  15. Documentation: Variable-level Details Variable Type

  16. Documentation: Variable-level Details Question Text

  17. Documentation: Variable-level Details Values

  18. Documentation: Variable-level Details Value Labels

  19. Documentation: Variable-level Details Missing Data

  20. Documentation: Variable-level Details Summary Statistics

  21. Documentation: Variable-level Details Constructed Variables

  22. Documentation: Variable-level Details Notes Skip Patterns

  23. Documentation: Variable-level Details (examples) American National Election Study, 2008-2009 Panel Study Frequency codebook, version 20090903. http://electionstudies.org/studypages/2008_2009panel/anes2008_2009panel_fcodebook.txt

  24. Documentation: Variable-level Details (examples) Davis, James A., Tom W. Smith, and Peter V. Marsden. General Social Surveys, 1972-2008 [Cumulative File] [Computer file]. ICPSR25962-v2. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut/Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2010-02-08. doi:10.3886/ICPSR25962

  25. Documentation: Variable-level Details (examples) United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Survey on Drug Use and Health, 2009 [Computer file]. ICPSR29621-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-11-16. doi:10.3886/ICPSR29621

  26. Documentation: Variable-level Details (examples) United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics. Capital Punishment in the United States, 1973-2008 [Computer file]. ICPSR27982-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-09-07. doi:10.3886/ICPSR27982

  27. Documentation: Methodology • Sample design: A description of how the cases that appear in the study were selected, including details about target populations, sampling frames, sample sizes, sampling errors, and sampling methods. • Data collection procedures: The methods used to collect the data (e.g., telephone, mail, computer-assisted). Where applicable, this includes the exact instructions and protocols used by interviewers when they collected the data. • Data processing: The activities and quality checks performed on the data collection to generate the final data products from the raw collected data. If files were merged, a full description of the process should be provided.

  28. Documentation: Methodology • Weighting: Where applicable, a description of the criteria for using weights in the analysis of a data collection, including how the weights were created, all weighting formulae or coefficients, a definition of their elements, and an indication of how the formulae are applied to the data. • Confidentiality issues: Where applicable, a discussion of any confidentiality issues in the data, as well as the steps taken to mitigate disclosure risk.

  29. Other Documentation • Questionnaire • User Guide • Handbook • Manual • Report • Table • User Agreement • Errata

  30. Useful Resources: Description • ICPSR, “What is a codebook?” http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook • Institute for Health and Care Research Quality Handbook http://www.emgo.nl/kc/preparation/data%20collection/3%20Codebook.html • Princeton University Data and Statistical Services, “How to Use a Codebook” http://dss.princeton.edu/online_help/analysis/codebook.htm • UCLA Social Science Data Archive, “Codebooks” http://dataarchives.ss.ucla.edu/tutor/tutcode.htm

  31. Data

  32. Data Labels • Does each variable have a variable name and label? • Do all categorical variables have value labels? • Are labels consistent?
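
The label questions on this slide can be checked mechanically once the codebook is in machine-readable form. The following is a minimal sketch, assuming a hypothetical in-memory codebook (a dict keyed by variable name); the structure, the function name `find_label_problems`, and the sample value labels are illustrative assumptions, not an ICPSR format or tool.

```python
def find_label_problems(variables):
    """Return (variable name, problem) pairs for missing labels.

    `variables` is a hypothetical codebook: a dict mapping each variable
    name to a dict with keys "label", "type", "values" (codes observed in
    the data) and "value_labels" (code -> label).
    """
    problems = []
    for name, meta in variables.items():
        # Every variable should carry a non-empty variable label.
        if not (meta.get("label") or "").strip():
            problems.append((name, "missing variable label"))
        # Every observed code of a categorical variable should be labeled.
        if meta.get("type") == "categorical":
            labels = meta.get("value_labels", {})
            unlabeled = set(meta.get("values", [])) - set(labels)
            for code in sorted(unlabeled, key=repr):
                problems.append((name, f"value {code!r} has no label"))
    return problems


# Illustrative data only: the value labels for Q14 are made up.
codebook = {
    "Q14": {"label": "Assessment of R's Health", "type": "categorical",
            "values": [1, 2, 3, 9],
            "value_labels": {1: "Good", 2: "Fair", 3: "Poor"}},
    "V2": {"label": "", "type": "numeric", "values": [18, 19]},
}
label_problems = find_label_problems(codebook)
for variable, problem in label_problems:
    print(variable, "-", problem)
```

Checking label consistency (the third question) usually still needs a human reader, since it depends on wording rather than presence.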

  33. Naming Conventions: Variables Variable Names: • One-up numbers (V1, V2) • Question numbers (Q1, Q2) • Mnemonic names (age, race) • Prefix, root, suffix systems (FAED, MOED)

  34. Naming Conventions: Variables Variable Labels: • Item/Question number • Indicate variable content • Indicate if variable constructed Q14: Assessment of R’s Health

  35. Naming Conventions: Values Value Labels: • Mutually exclusive, exhaustive, and defined • Preserve original information • Retain original coding scheme Respondent’s Employment Status Self-employed (1) Somewhere-else (2) No answer (9) Not applicable (BK)

  36. Missing Data • Are there missing data? • Are missing data labeled? 77 = Inapplicable 88 = Don’t Know 99 = No Answer
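
One way to answer both questions at once is to tally the declared missing codes and count blanks separately. This is a sketch under assumptions: the codes 77, 88, and 99 come from the slide, while the function name `summarize_missing`, the use of `None` for blank cells, and the sample responses are hypothetical.

```python
from collections import Counter

# Missing-data codes as listed on the slide.
MISSING_LABELS = {77: "Inapplicable", 88: "Don't Know", 99: "No Answer"}


def summarize_missing(values, missing_labels=MISSING_LABELS):
    """Count labeled missing codes and unlabeled blanks (None) separately."""
    counts = Counter()
    for value in values:
        if value is None:
            # A blank cell is missing data without any label explaining why.
            counts["unlabeled blank"] += 1
        elif value in missing_labels:
            counts[missing_labels[value]] += 1
    return dict(counts)


# Hypothetical responses for a single variable.
responses = [1, 2, 77, 99, None, 3, 88, 99]
missing_summary = summarize_missing(responses)
print(missing_summary)
```

A nonzero "unlabeled blank" count is the case to raise with the depositor, since the reason for the gap cannot be recovered from the data alone.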

  37. Values • Are the values reasonable (for example, date variables contain dates, gender variables don't have 10 categories, variables aren't all system missing)? • Are there weight variables? If so, are they well documented?
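
The plausibility examples on this slide translate into a few simple checks. The sketch below is illustrative only: the function name `check_values`, the variable names, the `%Y-%m-%d` date format, and the category thresholds are all assumptions to adapt per study.

```python
from datetime import datetime


def check_values(name, values, kind, max_categories=10, date_format="%Y-%m-%d"):
    """Return a list of human-readable issues for one variable."""
    present = [v for v in values if v is not None]
    if not present:
        # Mirrors the "variables aren't all system missing" check.
        return [f"{name}: all values are missing"]
    issues = []
    if kind == "date":
        for value in present:
            try:
                datetime.strptime(str(value), date_format)
            except ValueError:
                issues.append(f"{name}: {value!r} is not a valid date")
    elif kind == "categorical":
        n_distinct = len(set(present))
        if n_distinct > max_categories:
            issues.append(
                f"{name}: {n_distinct} distinct values, "
                f"expected at most {max_categories}"
            )
    return issues


# Hypothetical variables and values.
date_issues = check_values("INTDATE", ["2009-03-01", "2009-02-30"], "date")
gender_issues = check_values("GENDER", list(range(1, 11)), "categorical",
                             max_categories=3)
weight_issues = check_values("WEIGHT", [None, None], "numeric")
print(date_issues + gender_issues + weight_issues)
```

Whether weight variables are well documented is not something code can judge; the check above only catches a weight column that is entirely empty.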

  38. Matching Data & Documentation • Do the data match the documentation? Are values and/or labels listed in one but not in the other? • Are all codes in the data valid (documented) according to the data collection instrument or PI's codebook? • Are there duplicate records? • Does the spelling look OK?
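
Two of these questions, undocumented codes and duplicate records, lend themselves to a quick automated pass. A minimal sketch follows; the function names, the `CASEID` and `EMPLOY` variable names, and the sample records are hypothetical, and the employment codes reuse the numeric values from the earlier value-label slide.

```python
from collections import Counter


def compare_codes(observed, documented):
    """Compare codes found in the data with codes listed in the codebook."""
    observed_set, documented_set = set(observed), set(documented)
    return {
        # Present in the data but absent from the documentation.
        "undocumented": sorted(observed_set - documented_set, key=repr),
        # Listed in the documentation but never used in the data.
        "unused": sorted(documented_set - observed_set, key=repr),
    }


def find_duplicate_records(records, key_fields):
    """Return the key tuples that occur in more than one record."""
    keys = Counter(tuple(record[f] for f in key_fields) for record in records)
    return sorted(key for key, count in keys.items() if count > 1)


employment_labels = {1: "Self-employed", 2: "Somewhere-else", 9: "No answer"}
code_report = compare_codes([1, 2, 2, 5, 9], employment_labels)

# Hypothetical records keyed by a case identifier.
records = [
    {"CASEID": 101, "EMPLOY": 1},
    {"CASEID": 102, "EMPLOY": 2},
    {"CASEID": 101, "EMPLOY": 1},
]
duplicates = find_duplicate_records(records, ["CASEID"])
print(code_report, duplicates)
```

An unused documented code is not necessarily an error (a category may simply have no respondents), so it is reported separately from undocumented codes.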

  39. Processing History

  40. Useful Resources: Data UK Data Archive, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.data-archive.ac.uk/create-manage/document/data-level?index=1 ICPSR Guide to Social Science Data Preparation and Archiving: Phase 3: Data Collection and File Creation, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter3quant.html

  41. Activity • Review the following data output and report any issues you find.

  42. Examples of What to Look For:

  43. Examples of What to Look For:

  44. Examples of What to Look For:

  45. Examples of What to Look For:

  46. Examples of What to Look For:

  47. Examples of What to Look For:

  48. [Disclosure Review]

  49. Discussion • How much cleaning do you do to a data collection? • When is it appropriate to change the ‘original order’ of a data collection? • How many processing details do you include in the study documentation?

  50. Example: Review @ICPSR
