IHSN Microdata Management Toolkit and related standards and good practices
Olivier Dupriez
World Bank, Development Data Group
Manager, International Household Survey Network (IHSN)
Addis Ababa, September 23, 2011
Microdata Management Toolkit
• Two main components
  • Metadata Editor: specialized software for documenting any kind of microdata (surveys, censuses, administrative records)
  • NAtional Data Archive (NADA): an open source application for cataloguing and dissemination
  • (CD-Builder for dissemination)
• Compliant with the DDI/DCMI (XML) standards (Data Documentation Initiative and Dublin Core)
What are the DDI and DCMI?
• XML metadata standards
• Standard checklists of what you need to know about a study and its dataset (DDI), and about the related resources (DCMI)
• DDI developed by academic data centers
• Now used in most countries in the world, and by various software applications (e.g. DevInfo, CSPro)
• Two versions of DDI:
  • Version 2.n (DDI Codebook), used by the Toolkit
  • Version 3.n (DDI Lifecycle)
What is the DDI? An example
“The National Statistics Office (NSO) of Popstan conducted the Multiple Indicator Cluster Survey (MICS) with the financial support of UNICEF. 5,000 households, representing the overall population of the country, were randomly selected to participate in the survey, following a two-stage stratified sampling methodology. 4,900 of these households provided information.”
In XML, this could look like this:
<titl>Multiple Indicator Cluster Survey 2005</titl>
<altTitl>MICS 2005</altTitl>
<AuthEnty>National Statistics Office (NSO)</AuthEnty>
<fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
<nation>Popstan</nation>
<geogCover>National</geogCover>
<sampProc>5,000 households, stratified two stages</sampProc>
<respRate>98 percent</respRate>
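Because the metadata are plain XML, any XML parser can read them back. The following is a minimal sketch in Python, not part of the Toolkit: it assumes the elements are wrapped in a hypothetical `<study>` root (a real DDI file nests them much deeper), writes the funding agency as `<fundAg abbr="UNICEF">`, and recomputes the response rate from the figures in the narrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: <study> is NOT a DDI element. It only makes the
# fragment a well-formed XML document for this illustration.
DDI_FRAGMENT = """
<study>
  <titl>Multiple Indicator Cluster Survey 2005</titl>
  <altTitl>MICS 2005</altTitl>
  <AuthEnty>National Statistics Office (NSO)</AuthEnty>
  <fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
  <nation>Popstan</nation>
  <geogCover>National</geogCover>
  <sampProc>5,000 households, stratified two stages</sampProc>
  <respRate>98 percent</respRate>
</study>
"""

root = ET.fromstring(DDI_FRAGMENT)

# Collect the text of each element, keyed by its tag name.
metadata = {child.tag: (child.text or "").strip() for child in root}

# Attributes are read separately from element text.
funder_abbr = root.find("fundAg").get("abbr")

# Response rate implied by the narrative: 4,900 of 5,000 households.
response_rate = 100 * 4900 / 5000

print(metadata["titl"])
print(funder_abbr)
print(f"{response_rate:.0f} percent")
```

The computed rate matches the `<respRate>` value in the fragment, which is the kind of consistency check that structured metadata makes easy.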
Advantage of XML
• Can be transformed into many kinds of outputs:
  • HTML
  • PDF
  • Databases
  • Others
• Plain text files, not specific to any operating system or application (“durable” metadata)
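To illustrate the "one source, many outputs" idea, here is a small sketch that renders the same XML metadata as an HTML table and as plain text. XML-to-HTML conversion is often done with XSLT stylesheets; this sketch swaps in plain Python for brevity. The `<study>` wrapper and the display labels in `LABELS` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical <study> wrapper around three of the example elements.
DDI_FRAGMENT = """<study>
  <titl>Multiple Indicator Cluster Survey 2005</titl>
  <nation>Popstan</nation>
  <respRate>98 percent</respRate>
</study>"""

# Hypothetical display labels chosen for this sketch.
LABELS = {"titl": "Title", "nation": "Country", "respRate": "Response rate"}


def read_fields(xml_text):
    """Return (label, value) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [
        (LABELS.get(child.tag, child.tag), (child.text or "").strip())
        for child in root
    ]


def to_html(xml_text):
    """Render the metadata as a simple HTML table."""
    rows = [
        f"<tr><th>{escape(label)}</th><td>{escape(value)}</td></tr>"
        for label, value in read_fields(xml_text)
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


def to_text(xml_text):
    """Render the same metadata as plain 'Label: value' lines."""
    return "\n".join(f"{label}: {value}" for label, value in read_fields(xml_text))


html_page = to_html(DDI_FRAGMENT)
text_page = to_text(DDI_FRAGMENT)
print(html_page)
print(text_page)
```

Both outputs come from the same source file, so updating the XML updates every derived format.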
Development of the Toolkit
• Metadata Editor
  • By Nesstar Ltd (“Nesstar Publisher”) with IHSN support
  • Now freeware
  • Development benefited from feedback from many users
  • Available at www.ihsn.org/toolkit
• NADA, (CD-Builder)
  • By the World Bank / IHSN
  • Available at www.ihsn.org/nada
The Metadata Editor (demo)
Display the list of variables (labels preserved), with summary statistics
Can edit variables and value labels; immediately shows missing labels.
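The missing-label check can be pictured as a set difference between the codes found in the data and the codes that have labels. This is only an illustrative sketch, not how the Metadata Editor is implemented; the variable, its values, and its labels are made up.

```python
def find_unlabelled_values(values, value_labels):
    """Return the sorted codes that occur in the data but have no label."""
    return sorted(set(values) - set(value_labels))


# Hypothetical variable "sex" with one undocumented code (9).
sex_values = [1, 2, 2, 1, 9, 2]
sex_labels = {1: "Male", 2: "Female"}

missing = find_unlabelled_values(sex_values, sex_labels)
print(missing)  # [9]
```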
Add variable-level metadata (question, interviewers’ instructions, definitions, derivation/imputation method, universe, etc.)
Add “external resources” (= documentation and links to all related materials: questionnaires, manuals, reports, programs, etc.)
Data and metadata are saved in a single file, in the “Nesstar” format. The Nesstar format is NOT for dissemination! Data can be re-exported to more standard formats: SPSS, ASCII, Stata, etc. ASCII (with a data dictionary) is the preferred format for long-term preservation. The DDI provides the data dictionary. The Metadata Editor is a tool for preparing and packaging your data and metadata, not a tool for dissemination!
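The reason ASCII plus a data dictionary preserves well is that the dictionary alone is enough to read the file back, with no special software. A minimal sketch, assuming a fixed-width layout; the variable names, column positions, and records below are invented for the illustration.

```python
# Hypothetical data dictionary: variable name -> (start column, width).
# Start columns are 1-based, as is common in printed codebooks.
DICTIONARY = {"hhid": (1, 4), "sex": (5, 1), "age": (6, 3)}

# Hypothetical fixed-width ASCII data, one record per line.
ASCII_DATA = """00011 34
00022  7
00031 61
"""


def parse_record(line, dictionary):
    """Slice one fixed-width line into named fields using the dictionary."""
    record = {}
    for name, (start, width) in dictionary.items():
        # Convert the 1-based start column to a 0-based slice.
        record[name] = line[start - 1 : start - 1 + width].strip()
    return record


records = [parse_record(line, DICTIONARY) for line in ASCII_DATA.splitlines() if line]

for record in records:
    print(record)
```

Without the dictionary, a line such as `00011 34` is just a string of characters; with it, the same line becomes household 0001, sex code 1, age 34.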
Export metadata (no data) to XML format (i.e. generate the DDI and the DCMI files).
The DDI file is a text file (XML) which looks like this. It contains all metadata, down to the variable level (it provides a detailed data dictionary). This DDI (+DCMI) file is ready to be “transformed”, e.g. by being published in a NADA catalog.
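As a sketch of how the variable-level part of a DDI file doubles as a data dictionary, the following Python reads a small, made-up `<dataDscr>` section. The element names (`var`, `labl`, `qstn`, `qstnLit`, `catgry`, `catValu`) follow DDI Codebook conventions; the variables themselves are hypothetical, and the sketch omits the XML namespace that real DDI files typically declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical variable descriptions in DDI Codebook style (no namespace).
DDI_VARIABLES = """
<dataDscr>
  <var name="sex">
    <labl>Sex of household member</labl>
    <qstn><qstnLit>Is (name) male or female?</qstnLit></qstn>
    <catgry><catValu>1</catValu><labl>Male</labl></catgry>
    <catgry><catValu>2</catValu><labl>Female</labl></catgry>
  </var>
  <var name="age">
    <labl>Age in completed years</labl>
  </var>
</dataDscr>
"""


def build_dictionary(xml_text):
    """Map each variable name to its label, question text and value labels."""
    root = ET.fromstring(xml_text)
    dictionary = {}
    for var in root.iter("var"):
        categories = {
            cat.findtext("catValu", "").strip(): cat.findtext("labl", "").strip()
            for cat in var.findall("catgry")
        }
        dictionary[var.get("name")] = {
            # "labl" matches only the direct child, not the category labels.
            "label": var.findtext("labl", "").strip(),
            "question": var.findtext("qstn/qstnLit", "").strip(),
            "categories": categories,
        }
    return dictionary


data_dictionary = build_dictionary(DDI_VARIABLES)
for name, info in data_dictionary.items():
    print(name, info)
```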
NAtional Data Archive – NADA (demo)
Provides detailed metadata, all automatically taken from the DDI and transformed into HTML
Various options to disseminate microdata: no access / direct access / licensed files / enclave / external repository.
Link to other related sites/applications (e.g., REDATAM and/or CensusInfo).
Microdata dissemination policy should be published with NADA.
Upload files that you want to disseminate (data, questionnaires, reports, etc.)
IHSN Toolkit – Benefits for NSO
• Replicability, transparency
• Visibility
• Credibility
• Institutional memory
• Knowledge generation (if microdata are disseminated): increases and demonstrates the value of data, which can attract more funding
• Satisfy a legal requirement in some countries
• Participate in the Open Data / Data Liberation movement
IHSN and other tools
• Reports, tables (PDF): Web development tool
• On-line tabulation (and analysis) tool: REDATAM, SuperStar, Nesstar, Tableau, etc.
• Indicators: CensusInfo, DevInfo, etc.
• Microdata (n% sample): IHSN Metadata Editor and NADA
• Metadata: IHSN Metadata Editor and NADA
• Microdata, full, raw and edited versions: IHSN Metadata Editor
Guidelines and practices
• Guidelines for documenting a dataset using the IHSN Toolkit
  http://www.ihsn.org/home/index.php?q=tools/documentation
Guidelines and practices
• Formulating an access policy and procedures
  http://www.ihsn.org/home/index.php?q=focus/dissemination-microdata-files-principles-procedures-and-practices
Guidelines and practices
• Long term preservation of data and metadata
  • Based on OAIS “standard”
  • Complex; useful as a “technical audit manual”
  http://www.ihsn.org/home/index.php?q=tools/preservation
Guidelines and practices
• Country experience: Statistics Canada’s Data Liberation Initiative (forthcoming)
• Other IHSN manuals (being drafted):
  • Producing public use census sample files
  • Anonymizing microdata
Some recommendations
• Countries
  • Comply with the DDI standard
  • Produce a sample dataset (n%) for public (free) dissemination of microdata
  • Publish a formal microdata management and dissemination policy
  • Assess your preservation policy/procedures
  • Preserve all versions of your census data
• International agencies
  • Develop a central census catalog (UNSD?)
  • Develop anonymization guidelines
  • Support the establishment of data archives
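The n% sample recommended for countries can be pictured with a very small sketch. It assumes simple random sampling of household identifiers with a fixed seed so the draw can be reproduced; this is an assumption made for illustration only. An actual public use file would follow a documented sample design and would also need anonymization, which is not shown here.

```python
import random


def draw_sample(household_ids, percent, seed=2011):
    """Draw a reproducible simple random sample of the given percentage."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    sample_size = round(len(household_ids) * percent / 100)
    return sorted(rng.sample(household_ids, sample_size))


# Hypothetical census frame of 5,000 household identifiers.
households = list(range(1, 5001))

sample = draw_sample(households, percent=10)
print(len(sample))  # 500
```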
Questions? Need support?
• Accelerated Data Program (PARIS21/WB)
  • Training, technical support to data archiving
• Contacts:
  • Olivier Dupriez at the World Bank (odupriez@worldbank.org)
  • Francois Fonteneau at PARIS21 (francois.fonteneau@oecd.org)