Census Microdata Revolution: Archiving, Integrating, and Disseminating Official Statistics

Explore the power of preserving, integrating, and disseminating census microdata for researchers worldwide. Learn about the strengths, challenges, and golden rules of this revolution.

johnmjones

Presentation Transcript


  1. Roundtable on Archiving and Disseminating official statistics with a focus on census microdata. Example: IPUMS-International, http://www.ipums.org. Robert McCaa, Professor of Population History, and Wendy L. Thomas, Archivist, University of Minnesota Population Center. rmccaa@umn.edu. This .ppt, docs, and additional information at: www.hist.umn.edu/~rmccaa/ipums-africa

  2. Our common fate on a crowded planet: new forms of global cooperation are required. We must engage interdisciplinary research combining theory and practice. --Jeffrey D. Sachs, Common Wealth (Penguin 2008). Imagine!!! a microdata revolution!

  3. A Census Microdata Revolution (outline)
     • Preserve all microdata and documentation (20 slides): product (tables and microdata) and process (of conducting the census and producing census microdata)
     • Integrate microdata and metadata (8 slides)
     • Disseminate to researchers world-wide (3 slides)
     • Conclusion: strengths, challenges, 7 golden rules (4 slides)

  4. A Census Microdata Revolution
     • Preserve all census microdata and documentation, product and process:
        • 1960s to present
        • ~100 countries (80 have endorsed IPUMS MoU)
        • ~400 censuses (219 are entrusted to IPUMS)
     • Integrate: both microdata and metadata
     • Disseminate to researchers world-wide: “extracts” of database by countries, censuses, sub-populations, sample size, variables

  5. IPUMS-International Today (map, Mollweide projection). Dark green = already integrated: 35 countries, 111 censuses, 263 million person records. Green = to be integrated: 39 countries, 103 censuses, 150 million person records.

  6. IPUMS dissemination calendar (see handout): samples for 35 countries available now, 74 soon
     • Europe 10:4
        • Available (10): Austria, Belarus, France, Greece, Hungary, Netherlands, Portugal, Romania, Spain, UK
        • Soon (4): Germany, Czech Republic, Slovenia, Switzerland
     • Americas (funding renewed July 1) 11:11
        • Available (11): Argentina, Brazil, Canada, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, USA, Venezuela
        • Soon (11): Bolivia, Cuba, Dominican Republic, El Salvador, Guatemala, Honduras, Nicaragua, Paraguay, Peru, Puerto Rico, Uruguay
     • Africa 6:11
        • Available (6): Egypt, Ghana, Kenya, Rwanda, South Africa, Uganda
        • Soon (11): Botswana, Ethiopia, Guinea (Conakry), Madagascar, Malawi, Mali, Mauritius, Sierra Leone, Sudan, Tanzania, Zambia
     • Asia 8:13
        • Available (8): Cambodia, China, Iraq, Israel, Malaysia, Palestine, Philippines, Vietnam
        • Soon (13): Armenia, Bangladesh, Fiji, India, Indonesia, Jordan, Kyrgyz Republic, Mongolia, Nepal, Pakistan, Thailand, Turkmenistan

  7. IPUMS timeline
     • 1995: IPUMS-USA first release of integrated microdata (IPUMS-USA continues: 1850-2000 + ACS samples)
     • 1999: IPUMS-International funded
     • 2002: 1st International release: 7 countries, including Colombia and Mexico
     • 2006: 20 countries, 63 censuses
     • 2008: 35 countries, 111 censuses
        • ~263 million person records
        • Two thousand users
     • 2013: ~70 countries, ~200 censuses
        • 214 sets of microdata are already entrusted to MPC
        • Coming: Germany (8), Switzerland (4), Bangladesh (2), Cuba (1)...

  8. 1. Preserve (Archive). IPUMS Global workshop, ISI (Lisbon, Aug 2007)

  9. Microdata: Archiving & Disseminating
     • The producer’s perspective (official statisticians):
        • Archiving:
           • Comprehensive preservation of both data and documentation (metadata) with easily searchable indices
           • Continually updated with technological innovation: hardware, software (doc, pdf, txt, xls, jpg, etc.) and wet-ware
        • Disseminating: the web revolution
     • The consumer’s perspective (researchers):
        • Access: locate and use on the web without obstacles
        • Disseminating: free access to anyone, anywhere, anytime (access postponed is access denied)
     • What are your interests?

  10. Microdata: Archiving & Disseminating. Our perspective:
      • “Archiving Census Microdata and Documentation: Preserving Memory, Increasing Stakeholders” (UNSD NYC, 2001); copy of paper at ~rmccaa/ipums-africa
      • Long term, 7 keys: readable, intelligible, identifiable, encapsulated, understandable, reconstructable, authentic
      • What to preserve: the product and the process
      • How to assess future value: stakeholders, future impact, anticipated use, informing the future
      • Challenges: archive, plan, trained staff, external repository

  11. Preservation, the problem: 1973 census tapes of Sudan were at risk!

  12. A Solution: Data recovery (by a specialized data recovery company)

  13. Microdata on this tape were recovered!! Data recovery example: Bangladesh Bureau of Statistics (1981 census, 276 tapes, recovery in Aug. ’08). More than 3,000 tapes recovered: Germany 1971, Mexico 1980, Mali 1976, Sudan 1973, and many more.

  14. Census Microdata: 1950s. Few countries archived microdata (a country in green indicates microdata exist for the decade). See: www.hist.umn.edu/~rmccaa/IUMSI/country6.htm (map, Mollweide projection)

  15. Census Microdata: 1960s. The Americas: in the vanguard for preservation of microdata (map, Mollweide projection)

  16. Census Microdata: 1970s. The preservation of microdata was almost universal in the Americas and was becoming widespread in Europe, Africa and Asia. Mali, 1976: census microdata recovered from old Bernoulli boxes (map, Mollweide projection)

  17. Census Microdata: 1980s. The preservation of microdata became generalized. Ghana, 1984: census microdata recovered from floppy discs! (map, Mollweide projection)

  18. Census Microdata: 1990s. Many countries preserved microdata (or are disposed to recover them) (map, Mollweide projection)

  19. Census Microdata: 2000s. Many countries have microdata (or are disposed to make them available for research) (map, Mollweide projection)

  20. Inventory of census microdata archived by region and decade (% of censuses conducted) • Note: cases confirmed by the corresponding official statistical institute. Some datasets remain to be certified. Some countries have not responded to the invitation to inventory their stocks of data. Source: http://www.hist.umn.edu/~rmccaa/IPUMS/country6.htm

  21. 7 Essential Types of Metadata for Each Census. See IPUMS Documentation (“Table 1”)
      • Census Questionnaires (forms): dwellings, households, persons, mortality, migration, etc.
      • Enumerator instructions
      • Data Dictionaries (layouts)
      • Codebooks
         • Geographic codes
         • Occupation / Industry / Education codes
      • Data processing protocols
      • Official Statistics
      • Official Reports (Analytical, Technical, Methodological)

  22. 7 Essential Types of Metadata for Each Census. Example: Ghana. www.hist.umn.edu/~rmccaa/ipums-africa

  23. 7 Essential Types of Metadata for Each Census. Example: Guinea (Conakry). www.hist.umn.edu/~rmccaa/ipums-africa

  24. 2. Integration: Microdata and Metadata

  25. IPUMS integration of metadata and microdata
      • Comprehensive documentation, including
         • Data dictionaries and codebooks
         • Complete original source documentation in the official language: questionnaires, manuals, etc.
         • All translated to English (from the German--thanks again to Martin Podehl!!) and converted into a metadatabase for each census
      • Integration ≠ standardization
      • Composite codes (11, 12, 21, 22…) ≠ serial codes (1, 2, 3, …) (see next slide)

  26. IPUMS microdata integration method: composite codes (multiple digits) retain not only significant distinctions but also integrate comparable concepts

  27. IPUMS microdata integration method: composite codes (multiple digits) retain not only significant distinctions but also integrate comparable concepts. Goal of the integration coding scheme: to assist each researcher in making informed decisions on comparability, not to attempt to make the one best decision for all researchers.
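
The composite-code idea on slides 25 to 27 can be illustrated with a minimal Python sketch. The two-digit codes and their labels below are hypothetical and are not taken from any IPUMS codebook; the only point is that the leading digit carries the category that is comparable across samples, while the trailing digit keeps sample-specific detail.

```python
from collections import Counter

# Hypothetical two-digit composite codes (not from an IPUMS codebook).
# First digit: general category comparable across all samples.
# Second digit: extra detail that only some samples provide (0 = none).
HYPOTHETICAL_LABELS = {
    10: "category 1, no further detail",
    11: "category 1, detail A",
    12: "category 1, detail B",
    21: "category 2, detail A",
    22: "category 2, detail B",
}


def general_code(composite: int) -> int:
    """Return the leading digit, the part comparable across samples."""
    return composite // 10


def detail_code(composite: int) -> int:
    """Return the trailing digit, the sample-specific detail."""
    return composite % 10


records = [11, 12, 21, 22, 10]

# A researcher who needs strict comparability collapses to the first digit;
# one who needs the detail keeps the full composite code.
by_general = Counter(general_code(code) for code in records)
by_composite = Counter(records)
```

A serial scheme (1, 2, 3, ...) offers no such collapse, because adjacent numbers share no meaning; with composite codes each researcher chooses the level of detail.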

  28. Metadata: Employment Status. EMPSTAT (Employment status)
      Description: EMPSTAT indicates whether or not the respondent was part of the labor force -- working or seeking work -- over a specified period of time. Depending on the sample, EMPSTAT can also convey further information.
      The first digit of EMPSTAT is fully comparable, and classifies the population into three groups: employed, unemployed, and inactive. The combination of employed and unemployed yields the total labor force. The second and third digits of EMPSTAT preserve additional information available for some countries and census years but not for others.
      Employment status is sometimes referred to in other sources as "activity status."
      Comparability -- General: The age of persons to whom the question applies varies across the samples (see Universe). The reference period for the employment status question varies. For most samples, employment status was reported with respect to the day of the census or…

  29. Integrate: retain all significant detail, harmonize everything. Not standardize: force square pegs in round holes. Metadata: Employment Status, example: Mexico
      Comparability -- Mexico: The universe and reference period are fully comparable across the Mexico samples. The 1970 Census did not provide detail on the inactive population except for "houseworkers," while the later samples have numerous subcategories.
      In 1990, the employment status question refers to "Principal Activity" and therefore under-reports secondary economic activity by students, housewives, family-workers, the semi-retired, and others.
      The 2000 Census sought to overcome deficiencies in reporting work status for people whose primary activity was not work (students, housewives, retirees, etc.), but who in fact were working according to international definitions. A second question introduced for the first time in 2000 sought to capture this secondary economic activity. For strict comparability with earlier Mexican censuses, this recovered activity (codes 1101-1106) should be considered "inactive."…
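
The recode suggested for Mexico can be sketched as follows. This is an illustration under stated assumptions: the leading-digit meanings (1 = employed, 2 = unemployed, 3 = inactive) and every sample code other than 1101-1106 are assumptions to be checked against the EMPSTAT codebook, and the function and flag names are hypothetical.

```python
from collections import Counter

# Assumed meaning of the leading EMPSTAT digit; verify against the codebook.
FIRST_DIGIT_GROUP = {"1": "employed", "2": "unemployed", "3": "inactive"}

# Codes 1101-1106 are the "recovered" secondary activity named on the slide.
RECOVERED_ACTIVITY = range(1101, 1107)


def empstat_group(code: int, strict_mexico: bool = False) -> str:
    """Classify a code by its first digit.

    With strict_mexico=True, recovered activity from the 2000 census is
    counted as inactive, for comparability with earlier Mexican censuses.
    """
    if strict_mexico and code in RECOVERED_ACTIVITY:
        return "inactive"
    return FIRST_DIGIT_GROUP.get(str(code)[0], "other")


# Made-up codes for illustration; only 1103 comes from the slide's range.
sample_codes = [1000, 1103, 2000, 3000, 0]

default_counts = Counter(empstat_group(c) for c in sample_codes)
strict_counts = Counter(empstat_group(c, strict_mexico=True) for c in sample_codes)
```

Keeping the recode behind a flag mirrors the stated goal of the coding scheme: the detail is retained in the data, and each researcher decides whether strict comparability is needed.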

  30. IPUMS integrated metadata: instantly compare text and/or images of enumeration forms and instructions for any combination of countries and censuses (example: educational attainment)

  31. In addition…
      • Microdata: new high precision samples not only for contemporary censuses but also for historical ones (before the 90s)
      • Systematic metadata for all variables
         • Universes
         • Definitions
         • Comparability
      • Dynamic System: facilitates comparing the wording of questionnaires and instructions for any combination of countries and censuses

  32. 3. Dissemination

  33. - Caution -
      • IPUMS microdata are anonymized samples.
      • They are for advanced analysis and research.
      • Use of statistical software is required.
      • Statistical software provides great power.
      • “With great power, comes great responsibility.”
      • IPUMS samples are for analysis.
      • IPUMS samples are not official statistics.

  34. 6 steps using https://international.ipums.org/international:
      1. Log on with password (also SAS, STATA)
      2a. Study documentation
      2b. Design extract
      3. Receive email; log on with password
      4. Download extract (SSL encrypted)
      5. Unzip data
      6. Analyze

  35. Conclusion: IPUMS Strengths and Challenges plus 7 golden rules for promoting microdata revolution

  36. The IPUMS team (Feb. 2008) Steven Ruggles, inventor of IPUMS, Professor of History, and Director of the Minnesota Population Center (Not present: computer gurus, some researchers, and others who were too busy for a photo!)

  37. IPUMS-International strengths
      • Uniform legal authorization with national statistical authorities
      • Access restricted to academics with need who agree to abide by stringent confidentiality protections
      • Sanctions against individual and institution: denial of access to all microdata for the entire institution
      • Experienced integration teams
      • Proven web-based distribution system
      • High user satisfaction with microdata & metadata
      • Sustainable funding: NSF, NIH

  38. 5 Challenges
      • Microdata to recover (30 countries), integrate (60 countries)
      • 2010 round of censuses (~100 countries)
      • Tabulator (research tool, not official stats)
      • GIS
      • High security laboratory for sensitive, comprehensive microdata

  39. 7 golden rules for the global microdata revolution
      • Respect “restricted-access” conditions of use:
         • protect confidentiality
         • “share” data only with registered users
      • Study both source documentation and metadata:
         • Original source: census forms, instructions to enumerators, etc.
         • Integrated metadata: samples, variables, comparability discussions
      • Construct extracts judiciously:
         • extract only needed countries, censuses, variables, sub-pops
         • use sample size and/or “subsamp” features to keep samples small
      • Use weights: either households or individuals (geographical strata = power)
      • Analyze carefully: proper statistical techniques, keeping in mind data quality, sample error
      • Cite properly: IPUMS and National Statistical Agencies
      • Share publications: IPUMS and National Statistical Agencies
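
The "use weights" rule matters because sample records do not all represent the same number of people. A minimal sketch with made-up records and a hypothetical `weight` field (the actual weight variable names should be taken from the IPUMS documentation for each sample):

```python
# Made-up person records; "weight" is a hypothetical field name standing in
# for the person weight supplied with each sample.
persons = [
    {"employed": True, "weight": 10.0},
    {"employed": False, "weight": 10.0},
    {"employed": True, "weight": 20.0},
    {"employed": False, "weight": 10.0},
]


def weighted_share(records, flag, weight="weight"):
    """Share of the weighted population for which `flag` is true."""
    total = sum(r[weight] for r in records)
    flagged = sum(r[weight] for r in records if r[flag])
    return flagged / total


# Unweighted: every record counts once. Weighted: every record counts
# for the number of people it represents.
unweighted = sum(r["employed"] for r in persons) / len(persons)
weighted = weighted_share(persons, "employed")
```

In this toy data the unweighted share employed is 0.5 while the weighted share is 0.6, because the third record stands for twice as many people as each of the others.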

  40. Thank you!! rmccaa@umn.edu
