
Identity in the Census

Patman

Presentation Transcript


  1. Identity in the Census: Finding people in more than one

  2. What is Identity? • A unique set of identifiers

  3. What is an Identifier? • Any measurable attribute • In Census - Name, Age, Sex, Birth State • AND Household characteristics

  4. Basic Record Linking • Generalize identifiers to block • Compare within a block more specifically to match • Why? • GEDCOM was about exchange – we’ve abandoned that in favor of linkage • Local conclusions, remote evidence
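The block-then-compare idea on this slide can be sketched as follows. This is only an illustration under stated assumptions: the record fields, the blocking key (sex plus surname initial), the comparison rule (same given name, birth year within two years), and the sample records other than Mary Keelan are all made up here, not taken from the presentation.

```python
from collections import defaultdict

def block(records, key_fn):
    """Group records by a coarse blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_fn(rec)].append(rec)
    return blocks

def link(records_a, records_b, key_fn, is_match):
    """Compare each record in A only against records in B that share its block."""
    blocks_b = block(records_b, key_fn)
    pairs = []
    for a in records_a:
        for b in blocks_b.get(key_fn(a), []):
            if is_match(a, b):
                pairs.append((a, b))
    return pairs

# Hypothetical blocking key: sex + surname initial (a generalized identifier).
def coarse_key(rec):
    return rec["sex"] + rec["surname"][0].upper()

# Hypothetical finer comparison, applied only inside a block.
def same_person(a, b):
    return a["given"] == b["given"] and abs(a["birth_year"] - b["birth_year"]) <= 2

# Illustrative records; only the name Mary Keelan appears in the slides.
census_1860 = [{"sex": "f", "surname": "Keelan", "given": "Mary", "birth_year": 1847}]
census_1870 = [
    {"sex": "f", "surname": "Keelan", "given": "Mary", "birth_year": 1848},
    {"sex": "f", "surname": "Lane", "given": "Mary", "birth_year": 1847},
]

links = link(census_1860, census_1870, coarse_key, same_person)
```

Blocking keeps the number of detailed comparisons small: the second 1870 record is never compared because it falls in a different block.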

  5. Household Characteristics? • Oldest male • Oldest female • Oldest boy • Oldest girl

  6. Types of Identifiers • Cultural • Biological

  7. Cultural Identifiers • Surname • Given Name • Family Role

  8. Biological Identifiers • Sex • Age • Parent / Child roles

  9. Coding Identifiers • Soundex • Initials • Birth year
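As a point of reference for the Soundex coding named on this slide, here is a minimal sketch of the standard American Soundex rules. The slides do not say which Soundex variant was used, so treat this as an assumption rather than the presenter's implementation.

```python
# Letter-to-digit table for American Soundex.
_CODES = {}
for _digit, _letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit

def soundex(name):
    """Return the 4-character Soundex code for a name, or "" if it has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # vowels (and Y) reset prev, so a repeated code is kept
    return ("".join(out) + "000")[:4]

# Surnames taken from the slides' examples.
keelan_code = soundex("Keelan")  # K450
keyes_code = soundex("Keyes")    # K200
```

Spelling variants that sound alike tend to share a code, which is what makes Soundex usable as a blocking key.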

  10. Why code identifiers? • Because matching doesn’t work • Expressions of identifiers in records vary – granularity etc. • To speed up comparisons by allowing blocking on a matched code

  11. Examples: Carroll Co AR • 1860 and 1870 • Surnames beginning with K and L

  12. What kind of keys did you use? • qry1860OldWoman • Sex (f) • Initial of Surname • Initial of coded first name • Estimated birth year / 5 • Example: 1860 Mary Keelan age 13 • fKM369
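The key on this slide can be reproduced with a short sketch. Two assumptions are made here: "estimated birth year / 5" is read as integer division of (census year minus age) by 5, which matches the slide's example, and the raw first letter of the given name stands in for the "coded first name", whose coding scheme the slides do not describe. The function and parameter names are hypothetical.

```python
def person_key(sex, surname, given_name, age, census_year, bucket=5):
    """Build a key: sex + surname initial + given-name initial + birth year // bucket."""
    est_birth_year = census_year - age
    return (
        sex[0].lower()
        + surname[0].upper()
        + given_name[0].upper()
        + str(est_birth_year // bucket)
    )

# Slide example: 1860, Mary Keelan, age 13.
mary_key = person_key("f", "Keelan", "Mary", 13, 1860)  # "fKM369"
```

Dividing the birth year by 5 absorbs small age misstatements, although two reported ages that straddle a bucket boundary will still produce different keys.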

  13. 1860 Family • John • KEYES • 30 • Hannah • KEYES • 27 • Housekey = mKJ366fKH366 • Less granular key = mKJfKH
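A household key like the one above might be assembled as in the sketch below. It assumes the key concatenates the oldest male's key and the oldest female's key, each built as sex + surname initial + given-name initial + (birth year // 5); the dictionary layout and function names are hypothetical.

```python
def member_key(person, census_year, with_year=True):
    """Key for one household member; drop the year bucket for a less granular key."""
    key = person["sex"] + person["surname"][0].upper() + person["given"][0].upper()
    if with_year:
        key += str((census_year - person["age"]) // 5)
    return key

def house_key(household, census_year, with_year=True):
    """Concatenate the keys of the oldest male and the oldest female, in that order."""
    parts = []
    for sex in ("m", "f"):
        members = [p for p in household if p["sex"] == sex]
        if members:
            oldest = max(members, key=lambda p: p["age"])
            parts.append(member_key(oldest, census_year, with_year))
    return "".join(parts)

# The 1860 family from the slide.
keyes_1860 = [
    {"given": "John", "surname": "KEYES", "sex": "m", "age": 30},
    {"given": "Hannah", "surname": "KEYES", "sex": "f", "age": 27},
]

full_key = house_key(keyes_1860, 1860)                     # "mKJ366fKH366"
coarse_key = house_key(keyes_1860, 1860, with_year=False)  # "mKJfKH"
```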

  14. Easy Match • Surname Soundex • First initial • Birth Year • Easy List

  15. Other Matches • Universe 778 records • Key 2 – mKJ - 449 matches • EasyList – 78 matches • Key 3 – mKJ382 – 38 matches • Mom and oldest boy – key3 – 8 matches – 1 right • Housekey - mLA368fLE368 – 4 matches – 3 right
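For the two keys where the slide reports how many matches were right, the share of correct matches can be computed directly. This is a small arithmetic sketch using only the counts above; calling the ratio "precision" is a labeling choice made here, not the presenter's term.

```python
def precision(matches, correct):
    """Fraction of returned matches that were right."""
    return correct / matches

# (matches, right) as reported on the slide.
reported = {
    "mom and oldest boy, key 3": (8, 1),
    "housekey": (4, 3),
}

shares = {name: precision(m, c) for name, (m, c) in reported.items()}
# shares["mom and oldest boy, key 3"] is 0.125; shares["housekey"] is 0.75
```

On these counts the household key returns fewer candidates and a much larger share of them are right.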

  16. Work to do • Measure the effectiveness of different sets of identifiers • Scale the algorithms to larger data sets • Abandon linking for a Cartesian Event Space

  17. Example of LeMaster Numbers

  18. Questions
