Domestic Violence Network: People Matching • October 22, 2014 • Jay Colbert, Indianapolis
Domestic Violence Network • Phase 1: Developed the “Domestic Violence in the Criminal Justice System” report in 2013-2014 • Phase 2: Updating the report • There are many different sources of DV data, and the question was, “How many unique people are involved?”
Data Sources • Notes • (1) Domestic Violence Shelter • (2) No names given • (3) Special IMPD domestic violence program
Phase 1 people matching • In-house solution using (mainly) Python scripts. • Looked at a combination of name, race, gender, and age. • Gave pretty good final results, but… • It took 10 days to run.
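The Phase 1 scripts aren't shown, so the following is only a minimal Python sketch of matching on a combination of name, race, gender, and age. It assumes an exact match on a normalized key; `Record`, `normalize`, `group_phase1`, and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

class Record(NamedTuple):
    source: str
    first: str
    last: str
    race: str
    gender: str
    age: int

def normalize(s: str) -> str:
    """Collapse whitespace and case-fold, so 'JANE ' and 'jane' compare equal."""
    return " ".join(s.split()).casefold()

def group_phase1(records):
    """Group records sharing normalized first/last name, race, gender and age."""
    groups = defaultdict(list)
    for r in records:
        key = (normalize(r.first), normalize(r.last), r.race, r.gender, r.age)
        groups[key].append(r)
    return list(groups.values())

# Hypothetical sample records (names and codes are made up).
example = [
    Record("shelter", "Jane", "Doe", "W", "F", 34),
    Record("court", "JANE ", "doe", "W", "F", 34),  # same person, messier entry
    Record("court", "Jane", "Doe", "W", "F", 36),   # different age, kept apart
]
print(len(group_phase1(example)))  # 2
```

Grouping on an exact key like this visits each record once. Comparing every record against every other record, which fuzzy matching often requires, grows quadratically with the number of records. That is a common reason deduplication scripts run slowly, though the deck does not say what made the Phase 1 run take 10 days.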
Phase 2 people matching • Researched multiple commercial software solutions geared toward data deduplication. • Some enterprise solutions cost $30,000 and up. • Some desktop solutions cost as little as $500.
Phase 2 name matching • Purchased MatchIT software. • $4,000 initial cost, $2,000 annual renewal • Discounted because we are a nonprofit • Offers flexibility in matching algorithms. • We match on name and date of birth first. • Then we match on name, race, gender, and year of birth (not all sources give us an exact DOB). • We previously used age, but that becomes problematic as the number of years of data grows, because the same person's recorded age changes from year to year.
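MatchIT's internals aren't described here, so this is a minimal sketch of the two-pass idea only. The names (`Record`, `UnionFind`, `two_pass_match`) are hypothetical, and the pass-2 rule is an assumption: it links records on name, race, gender, and year of birth, but refuses to merge two people whose exact DOBs differ. A union-find structure lets a match from either pass merge clusters transitively.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple, Optional

class Record(NamedTuple):
    name: str                  # already-normalized full name, e.g. "jane doe"
    race: str
    gender: str
    dob: Optional[date]        # exact date of birth, when the source gives one
    birth_year: Optional[int]  # year of birth, when only the year is known

class UnionFind:
    """Disjoint sets, so matches from both passes merge transitively."""
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(b)] = self.find(a)

def year_of(r: Record) -> Optional[int]:
    return r.dob.year if r.dob else r.birth_year

def two_pass_match(records):
    uf = UnionFind(len(records))

    # Pass 1: same name and same exact date of birth.
    first_seen = {}
    for i, r in enumerate(records):
        if r.dob is None:
            continue
        key = (r.name, r.dob)
        if key in first_seen:
            uf.union(first_seen[key], i)
        else:
            first_seen[key] = i

    # Pass 2: name, race, gender and year of birth (for sources without exact DOB).
    buckets = defaultdict(list)
    for i, r in enumerate(records):
        year = year_of(r)
        if year is not None:
            buckets[(r.name, r.race, r.gender, year)].append(i)
    for idxs in buckets.values():
        dob_clusters = {uf.find(i) for i in idxs if records[i].dob is not None}
        if len(dob_clusters) <= 1:
            targets = idxs  # unambiguous: at most one exact-DOB person here
        else:
            # Different exact DOBs share this key: keep those people apart and
            # only link the records that lack an exact DOB with each other.
            targets = [i for i in idxs if records[i].dob is None]
        for i in targets[1:]:
            uf.union(targets[0], i)

    return [uf.find(i) for i in range(len(records))]

# Hypothetical sample records.
people = [
    Record("jane doe", "W", "F", date(1980, 3, 14), None),
    Record("jane doe", "W", "F", date(1980, 3, 14), None),  # pass 1 match
    Record("jane doe", "W", "F", None, 1980),               # pass 2 match
    Record("jane doe", "W", "F", date(1982, 7, 2), None),   # different person
]
labels = two_pass_match(people)
print(len(set(labels)))  # 2
```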
Example • Names have been changed to protect the innocent (and the guilty). • These records were identified as the same person.
Old Method • The old method had a hard time distinguishing people whose birthdates were within a few years of one another.
New Method • The new method does much better because it looks at exact dates of birth.
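A small illustration, using made-up records, of why age is a weaker key than exact date of birth: two different people with the same name, born two years apart and recorded two years apart, carry the same age on their records. (The reverse also happens: one person's recorded age changes across years of data.) `age_on`, `age_key`, and `dob_key` are hypothetical helpers, not the deck's actual method.

```python
from datetime import date

def age_on(dob: date, when: date) -> int:
    """Whole years of age on a given date."""
    had_birthday = (when.month, when.day) >= (dob.month, dob.day)
    return when.year - dob.year - (0 if had_birthday else 1)

# Two different people with the same (hypothetical) name, born two years apart,
# each appearing in a record from a different year.
rec_a = {"name": "john smith", "dob": date(1980, 5, 1), "recorded": date(2010, 6, 1)}
rec_b = {"name": "john smith", "dob": date(1982, 5, 1), "recorded": date(2012, 6, 1)}

def age_key(r):  # age-based key: name + age at the time of the record
    return (r["name"], age_on(r["dob"], r["recorded"]))

def dob_key(r):  # DOB-based key: name + exact date of birth
    return (r["name"], r["dob"])

print(age_key(rec_a) == age_key(rec_b))  # True: both are 30, so they look like one person
print(dob_key(rec_a) == dob_key(rec_b))  # False: different birthdates keep them apart
```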
New Method • It even gets it right when both names are accidentally entered in one field or there is a typo in the birthdate.
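How MatchIT handles these cases isn't described, so this is an assumed, simplified illustration of two tolerant comparisons. Names are compared as a set of lower-cased words regardless of which field they landed in. Birthdates count as matching when at most one of year, month, or day differs, or when day and month are swapped within the same year. `name_tokens`, `dob_close`, `likely_same`, and the sample records are hypothetical.

```python
from datetime import date

def name_tokens(first: str, last: str) -> frozenset:
    """Lower-cased name words, so 'Jane Doe' typed into one field equals 'Jane' + 'Doe'."""
    return frozenset(f"{first} {last}".casefold().split())

def dob_close(a: date, b: date) -> bool:
    """True if two birthdates agree except for at most one of year/month/day,
    or differ only by a day/month swap within the same year."""
    agreeing = sum([a.year == b.year, a.month == b.month, a.day == b.day])
    if agreeing >= 2:
        return True
    return a.year == b.year and a.month == b.day and a.day == b.month

def likely_same(r1: dict, r2: dict) -> bool:
    """Hypothetical rule: same name tokens and a 'close' date of birth."""
    return (name_tokens(r1["first"], r1["last"]) == name_tokens(r2["first"], r2["last"])
            and dob_close(r1["dob"], r2["dob"]))

combined = {"first": "Jane Doe", "last": "", "dob": date(1985, 3, 7)}  # both names in one field
typo = {"first": "Jane", "last": "Doe", "dob": date(1985, 3, 17)}     # one-digit day typo
other = {"first": "Jane", "last": "Doe", "dob": date(1991, 11, 30)}

print(likely_same(combined, typo))   # True
print(likely_same(combined, other))  # False
```

Because token sets ignore word order, "Doe Jane" also matches "Jane Doe". Depending on the data, that may or may not be desirable.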
Many other ways to match • Name • Date of Birth/Age • Race, Gender • Address • Phone numbers • Anything else you can think of
What did it boil down to? • The total of 871,681 records boiled down to 400,736 unique people. • The 177,545 Marion County DV person records boiled down to 92,908 unique people.
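The reduction can also be expressed as the fraction of records that remain after deduplication and the average number of records per unique person, computed from the figures above:

```latex
\begin{aligned}
\frac{400{,}736}{871{,}681} &\approx 0.460, & \frac{871{,}681}{400{,}736} &\approx 2.18 \text{ records per person} \\
\frac{92{,}908}{177{,}545} &\approx 0.523, & \frac{177{,}545}{92{,}908} &\approx 1.91 \text{ records per person}
\end{aligned}
```

In other words, about 54% of all records and about 48% of the Marion County DV records were additional records for someone already counted.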
Example Output • Unique Victims