Lina Ahlin and Olof Ejermo lina.ahlin@circle.lu.se olof.ejermo@circle.lu.se

Swedish inventors - matching to registers and descriptive data. Presentation at APE-INV, Brussels, September 5th, 2011. Lina Ahlin and Olof Ejermo, lina.ahlin@circle.lu.se, olof.ejermo@circle.lu.se.

Presentation Transcript


  1. Swedish inventors - matching to registers and descriptive data. Presentation at APE-INV, Brussels, September 5th, 2011. Lina Ahlin and Olof Ejermo, lina.ahlin@circle.lu.se, olof.ejermo@circle.lu.se. CIRCLE, Centre for Innovation, Research and Competence in the Learning Economy, Lund University, P.O. Box 117, SE-221 00 Lund, Sweden

  2. On the agenda • What is so special about Swedish data? • 1st matching • 2nd matching • Future: how to reach a 100% match rate? • (Results)

  3. Linking inventors to registers • EPO patent applications 1978-2009 for inventors with addresses in Sweden. • Matching done on name-home address combinations • Problem 1: different inventors may have the same name • Problem 2: addresses may be old • How to verify person identity and connect to Swedish register data?

  4. Swedish data Q: What makes Swedish data so exciting (and why do we want a high match rate)? A: Through Statistics Sweden it is possible to connect individuals to register data, which connects several levels of information relevant for innovation studies: • Individual level: field/level of education, age, income, gender, workplace • Regions: workplace, home municipality • Sectoral level: sectors, firm size, level of R&D ... → This can give a multifaceted view of innovation, but it requires a personal identifier, the "personnummer", e.g. 19500131-3422: birth date Jan 31st, 1950; even number (second-to-last digit) = female
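As a minimal sketch of how the example identifier on this slide decomposes, the following Python function splits a personnummer written as YYYYMMDD-NNNN into a birth date and a sex flag. The function name and the returned dictionary keys are illustrative choices, not part of the presentation, and the sketch does not validate the final check digit.

```python
from datetime import date


def parse_personnummer(pnr: str) -> dict:
    """Split a personnummer in the form YYYYMMDD-NNNN (hypothetical helper)."""
    birth_part, serial = pnr.split("-")
    if len(birth_part) != 8 or len(serial) != 4 or not (birth_part + serial).isdigit():
        raise ValueError(f"unexpected personnummer format: {pnr!r}")

    # The first eight digits are the birth date.
    birth_date = date(int(birth_part[:4]), int(birth_part[4:6]), int(birth_part[6:8]))

    # The second-to-last digit encodes sex: even = female, odd = male.
    sex = "female" if int(serial[2]) % 2 == 0 else "male"

    return {"birth_date": birth_date, "sex": sex}


info = parse_personnummer("19500131-3422")
print(info["birth_date"].isoformat(), info["sex"])
```

Only the birth-date part is available for some inventors in the later matching steps, which is why the two parts are worth keeping separate.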

  5. 1st matching (Oct-Dec 2010) • All Swedes (incl. personnummer) are listed in the address register "SPAR" • Matching of addresses through InfoTorg, which stores addresses/address changes for the latest 3 years → addition of personnummer • Individuals under 16 not matched • Old patents added under the assumption that the same name at the same address is the same person: Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm (register 2008-2010) = Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm (patent applied for in 1992) • Match rate 64% of inventor-patent pairs, from a low of 23% in 1978 to a high of 93% in 2008. This is because of the mobility of inventors.

  6. InfoTorg returned 56% match rate • Manual check (visual, no robot): + 8%

  7. 64% match rate • 1985-2005: present access to individual registers at Statistics Sweden • 2006-2009: additions as of Sep. 30th, 2011

  8. 2nd matching (April-Sep 2011) • Use public access to registers (Swedish genealogical association) • CDs of the Swedish population (1980)/1990 published by [...] → old addresses and birth date • CD "Book of dead" 1901-2009 → address at death + personnummer • Match birth date + name to personnummer using the InfoTorg service or online sources

  9. Methodology • Extract data from the Swedish death book and Swedish genealogy records for 1990 (to some extent also 1980) on all individuals in the population, by letter • Generate a variable containing name, address and postal address for all individuals in the population as well as for inventors who are not fully matched

  10. Normalized Levenshtein ("strgroup") in Stata • An example of the "name-address" string: "Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm" (from EPO) matched to "Sven Ifwar Johanson, Storgatan 1, 111 00 Stockholm" (from Swedish population 1990) • Replace/insert 3 letters to make the strings equal • Divide by the length of the shortest string (48) → 3/48 = 0.0625 (= a good hit)
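The normalization described on this slide can be sketched in plain Python as follows. This is not the "strgroup" implementation, only an illustration of the same idea: a standard dynamic-programming Levenshtein distance divided by the length of the shorter string. The function names are illustrative. The exact edit count and string length depend on how the strings are cleaned and punctuated, so the sketch is not expected to reproduce the slide's 3/48 digit for digit.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the shorter string."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        raise ValueError("cannot normalize against an empty string")
    return levenshtein(a, b) / shortest


epo = "Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm"
population_1990 = "Sven Ifwar Johanson, Storgatan 1, 111 00 Stockholm"
print(normalized_levenshtein(epo, population_1990))
```

A value close to 0 indicates near-identical name-address strings; where to put the cut-off between a good hit and a doubtful one is a judgment call that the slides leave to manual checking.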

  11. Adding date of birth • 1990 Levenshtein names & addresses • 1990 Levenshtein unique names • Levenshtein from CD dead 1901-2009 - names and addresses • Strgroup: similarity on name-address hits 1-3 • Some manual additions and minor changes • 1980 Levenshtein names and addresses (letters D&H)

  12. Methodology: continued • Manually examine each match to see whether the Levenshtein command has matched correctly • Some hits discarded, incl. ambiguous name match hits

  13. New match rate 80%

  14. Adding personnummer (ongoing) New match rate 80%, but not full personnummer. What to do? • Use the date-of-birth part of the personal number for fully matched inventors • Join all possible combinations of birth dates for those fully matched and those with only birth dates • Run Levenshtein distance on inventor names • Small Levenshtein distance: accept that the inventors are the same, since name and birth date match • Large Levenshtein distance: reject • Further, manually check remaining inventors. Look at addresses for further confirmation if uncertain.
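The steps on this slide can be sketched roughly as below: inventors with only a birth date are joined to fully matched inventors on that date, names are compared by normalized edit distance, and anything that is not a single clear hit is left for manual checking. The record layout, the function name `link_by_birth_date`, the threshold of 0.15 and the sample records are all assumptions made for illustration; the presentation does not state a cut-off.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def link_by_birth_date(matched, partial, max_norm_distance=0.15):
    """matched: (personnummer, name) pairs; partial: (inventor_id, YYYYMMDD, name)."""
    # Index fully matched inventors by the birth-date part of the personnummer.
    by_birth = defaultdict(list)
    for pnr, name in matched:
        by_birth[pnr[:8]].append((pnr, name))

    accepted, manual_check = {}, []
    for inventor_id, birth, name in partial:
        hits = []
        for pnr, ref_name in by_birth.get(birth, []):
            shortest = max(1, min(len(name), len(ref_name)))
            distance = levenshtein(name.lower(), ref_name.lower()) / shortest
            if distance <= max_norm_distance:  # small distance: accept
                hits.append(pnr)
        if len(hits) == 1:
            accepted[inventor_id] = hits[0]
        else:  # no hit or several hits: leave for manual inspection
            manual_check.append(inventor_id)
    return accepted, manual_check


# Made-up records, for illustration only.
matched = [("19500131-3422", "Svea Johanson"), ("19500131-1234", "Karl Berg")]
partial = [("A", "19500131", "Svea Johansson"), ("B", "19620505", "Nils Ek")]
accepted, manual_check = link_by_birth_date(matched, partial)
print(accepted, manual_check)
```

Sending ambiguous cases (several candidates sharing birth date and a similar name) to manual review mirrors the last bullet of the slide, where addresses are consulted when the name comparison is inconclusive.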

  15. Adding personnummer ctd. • Use the death book, years 1975-2009. Use the date-of-birth part of personal numbers • Re-run steps 2-6 on the previous slide

  16. Adding personnummer ctd. Problem: not all inventors were previously identified → no 4 last digits. Two options to get full personal numbers from birth dates: • Use InfoTorg again with name + added parameter "birthdate" • Manually add the four last digits by using an internet service (www.upplysning.se)

  17. Some matching problems • Difficult to match individuals who change last names (mainly women), or who have common names and move a lot. • Two people with the same name can live at the same address (e.g. a father names his son after himself), so the wrong person may be matched. If detected, the oldest person is chosen. • For inventors affiliated with some firms (AstraZeneca), the company address is given

  18. Towards 100% • Idea: scoring methods based on identified inventors • Name • Identified co-inventors • Technology class • City • Postal code • Which algorithm? • Statistics Sweden for validating the parent/child name similarity problem? • Use 1980 population CD? • Strategy of focusing on highly productive unmatched inventors?
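One possible shape for such a scoring method is a weighted sum over the five features listed on this slide. The sketch below is purely illustrative: the slide leaves the choice of algorithm open, and the record type, the weights and the use of `difflib.SequenceMatcher` for name similarity are all assumptions rather than anything proposed in the presentation.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class InventorRecord:
    # Hypothetical record layout covering the features listed on the slide.
    name: str
    city: str
    postal_code: str
    tech_classes: set = field(default_factory=set)
    co_inventors: set = field(default_factory=set)


# Hypothetical weights; in practice they would be tuned on identified inventors.
WEIGHTS = {"name": 0.40, "co_inventors": 0.20, "tech_class": 0.15,
           "city": 0.10, "postal_code": 0.15}


def match_score(unmatched: InventorRecord, identified: InventorRecord) -> float:
    """Weighted similarity between an unmatched and an identified inventor."""
    name_similarity = SequenceMatcher(
        None, unmatched.name.lower(), identified.name.lower()).ratio()
    score = WEIGHTS["name"] * name_similarity
    if unmatched.co_inventors & identified.co_inventors:
        score += WEIGHTS["co_inventors"]
    if unmatched.tech_classes & identified.tech_classes:
        score += WEIGHTS["tech_class"]
    if unmatched.city.lower() == identified.city.lower():
        score += WEIGHTS["city"]
    if unmatched.postal_code == identified.postal_code:
        score += WEIGHTS["postal_code"]
    return score
```

A pair scoring near the maximum would be a candidate for acceptance, while mid-range scores would still need manual review, for instance in the parent/child case where name and address coincide.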

  19. Suggestions/questions

  20. Patent distribution by sector

  21. Patent distribution in manufacturing (share of total patenting)

  22. Patent distribution in services (share of total patenting).

  23. Education level among inventors

  24. Percentile distribution of inventors’ patent productivity.

  25. Sectors, SNI92-codes, # inventors, contribution 2004-2005. * ”Contribution” counts patent fractions which adjusts for co-inventorship. ** ”Academia” can also in a few cases be found in the sectors R&D in technical and natural sciences (73101-73104) and in technical testing and analysis (74300).
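The "contribution" footnote can be illustrated with a small fractional-counting sketch. It assumes the common convention that a patent with n inventors credits each inventor with 1/n of the patent; the slide only says that patent fractions adjust for co-inventorship, so the equal split, the function name and the toy data below are assumptions.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_contributions(patents: dict, sector_of: dict) -> dict:
    """Sum equal per-inventor patent shares (1/n each) by sector."""
    totals = defaultdict(Fraction)
    for inventors in patents.values():
        if not inventors:
            continue  # skip patents without listed inventors
        share = Fraction(1, len(inventors))
        for inventor in inventors:
            totals[sector_of[inventor]] += share
    return dict(totals)


# Toy data for illustration only.
patents = {"P1": ["a", "b"], "P2": ["a"]}
sector_of = {"a": "manufacturing", "b": "academia"}
contributions = fractional_contributions(patents, sector_of)
print(contributions)
```

With this convention the contributions across sectors add up to the number of patents, which keeps sector shares comparable even when co-invented patents span several sectors.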

  26. Cooperation by sector, 2004-05

  27. The most important patenting academic institutions 2004-2005
