caTIES Custom DeID Installation. TBPT F2F Meeting, Thursday, May 17, 2007. Rebecca Crowley, Girish Chavan, and Kevin Mitchell (crowelyrs@upmc.edu, chavang@upmc.edu, mitchellkj@upmc.edu)
Presentation Overview • caTIES Architecture • Plugging in alternative De-Identifiers • MGH Scrubber Analysis • Case Study DeID comparison • caTIES Ongoing Activities
CaTIES Architecture Assumptions • Large datasets up to 10 million reports per organization • Evolving and errant transformation engines require precision rerun • Localized behavioral modifications at many levels • Honest broker visualization of identified information • CaTIES innate linkage between public and private data using surrogate keys
Installing Custom De-Identifiers • Use the CaTIES Installer for the MGH Scrubber • Installs version 1.0 of the HMS Pathology Report SCRUBBER • Deploy any custom De-Identifier • Install any DeID to create a Windows Services footprint • Implement the CaTIES_DeIdentifier interface • Plug in to the Services footprint
CaTIES Installer De-Identifier Parameters • Sleep duration: If no reports are found for processing, the interval, in milliseconds, after which the component checks again for new reports. • Deidentifier: caTIES comes pre-configured to use the DeID Corp de-identifier or a free, open-source de-identifier developed at Harvard. You can also select the ‘No Deidentifying’ option to only synthesize the reports. • De-ID installation location: Install path for the De-ID software. De-ID is a prerequisite for this component to work.
Tanuki Wrapper Windows Services http://wrapper.tanukisoftware.org/doc/english/introduction.html
CaTIES_DeIdentifier Interface

    package edu.upmc.opi.caBIG.caTIES.server.deid;

    public interface CaTIES_DeIdentifier {
        public void initializeDeIdentification();
        public String deIdentify(String text);
        public void finalizeDeIdentification();
        public String getApplicationName();
        public String getApplicationRevision();
    }
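As an illustration of what an implementation of this interface might look like, here is a minimal sketch. The class name `MyDeID`, the digit-masking rule, and the `***NUMBER***` marker are all hypothetical, and the call order (initialize, then deIdentify, then finalize) is an assumption rather than something the slides state. The package declaration is left out so the sketch compiles as a single file.

```java
import java.util.regex.Pattern;

// Interface as shown on the slide (package declaration omitted for a one-file sketch).
interface CaTIES_DeIdentifier {
    public void initializeDeIdentification();
    public String deIdentify(String text);
    public void finalizeDeIdentification();
    public String getApplicationName();
    public String getApplicationRevision();
}

// Hypothetical implementation: masks runs of nine or more digits and nothing else.
class MyDeID implements CaTIES_DeIdentifier {
    private Pattern longDigitRun;

    @Override
    public void initializeDeIdentification() {
        // Assumption: expensive setup (compiling patterns, loading lists) belongs here.
        longDigitRun = Pattern.compile("\\d{9,}");
    }

    @Override
    public String deIdentify(String text) {
        if (longDigitRun == null) {
            throw new IllegalStateException("initializeDeIdentification() was not called");
        }
        return longDigitRun.matcher(text).replaceAll("***NUMBER***");
    }

    @Override
    public void finalizeDeIdentification() {
        longDigitRun = null; // release resources
    }

    @Override
    public String getApplicationName() {
        return "MyDeID";
    }

    @Override
    public String getApplicationRevision() {
        return "0.1";
    }
}

class MyDeIDDemo {
    public static void main(String[] args) {
        CaTIES_DeIdentifier deid = new MyDeID();
        deid.initializeDeIdentification();
        System.out.println(deid.deIdentify("Reached during the day at 954657345."));
        deid.finalizeDeIdentification();
    }
}
```

A real de-identifier would apply far more rules; the point here is only the shape of the contract: text in as a String, redacted text out as a String.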
Custom DeIdentifier Install Steps • Implement CaTIES_DeIdentifier • Accepts document text as a java.lang.String and returns the redacted text as a String • Jar the implementation class and its dependencies (e.g., MyDeID.jar) • Place MyDeID.jar in the lib directory • Place additional config files in the classes dir • In classes/CaTIES.properties set caties.deid.classname = my.package.MyDeID
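The slides do not show how caTIES turns the `caties.deid.classname` property into a running de-identifier. A plausible mechanism, sketched below purely as an assumption, is to read the property and instantiate the named class by reflection. `ExampleDeID` and `DeIdLoaderSketch` are hypothetical names, and the properties text is parsed from a string instead of from classes/CaTIES.properties so the sketch stays self-contained. Note that `my.package.MyDeID` on the slide is a placeholder: `package` is a reserved word in Java, so a real class needs a different package name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Interface as shown on the slide (package declaration omitted for a one-file sketch).
interface CaTIES_DeIdentifier {
    public void initializeDeIdentification();
    public String deIdentify(String text);
    public void finalizeDeIdentification();
    public String getApplicationName();
    public String getApplicationRevision();
}

// Hypothetical pass-through implementation, present only so there is something to load.
class ExampleDeID implements CaTIES_DeIdentifier {
    public void initializeDeIdentification() { }
    public String deIdentify(String text) { return text; }
    public void finalizeDeIdentification() { }
    public String getApplicationName() { return "ExampleDeID"; }
    public String getApplicationRevision() { return "0.1"; }
}

class DeIdLoaderSketch {
    // Parses "key = value" lines in the standard java.util.Properties format.
    static Properties parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Assumption: the configured class has a no-argument constructor.
    static CaTIES_DeIdentifier load(Properties props) {
        String className = props.getProperty("caties.deid.classname");
        if (className == null) {
            throw new IllegalStateException("caties.deid.classname is not set");
        }
        try {
            return Class.forName(className)
                    .asSubclass(CaTIES_DeIdentifier.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load de-identifier " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties props = parse("caties.deid.classname = " + ExampleDeID.class.getName());
        CaTIES_DeIdentifier deid = load(props);
        System.out.println(deid.getApplicationName() + " " + deid.getApplicationRevision());
    }
}
```

Loading by class name is what makes the jar-in-lib step sufficient: under this assumption, no caTIES code needs to be recompiled to swap de-identifiers.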
MGH Scrubber Trial Installation & Analysis • Uses three key configuration files • Scrubber.xml • Locates other configs • Flexible redaction behavior (overwrite with spaces or patterns like ***Name***) • Names.list • Proper names should be localized geographically • Regex.list • Cascading regular expressions
MGH names.list snippet Niver Niverson Niverville Nives Nivison Niwa Niwot Nix Nixa Nixion Nixon Nixson Niziol Niziolek Niznik Nizo Njango Njie Njoku Nkomo Nkuku No Noa Noack Noah Noakes Noaks Noank Noatak Nobbe Nobel Nobile Nobis Noble Nobles Noblesville Noblet Nobleton Noblett
MGH Scrubber regex.list snippets

    //Patient's Name
    PATIENT_NAME=[^a-z^A-Z^0-9]+(((patient(')?(s)?)\s+(name))[^a-z^A-Z^0-9^\"]+(is)?\"\w+[^a-z^A-Z]+\w+[^a-z^A-Z]+)

    //Take out any telephone numbers
    TELEPHONE=[^a-z^A-Z^0-9]+\(?[1-9][0-9]{2}[\)\s-]?[1-9][0-9]{2}[\s-]?[0-9]{4}\s+

    //put the older Dr regex. this should be a safety net incase the above regex is not satisfied.
    DOCTOR_OLDER=[^a-z^A-Z^0-9]+(([dD][Rr]([Ss])?)|([Dd]octor)s?|(DOCTORS?))[^a-zA-Z^0-9]+(\w+[^a-z^A-Z^0-9]+){1,3}
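To see what the TELEPHONE expression accepts, it can be compiled with `java.util.regex`. The sketch below copies the expression from the snippet and only escapes it for a Java string literal; the class name and the sample sentences are made up for illustration, and the scrubber's own regex engine may behave differently. The expression requires ten digits, a leading non-alphanumeric character, and trailing whitespace, so a nine-digit number followed by a period is not matched.

```java
import java.util.regex.Pattern;

class TelephoneRegexDemo {
    // TELEPHONE expression from the regex.list snippet, escaped for a Java string literal.
    static final Pattern TELEPHONE = Pattern.compile(
            "[^a-z^A-Z^0-9]+\\(?[1-9][0-9]{2}[\\)\\s-]?[1-9][0-9]{2}[\\s-]?[0-9]{4}\\s+");

    // True if the expression matches anywhere in the text.
    static boolean containsTelephone(String text) {
        return TELEPHONE.matcher(text).find();
    }

    public static void main(String[] args) {
        // Ten digits with separators, preceded and followed by a space.
        System.out.println(containsTelephone("Call 412-555-0147 after noon."));
        // Nine digits followed by a period.
        System.out.println(containsTelephone("can be reached during the day at 954657345."));
    }
}
```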
MGH Scrubber Features • Scans based on names and then successively applies regular expression matching • Each regular expression works on the output of the previous regular expression producing a pipe effect
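The pipe effect can be sketched as a loop in which each pass consumes the output of the previous one. This is not the MGH Scrubber's code: the class name, the whole-word name matching, the two rules (PHONE, YEAR), and the `***LABEL***` markers are hypothetical stand-ins chosen to keep the example short.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CascadeScrubberDemo {
    // First scans for names, then applies each regex to the output of the previous step.
    static String scrub(String text, List<String> names, Map<String, Pattern> rules) {
        String current = text;
        for (String name : names) {
            // Assumption: names are matched as whole words, case-sensitively.
            Pattern whole = Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
            current = whole.matcher(current).replaceAll(Matcher.quoteReplacement("***NAME***"));
        }
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            String marker = Matcher.quoteReplacement("***" + rule.getKey() + "***");
            current = rule.getValue().matcher(current).replaceAll(marker);
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Slattery");
        // LinkedHashMap keeps the rules in insertion order, which is what makes the cascade ordered.
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("PHONE", Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b"));
        rules.put("YEAR", Pattern.compile("\\b(19|20)\\d{2}\\b"));
        System.out.println(scrub("Surgeon: Slattery. Call 412-555-0147. Received 1968.", names, rules));
    }
}
```

Because later rules see already-redacted text, rule order matters: a broad rule placed early can consume text that a more specific rule further down was meant to catch.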
MGH Scrubber Provenance • MGH Scrubber provides a well-designed model of batch run provenance where each result set is saved with the information used to redact it (i.e., names, regex) • This is not carried over to the caTIES plugin. Instead, only a version ID is available in caTIES
MGH Scrubber XML target CHIRPS • MGH Scrubber also supports de-identification of CHIRPS XML • Although CaTIES maintains the CHIRPS XML submission schema for all reports, this feature is not used in the integration context
Test Set Comparison • Two demonstration archives representing demo organization A and B • 20K Patients in each • 5 Cases per patient = 100K reports • Common Cancer Organ Sites, Procedure, Disease, Findings • Names, Addresses, Phones pulled randomly from New Orleans White Pages • All aspects of identified information randomized and permuted • A few sentence templates instantiated in the Gross Description set
Test Set Randomization • Use the LexBIG API to the NCI Metathesaurus and NCI Thesaurus to generate mock Final Diagnosis sections • Randomly negate some Diseases and Findings with "No evidence of" • Use programmer-derived sentence patterns to generate some sentences in need of de-identification. Arbitrarily place these in the Gross Description section
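The random negation step can be sketched as follows. This is an illustrative reconstruction, not the generator used for the test set: the term list is hard-coded from the example report instead of being fetched through the LexBIG API, and the class name, negation rate, and seed are hypothetical parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class MockDiagnosisDemo {
    // Turns each term into a sentence, prefixing "No evidence of" with the given probability.
    static List<String> mockFinalDiagnosis(List<String> terms, double negationRate, long seed) {
        Random random = new Random(seed); // fixed seed makes the mock section reproducible
        List<String> sentences = new ArrayList<>();
        for (String term : terms) {
            boolean negate = random.nextDouble() < negationRate;
            sentences.add((negate ? "No evidence of " + term : term) + ".");
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Stand-ins for terms that would come from the NCI Metathesaurus or NCI Thesaurus.
        List<String> terms = Arrays.asList(
                "Pulmonary edema NOS (disorder)",
                "Acute bronchitis (disorder)",
                "Occult Non-Small Cell Lung Carcinoma");
        for (String sentence : mockFinalDiagnosis(terms, 0.5, 42L)) {
            System.out.println(sentence);
        }
    }
}
```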
Example Test Report FD Section [Clinical History] This section has been ommitted. [Final Diagnosis] This sentence should not be redacted. Lung, Open Lung Biopsy Pulmonary edema NOS (disorder). No evidence of Acute bronchitis (disorder). No evidence of Occult Non-Small Cell Lung Carcinoma. Lung Carcinoma Metastatic to the Liver. CYP2C9, R150H. This sentence should not be redacted.
Example Test Report GD Section [Gross Description] This sentence should not be redacted. Surgeon: Slattery. All consult information received from Memorial Medical Center . Accessioned into the lab as eed42e22-ffc3-11db-affe-f592c4a4c795. The patient resides at 4499 Jean Lafitte Blvd and can be reached during the day at 954657345. Recieved frozen sections from Louisian Department of Health and Hospitals lab on Tue Sep 10 12:24:12 EDT 1968. This sentence should not be redacted.
CaTIES Ongoing Activities • Work with Persistent for Consolidated Tissue System • UPMC Intranet Community
References • http://spin.nci.nih.gov/