Introduction: The Privacy Rule of Health Insurance Portability and Accountability Act requires that clinical documents be stripped of personally identifying information before they can be released to researchers and others. We have been developing a software application, NLM Scrubber, to de-identify narrative clinical reports.
Methods: We compared NLM Scrubber with MIT's and MITRE's de-identification systems on 3,093 clinical reports about 1,636 patients. The performance of each system was analyzed on address, date, and alphanumeric identifier recognition separately. Their overall performance on de-identification and on conservation of the remaining clinical text was analyzed as well.
Results: NLM Scrubber's sensitivity on de-identifying these identifiers was 99%. It's specificity on conserving the text with no personal identifiers was 99% as well.
Conclusion: The current version of the system recognizes and redacts patient names, alphanumeric identifiers, addresses and dates. We plan to make the system available prior to the AMIA Annual Symposium in 2014.