Description of Session
Background: Our poster presents the implementation and results of several divergent approaches to health record linkage in Botswana’s National Data Warehouse (NDW). The NDW integrates health records originating in several hundred health care facilities and captured by a variety of mechanisms, including electronic transfer of current electronic records, scanning of historic paper registers followed by optical character recognition, and manual transcription of paper records. It provides an informative example of the range of linkage approaches and outcomes possible in such large, heterogeneous information systems; we use it as a springboard for discussing a path toward greater standardization of best practices for this complex task in health information system implementation.
Methods: We compare classic and novel record linkage methods, covering both unsupervised and supervised approaches to entity matching. These include the original Fellegi-Sunter (FS) probabilistic method and extensions that relax some of its assumptions. Within these classic methods, we explore the impact of the choice of string similarity algorithm, including Levenshtein and Jaro-Winkler. We also examine supervised linkage approaches, including the classic approach of Copas and Hilton and recent developments in applying machine learning algorithms.
Results: The performance of each method examined varies with (1) the rigor applied to the implementation details and (2) the amount and appropriateness of preprocessing applied to the input data. We highlight the factors that yield the greatest returns on performance and note computational considerations.
Conclusions: Health record linkage presents a considerable technical challenge, yet few resources present a holistic view of the motivating characteristics of information system sources, data preprocessing options, and implementation details of linkage methods.
We illustrate the performance of a range of methods on Botswana’s diverse NDW to foster conversation aimed at increasing shared understanding of best practices in health record linkage.
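As a rough illustration of the Fellegi-Sunter scoring named in the Methods, the sketch below sums per-field log-likelihood-ratio weights, with field agreement decided by a normalized Levenshtein similarity. It is not the NDW implementation: the field names, the m- and u-probabilities, and the 0.8 agreement threshold are all hypothetical values chosen for the example.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical fields with assumed (m, u) probabilities:
# m = P(field agrees | records match), u = P(field agrees | non-match).
FIELDS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
}


def fs_weight(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> float:
    """Sum of per-field FS weights, assuming conditional independence."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        agrees = similarity(rec_a[field], rec_b[field]) >= threshold
        if agrees:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total
```

In a full pipeline the m- and u-probabilities would typically be estimated from the data rather than fixed by hand, and the summed weight would be compared against upper and lower cutoffs to classify a pair as a link, a non-link, or a candidate for review.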