Qi Long, PhD

Electronic health records (EHR) data, collected as part of healthcare delivery, include static variables such as patient demographics, structured data such as labs and vitals, codified data such as diagnosis codes, and unstructured data such as doctor notes and pathology reports. They are typically recorded at irregular time intervals. In this talk, I will share my research group’s recent work on developing robust statistical and machine learning methods for complex EHR data, toward the goal of advancing intelligent, equitable health, and discuss daunting challenges and exciting opportunities in this space. In particular, my group is the first to define three levels of EHR data: level 0 data are the raw data recorded in the EHR, whereas level 1 and level 2 data require a considerable to substantial amount of data curation and wrangling. We have developed methods for both level 1 and level 2 EHR data, which require different strategies. Our experience has demonstrated that, building on the core principles of statistical thinking, a trans-disciplinary health data science approach offers great promise to accelerate innovation and harness the full power of EHR data to address complex, real-world problems in medicine.