Muxuan Liang, PhD

Recent developments in Artificial Intelligence (AI) have enabled us to extract broad and deep phenotypes from large databases, such as electronic health records (EHRs). However, the extracted outcomes of interest may not be accurate; therefore, a chart review procedure is often employed to validate the reliability of AI tools. Misclassification of these outcomes presents a common challenge in AI-powered medical studies, particularly in classification and decision-making problems. Directly using the extracted outcomes may lead to a biased estimate of the optimal rule. The sensitivity and specificity of AI tools at the individual level can be used to eliminate such bias. However, because the sample size in chart reviews is limited, pinpointing individual-level sensitivity and specificity can be difficult or even impossible. In this work, for classification problems, we assume a range of individual-level sensitivity and specificity to quantify the reliability of AI tools. With this partial information on sensitivity and specificity, we establish partial identification for the distribution of the underlying outcome. Based on this result, we propose a robust classification framework and a novel estimation procedure to derive a robust classification rule without requiring precise identification of individual-level sensitivity and specificity. In our theoretical analysis, we demonstrate a faster convergence rate in generalization error compared to traditional procedures, and our method can achieve performance as if all patients had undergone chart review, even when the chart-review sample size is much smaller than the total sample size.
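To make the partial-identification idea concrete, the sketch below illustrates the standard misclassification correction (a Rogan–Gladen-style adjustment) under interval-valued sensitivity and specificity. This is only an illustrative toy computation, not the estimation procedure proposed in the abstract: if a surrogate label S has sensitivity se = P(S=1 | Y=1) and specificity sp = P(S=0 | Y=0), then P(S=1) = se * P(Y=1) + (1 - sp) * P(Y=0), so P(Y=1) = (P(S=1) - (1 - sp)) / (se + sp - 1). When only ranges for se and sp are known, evaluating this formula over the corners of the (se, sp) box yields bounds on P(Y=1); the function names and example numbers are hypothetical.

```python
from itertools import product


def outcome_prevalence_bounds(q, se_range, sp_range):
    """Bound P(Y=1) given q = P(S=1) and intervals for sensitivity/specificity.

    Uses the identity P(Y=1) = (q - (1 - sp)) / (se + sp - 1), which is
    monotone in se and in sp separately (for se + sp > 1), so the extrema
    over the box lie at its corners.
    """
    candidates = []
    for se, sp in product(se_range, sp_range):
        if se + sp <= 1:
            raise ValueError("requires se + sp > 1 (tool better than chance)")
        p = (q - (1 - sp)) / (se + sp - 1)
        candidates.append(min(max(p, 0.0), 1.0))  # clip to [0, 1]
    return min(candidates), max(candidates)


# Hypothetical numbers: 30% of patients flagged by the AI tool,
# sensitivity believed to lie in [0.85, 0.95], specificity in [0.90, 0.98].
lo, hi = outcome_prevalence_bounds(0.30, (0.85, 0.95), (0.90, 0.98))
print(f"P(Y=1) is partially identified in [{lo:.4f}, {hi:.4f}]")
```

The width of this interval reflects how much uncertainty about the AI tool's reliability propagates into the outcome distribution; a robust classification rule must perform well for every distribution inside such an identified set.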