Wodan Ling, PhD

Emerging large-scale microbiome-profiling studies introduce new opportunities as well as challenges. One challenge inherent to large sample sizes is batch effects, which arise from differential processing of specimens and can lead to spurious findings. Most existing strategies for mitigating batch effects rely on approaches designed for genomic analysis and fail to address the zero inflation and over-dispersion of microbiome data. Strategies tailored for microbiome data are restricted to association testing and do not support other analytic goals such as visualization. In this talk, we present the Conditional Quantile Regression (ConQuR) approach, the first robust and comprehensive method that accommodates the complex distributions of microbial read counts and generates batch-removed, zero-inflated read counts that can be used in all standard downstream analyses. We demonstrate its state-of-the-art performance in removing batch effects from microbiome data while preserving the signals of interest.

Another challenge is drawing reliable biological conclusions about individual taxa. Classical tests often do not accommodate the realities of microbiome data, leading to power loss. Approaches tailored for microbiome data often have inflated false positive rates, generally because their distributional assumptions are not satisfied. Most existing approaches also fail in the presence of heterogeneous effects. We also present the zero-inflated quantile (ZINQ) approach, which is robust to the complex distributions of microbiome data and improves testing power by summarizing signals over different quantiles of a taxon’s abundance, facilitating detection of heterogeneous effects. We show that ZINQ often has power equal to or higher than that of existing tests while offering better control of false positives.