Aixin Tan, PhD

Heterogeneity occurs in many regression problems, where members of different latent subgroups respond differently to the covariates of interest (e.g., treatments) even after adjusting for other covariates. Our work adopts a Bayesian model called the mixture of finite mixtures (MFM) to identify these subgroups. A key feature of this model is that the number of subgroups need not be known a priori and is instead modeled as a random variable. The MFM was not commonly used in earlier applications, largely because of computational difficulties. In comparison, an alternative infinite mixture model, the Dirichlet process mixture model (DPMM), has been a main Bayesian tool for clustering even though it is a misspecified model for many applications. The popularity of the DPMM is partly due to its nice mathematical properties, which allow efficient computing algorithms.

We develop Markov chain Monte Carlo algorithms for a subclass of MFM in regression setups that closely resemble those for the DPMM, extending the results in Miller and Harrison (2017). Using simple examples, we demonstrate the many benefits of the MFM in comparison to other Bayesian and frequentist methods. We will also discuss remaining issues and future directions of this ongoing project. This is joint work with Yunju Im.
