Ryan Peterson, PhD

Statistical modeling, both in our data and our discourse, involves navigating a landscape of mixed signals. In modern “big” data sets, predictors can vary widely in their informativeness, reliability, and interpretability; meanwhile, quantitative experts often disagree on how much complexity, automation, or opacity we should tolerate in our models. This talk examines areas of ongoing debate in model selection and statistical analysis, including tensions between black-box and glass-box models, exploratory versus confirmatory modeling, and pragmatic (ad hoc) versus rigorous methods (is there such a thing as a statistical emergency?).

Amid these tensions, I will argue that analytic tools must be capable of more than accurate prediction and valid inference. Though these are noble goals, statisticians must ensure that pursuing them does not prevent researchers from clearly seeing meaningful patterns and acting on them appropriately. Drawing on real-world examples from collaborative research, I will highlight how transparent, interpretable tools, including (1) the sparsity-ranked lasso (SRL), (2) an interactive visualization platform for real-time model development (VisX), and (3) median aggregation of penalized coefficients after multiple imputation (MALCoM), can improve scientific communication and reproducibility. For instance, under heterogeneous (or “mixed”) signals from diverse feature sets (such as main effects and interactions), the SRL facilitates glass-box model selection by adaptively prioritizing the most meaningful signals (e.g., main effects), while often predicting as well as black-box approaches.
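
To fix ideas, one can view the SRL through the lens of a weighted lasso in which each term’s penalty grows with the size of the feature group it belongs to, so that the many candidate interactions must clear a higher bar than the comparatively few main effects. The display below is a minimal sketch of this idea, assuming penalty weights proportional to the square root of each group’s size; the notation and exact weighting are illustrative rather than a definitive statement of the method:

$$
\hat{\beta} \;=\; \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda \sum_{j} w_j\,\lvert \beta_j \rvert,
\qquad w_j \;\propto\; \sqrt{p_{g(j)}},
$$

where $g(j)$ denotes the group containing term $j$ (main effects, pairwise interactions, and so on) and $p_{g(j)}$ is the number of candidate terms in that group. Because interaction groups are larger, their terms are penalized more heavily by default, which is one way to “adaptively prioritize” main effects without discarding interactions outright.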

In the era of big data and team science, glass-box tools like these are not only feasible but essential: they empower statisticians to make principled, interpretable, and actionable analytic choices.