Brian J. Smith, PhD

We live in an era in which data are ubiquitous and there is ever-increasing interest in predicting future events from past data.  Such is the focus of predictive modeling, a subject that overlaps with statistics, machine learning, artificial intelligence, data science, and other fields.  The practice of predictive modeling relies on software for fitting, evaluating, and applying models.  A large number of predictive modeling techniques are available as R software packages.  However, differences in interfaces and features across packages can make it challenging to use them together.  In this talk, I will discuss MachineShop, a new R package for statistical and predictive modeling.  MachineShop aims to unify techniques from different packages by providing a common interface for model fitting, prediction, performance assessment, and presentation of results.  Support is currently provided for 51 models from 26 R packages, including traditional regression, regularization methods, tree-based methods, support vector machines, neural networks, and ensembles, as well as for data preprocessing, filtering, and model tuning and selection.  Predictive performance can be quantified with a range of metrics and estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling.  Analysis results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.  This talk will provide an accessible introduction to the package, followed by a demonstration of its easy-to-use yet powerful paradigm for model tuning and selection.
