Lan Luo, PhD

New data collection and storage technologies have given rise to the field of streaming data analytics, including real-time statistical methodology for online data analysis. Streaming data refers to high-throughput recordings in which large volumes of observations are gathered sequentially and perpetually over time. Such a data collection scheme is pervasive not only in the biomedical sciences, for example in mobile health, but also in fields such as IT, finance, service, and operations. Despite a large body of work on online learning, few studies focus on statistical inference, and most rely on a homogeneity assumption. In the first half of this talk, I will introduce a real-time regression analysis method, termed "renewable estimation", for cross-sectional data streams, with the particular objective of addressing the storage and computational-efficiency challenges of streaming data. In the second half, I will discuss an extension to data streams that exhibit both inter-batch correlation and dynamic heterogeneity. The key technical novelty is that the proposed method uses only the current data batch together with summary statistics of the historical data, rather than the full historical raw data. The proposed algorithms will be demonstrated in generalized linear models (GLMs) for cross-sectional data and in state space mixed models (SSMMs) for correlated data. I will provide both a conceptual understanding of and theoretical guarantees for the proposed method, and illustrate its performance through numerical examples.
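To make the idea of "current data plus summary statistics of historical data" concrete, the following is a minimal sketch for logistic regression, one member of the GLM family. It assumes that the historical summary consists only of the previous estimate and the accumulated information matrix, and that each new batch is fitted by Newton iterations on an adjusted score that combines a historical term with the current batch's score. The class and function names (`RenewableLogistic`, `full_mle`, `_score_and_info`) are hypothetical, and this is an illustrative sketch under those assumptions, not the speaker's implementation.

```python
import numpy as np


def _score_and_info(X, y, beta):
    """Logistic-regression score U(beta) and information J(beta) for one batch."""
    mu = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
    score = X.T @ (y - mu)                 # U_b(beta)
    w = mu * (1.0 - mu)                    # Bernoulli variance weights
    info = (X * w[:, None]).T @ X          # J_b(beta) = X' W X
    return score, info


class RenewableLogistic:
    """Hypothetical sketch: keeps only (estimate, accumulated information), never raw batches."""

    def __init__(self, p):
        self.beta = np.zeros(p)            # previous estimate
        self.info = np.zeros((p, p))       # accumulated information from past batches

    def update(self, X, y, max_iter=50, tol=1e-10):
        beta = self.beta.copy()
        for _ in range(max_iter):
            score, info_b = _score_and_info(X, y, beta)
            # Adjusted score: historical term J_hist (beta_prev - beta) + current-batch score.
            u = self.info @ (self.beta - beta) + score
            step = np.linalg.solve(self.info + info_b, u)
            beta = beta + step
            if np.max(np.abs(step)) < tol:
                break
        # Add this batch's information at the new estimate; the batch itself can now be discarded.
        _, info_b = _score_and_info(X, y, beta)
        self.info = self.info + info_b
        self.beta = beta
        return beta


def full_mle(X, y, max_iter=50, tol=1e-10):
    """Offline Newton-Raphson MLE on all data at once (used here only for comparison)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        score, info = _score_and_info(X, y, beta)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated stream of 20 batches (illustrative data, not from the talk).
rng = np.random.default_rng(0)
true_beta = np.array([0.5, -1.0, 0.25])
model = RenewableLogistic(p=3)
Xs, ys = [], []
for _ in range(20):
    X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))
    model.update(X, y)
    Xs.append(X)   # stored only so we can compare against the offline fit
    ys.append(y)

offline = full_mle(np.vstack(Xs), np.concatenate(ys))
print("renewable:", model.beta)
print("offline:  ", offline)
```

The design choice here is that memory does not grow with the number of batches: after each update, only a length-p vector and a p-by-p matrix are retained. In this sketch, the streaming estimate should track the offline fit on the pooled data closely, which illustrates why such a summary-statistic approach can ease storage and computation.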
