Bayesian Inference for Gaussian Semiparametric Multilevel Models
Posted: 3 May 2016
Bayesian inference for complex hierarchical models with smoothing splines is typically intractable, requiring approximate inference methods for use in practice. Markov chain Monte Carlo (MCMC) is the standard method for generating samples from the posterior distribution. However, for large or complex models, MCMC can be computationally intensive, or even infeasible. Mean Field Variational Bayes (MFVB) is a fast deterministic alternative to MCMC. It provides the approximating distribution, within a factorized family, that has minimum Kullback-Leibler divergence from the posterior. Unlike MCMC, MFVB scales efficiently to arbitrarily large and complex models. We derive MFVB algorithms for Gaussian semiparametric multilevel models and implement them in SAS/IML® software. To improve speed and memory efficiency, we use block decomposition to streamline the estimation of the large sparse covariance matrix. Through a series of simulations and real data examples, we demonstrate that the inference obtained from MFVB is comparable to that of PROC MCMC. We also provide practical demonstrations of how to estimate additional posterior quantities of interest from MFVB, either directly or via Monte Carlo simulation.
The emergence of machine learning and data mining tools has helped researchers capture and understand patterns in large and complex datasets, which typically have a grouped or hierarchical structure. The most common data types are longitudinal and multilevel data, which are frequently seen in applied areas such as education, epidemiology, medicine, population health and social science. These data structures give rise to correlations among observations within groups or clusters, and therefore require statistical models that take such correlations into account during data analysis.
Linear mixed models extend standard linear models by adding normal random effects on the linear predictor scale to account for correlated observations within groups. However, this flexibility comes at the cost of analytical tractability, and fitting such models can be computationally complex and inefficient. In the Bayesian paradigm, estimation of mixed models via Markov chain Monte Carlo (MCMC) techniques is challenging since the integral over the random effects is intractable. In this paper we use the SAS® Interactive Matrix Language (IML) environment to implement Mean Field Variational Bayes for Bayesian Gaussian semiparametric multilevel models.
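For reference, the random intercept form of the Gaussian linear mixed model described above can be written as follows. The notation is an assumption made here for illustration, since this introduction does not fix symbols: groups are indexed by i, observations within a group by j, and U_i is the group-level random intercept.

```latex
% Assumed notation for illustration; requires the amsmath package.
\begin{align*}
y_{ij} &= \beta_0 + \boldsymbol{x}_{ij}^{\top}\boldsymbol{\beta} + U_i + \varepsilon_{ij},
  \qquad i = 1,\dots,m, \quad j = 1,\dots,n_i, \\
U_i &\overset{\text{ind.}}{\sim} N(0, \sigma_U^2), \qquad
\varepsilon_{ij} \overset{\text{ind.}}{\sim} N(0, \sigma_\varepsilon^2).
\end{align*}
% Equivalent matrix form, with Z the random-effects design matrix:
\[
\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{Z}\boldsymbol{u} + \boldsymbol{\varepsilon},
\qquad \boldsymbol{u} \sim N(\boldsymbol{0}, \sigma_U^2 \boldsymbol{I}),
\qquad \boldsymbol{\varepsilon} \sim N(\boldsymbol{0}, \sigma_\varepsilon^2 \boldsymbol{I}).
\]
```

Under this form, two distinct observations in the same group share U_i, so their covariance given the fixed effects is \sigma_U^2. This is the within-group correlation that the random effects are meant to capture.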
The IML environment, while different in behavior from standard SAS procedures and the SAS DATA step, is well suited to coding new computational procedures in SAS from start to finish. This is because IML operates on vectors and matrices, an approach that will be familiar to many users of other analytical software such as R or MATLAB, and the language itself is intuitive to use. Results of computations in IML can easily be output to SAS data sets for use in existing analytic or graphics procedures, and so can be integrated seamlessly into analyses in SAS.
This paper introduces SAS users to MFVB, a fast deterministic alternative to MCMC that provides an approximating distribution with minimum Kullback-Leibler divergence from the posterior of a Bayesian Gaussian semiparametric multilevel model, using the IML environment. Simulation comparisons focus mainly on random intercept models with a single covariate that requires smoothing using penalized splines, together with a few typical covariates (continuous and binary). Some knowledge of using splines to smooth non-linear associations between an outcome and a continuous covariate, particularly the random-effects formulation (penalized splines), is advantageous. Further, an understanding of the covariance structures that arise in multilevel models, Bayesian inference using MCMC, creating expanded design matrices when using categorical covariates, and the role of centering and standardizing data prior to an analysis will help readers get the most out of this paper.
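To make the mean field idea concrete, the following is a minimal sketch of MFVB on a deliberately simple toy model, not the multilevel model or the SAS/IML implementation developed in this paper. It assumes y_i ~ N(mu, sigma^2) with priors mu ~ N(mu0, sigma0_sq) and sigma^2 ~ Inverse-Gamma(A, B), and restricts the approximation to the product q(mu) q(sigma^2). The sketch is written in Python purely for illustration; the function name `mfvb_normal`, its arguments and the default hyperparameters are all hypothetical choices made here.

```python
import random


def mfvb_normal(y, mu0=0.0, sigma0_sq=1e8, A=0.01, B=0.01,
                tol=1e-10, max_iter=1000):
    """Coordinate-ascent MFVB for a toy normal random sample model.

    Assumed model (illustrative, not the paper's):
        y_i ~ N(mu, sigma^2), mu ~ N(mu0, sigma0_sq),
        sigma^2 ~ Inverse-Gamma(A, B).
    Returns (m, s_sq, A_q, B_q) where q(mu) = N(m, s_sq) and
    q(sigma^2) = Inverse-Gamma(A_q, B_q).
    """
    n = len(y)
    sum_y = sum(y)
    A_q = A + n / 2.0   # shape of q(sigma^2); does not change across iterations
    B_q = B + 1.0       # arbitrary positive starting value for the rate
    m, s_sq = mu0, sigma0_sq
    for _ in range(max_iter):
        e_prec = A_q / B_q                      # E_q[1 / sigma^2]
        # Update q(mu): precision and mean combine data and prior.
        s_sq = 1.0 / (n * e_prec + 1.0 / sigma0_sq)
        m = s_sq * (e_prec * sum_y + mu0 / sigma0_sq)
        # Update q(sigma^2): E_q[(y_i - mu)^2] = (y_i - m)^2 + s_sq.
        B_new = B + 0.5 * (sum((yi - m) ** 2 for yi in y) + n * s_sq)
        converged = abs(B_new - B_q) < tol * abs(B_q)
        B_q = B_new
        if converged:
            break
    return m, s_sq, A_q, B_q


# Usage on simulated data (seed and true values chosen arbitrarily).
random.seed(1)
y = [random.gauss(5.0, 2.0) for _ in range(200)]
m, s_sq, A_q, B_q = mfvb_normal(y)
post_mean_sigma_sq = B_q / (A_q - 1.0)   # mean of the Inverse-Gamma q(sigma^2)
```

Each pass updates one factor of q while holding the other fixed, which is the same deterministic cycle that MFVB uses on larger models. With the diffuse priors assumed here, `m` should sit very close to the sample mean and `post_mean_sigma_sq` close to the sample variance.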