Application in R using blavaan package
Department of Psychology and Child Development, CSU Stanislaus
SEM is in reality a family of different possible models.
The roots of SEM are grounded in the notion of correlation and regression models.
SEM is mostly used to estimate latent variables.
There are several definitions:
Bollen & Hoyle (2012):
Have you thought about how to measure intelligence?
What about the concept of “good performance”?
What does good health look like? Is it a concept?
Henseler (2020):
\(\Sigma\) = The model-implied (estimated) covariance matrix.
\(\Psi\) = Matrix with the covariances between latent factors.
\(\Lambda\) = Matrix with the factor loadings.
\(\Theta\) = Matrix with the unique (residual) variances of the observed variables.
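In the standard confirmatory factor analysis form, these matrices combine as \(\Sigma = \Lambda \Psi \Lambda^{\top} + \Theta\). A minimal base R sketch of that computation follows. The two-factor layout and every numeric value are made up for illustration and do not come from any data set in this presentation.

```r
# Hypothetical loadings: 6 indicators, 2 factors, simple structure
Lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.9, 0.8, 0.7),
                 nrow = 6, ncol = 2)

# Hypothetical factor covariance matrix (unit variances, correlation 0.3)
Psi <- matrix(c(1,   0.3,
                0.3, 1), nrow = 2, ncol = 2)

# Unique variances chosen so that every indicator has total variance 1
Theta <- diag(1 - rowSums(Lambda^2))

# Model-implied covariance matrix
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta
round(Sigma, 3)
```

Indicators of the same factor covary through the product of their loadings, while indicators of different factors covary through their loadings times the factor covariance.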
In SEM we also have another model for the implied means:
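One common way to write this mean structure, assuming a vector of indicator intercepts \(\tau\) and a vector of latent means \(\alpha\) (neither symbol is defined in the text above, so treat both as assumptions), is:

\[\begin{equation*} \mu = \tau + \Lambda\alpha \end{equation*}\]

Here \(\mu\) is the vector of model-implied means and \(\Lambda\) is the same loading matrix used in the covariance structure.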
\[\begin{equation*} p(\theta|y) = \frac{p(\theta,y)}{p(y)} = \frac{p(\theta)p(y|\theta)}{p(y)} \end{equation*}\]
Here \(p(\theta)\) is the prior distribution and \(p(y|\theta)\) is the sampling distribution, also called the likelihood of the data.
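To make the formula concrete, here is a minimal grid-approximation sketch in base R. It assumes a made-up example (7 successes in 10 binomial trials with a flat prior) that is not part of the original material. The normalizing constant \(p(y)\) is approximated by summing over the grid.

```r
# Grid of candidate values for theta (a probability)
theta <- seq(0, 1, length.out = 1001)

# Prior p(theta): flat Beta(1, 1), an assumption for this illustration
prior <- dbeta(theta, shape1 = 1, shape2 = 1)

# Likelihood p(y | theta): 7 successes out of 10 trials (hypothetical data)
likelihood <- dbinom(7, size = 10, prob = theta)

# Posterior p(theta | y): prior times likelihood, normalized over the grid
unnormalized <- prior * likelihood
posterior <- unnormalized / sum(unnormalized)

# Posterior mean of theta
post_mean <- sum(theta * posterior)
post_mean
```

With a flat prior the exact posterior is Beta(8, 4), so the grid result should be close to its mean of 8/12.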
Frequentist: “What is the likelihood of observing these data, given the parameter(s) of the model?”
Bayesian inference will help you to get a distribution for your parameters. This distribution is what we call the posterior distribution.
See the linear regression model:
Statistics in general is about probability distributions. In Bayesian probability this is more explicit, because each parameter has its own distribution.
This means that the Bayesian approach relies on probability distributions just as the frequentist approach does, but it can also generate a distribution for each parameter in your model. This is not feasible from the frequentist point of view.
This example is simple, but in reality the estimation of Bayesian probability can be more complicated.
That’s why we need numerical algorithms.
Markov chain Monte Carlo (MCMC) is a group of algorithms for drawing samples from a probability distribution.
Because we need to sample random values from the posterior distribution, we need algorithms that make sure we don’t sample the same values too many times. We want to keep as many different values as possible.
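As an illustration of the idea, here is a minimal random-walk Metropolis sketch in base R. This is one of the simplest MCMC algorithms, not the NUTS sampler used by Stan. The data, the known residual standard deviation of 1, the \(N(0, 10)\) prior on the mean, and the proposal step size are all assumptions made for this example.

```r
set.seed(2024)

# Hypothetical data: 50 observations with unknown mean and known sd = 1
y <- rnorm(50, mean = 1.5, sd = 1)

# Log posterior (up to a constant): log likelihood + log prior
log_post <- function(mu) {
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) +
    dnorm(mu, mean = 0, sd = 10, log = TRUE)
}

n_iter  <- 5000
draws   <- numeric(n_iter)
current <- 0  # starting value

for (i in seq_len(n_iter)) {
  # Propose a new value near the current one
  proposal <- current + rnorm(1, mean = 0, sd = 0.5)
  # Accept with probability min(1, posterior ratio)
  log_ratio <- log_post(proposal) - log_post(current)
  if (log(runif(1)) < log_ratio) {
    current <- proposal
  }
  draws[i] <- current
}

# Drop the first 1000 draws as warmup
kept <- draws[-(1:1000)]
mean(kept)
quantile(kept, probs = c(0.025, 0.975))
```

Rejected proposals repeat the current value, which is why consecutive draws are correlated and why the effective sample size is smaller than the number of kept draws.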
There are different methods to estimate models in Bayesian inference.
The most used method is Markov chain Monte Carlo (MCMC) simulation.
MCMC includes several samplers; the most used currently are:
In the examples provided we are going to estimate models using the No-U-Turn Sampler (NUTS), which is the default in Stan.
You may read more about samplers in Richard McElreath’s blog.
I’ll introduce an example using the package brms in R.
Family: gaussian
Links: mu = identity; sigma = identity
Formula: depreZ ~ worryZ + rumZ
Data: rumData (Number of observations: 186)
Draws: 3 chains, each with iter = 5000; warmup = 2000; thin = 1;
total post-warmup draws = 9000
Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -0.01      0.06    -0.12     0.11 1.00     8661     6344
worryZ        0.20      0.07     0.06     0.33 1.00     8010     7292
rumZ          0.48      0.07     0.35     0.62 1.00     8076     6674
Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.82      0.04     0.74     0.91 1.00     8387     6664
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
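A model like the one summarized above could be fit along these lines. This is a sketch under stated assumptions: `rum_sim` is a simulated stand-in for `rumData`, and `fit_regression` is a hypothetical helper name. The formula and the chain settings are taken from the summary itself. The helper is only defined here, not called, because fitting requires brms and a working Stan toolchain.

```r
set.seed(123)

# Simulated stand-in for rumData (same variable names, same n = 186);
# the generating values below are arbitrary assumptions
n      <- 186
worryZ <- rnorm(n)
rumZ   <- 0.5 * worryZ + rnorm(n, sd = sqrt(0.75))
depreZ <- 0.2 * worryZ + 0.5 * rumZ + rnorm(n, sd = 0.8)
rum_sim <- data.frame(depreZ, worryZ, rumZ)

# Hypothetical helper: fits the regression with brms.
# prior = NULL keeps the brms default priors.
fit_regression <- function(data, prior = NULL) {
  if (!requireNamespace("brms", quietly = TRUE)) {
    stop("Package 'brms' is required for this example.")
  }
  brms::brm(
    depreZ ~ worryZ + rumZ,
    data   = data,
    family = gaussian(),
    prior  = prior,
    chains = 3,
    iter   = 5000,
    warmup = 2000,
    seed   = 123
  )
}
```

Calling `fit <- fit_regression(rum_sim)` followed by `summary(fit)` should print a table with the same layout as the one above, although the estimates will differ because the data are simulated.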
We can add an extremely dogmatic prior. Suppose we believe that the distribution of our beta (\(\beta\)) parameters is \(\beta \sim N(0, 0.0001)\), which means we believe the prior looks like this:
Family: gaussian
Links: mu = identity; sigma = identity
Formula: depreZ ~ worryZ + rumZ
Data: rumData (Number of observations: 186)
Draws: 3 chains, each with iter = 5000; warmup = 2000; thin = 1;
total post-warmup draws = 9000
Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -0.01      0.07    -0.16     0.14 1.00     5062     4505
worryZ        0.00      0.00    -0.00     0.00 1.00    11332     6676
rumZ          0.00      0.00    -0.00     0.00 1.00    10949     6630
Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.00      0.05     0.90     1.11 1.00     4903     4801
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
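The collapse of both slopes to zero in this output can be reproduced approximately without MCMC. The sketch below uses the closed-form posterior mean for regression slopes under a normal prior \(\beta \sim N(0, \tau^2)\). Its assumptions: the residual standard deviation is treated as known and fixed at 1, 0.0001 is read as a standard deviation (the Stan convention), and the data are simulated rather than the original `rumData`.

```r
set.seed(123)

# Simulated stand-in data (arbitrary generating values)
n      <- 186
worryZ <- rnorm(n)
rumZ   <- 0.5 * worryZ + rnorm(n, sd = sqrt(0.75))
depreZ <- 0.2 * worryZ + 0.5 * rumZ + rnorm(n, sd = 0.8)

# Center predictors and outcome so the intercept can be ignored
X <- scale(cbind(worryZ, rumZ), center = TRUE, scale = FALSE)
y <- depreZ - mean(depreZ)

# Posterior mean of the slopes for prior beta ~ N(0, tau^2) and known sigma
posterior_mean <- function(X, y, sigma, tau) {
  k <- ncol(X)
  drop(solve(crossprod(X) + (sigma^2 / tau^2) * diag(k), crossprod(X, y)))
}

vague    <- posterior_mean(X, y, sigma = 1, tau = 10)      # weak prior
dogmatic <- posterior_mean(X, y, sigma = 1, tau = 0.0001)  # dogmatic prior

round(rbind(vague, dogmatic), 4)
```

With the weak prior the posterior means are essentially the least-squares slopes. With the dogmatic prior, the prior precision term \(\sigma^2/\tau^2\) dominates the information in the 186 observations, so the slopes are pulled to zero.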
SEM has many parameters compared to a regression model. You will see a distribution for each parameter, and depending on the complexity of the model it might take several minutes or hours to run. Also, if the model is not identified, you will see that the chains for some parameters will not converge.
However, BSEM is very useful when Maximum Likelihood has convergence problems and your model is identified. Complicated models will likely converge in BSEM.
Models with small sample sizes will also converge if the model is properly identified.
Also, it is easier in BSEM to regularize parameters, or try different approaches to test invariance such as approximate invariance.
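A minimal sketch of a Bayesian CFA in blavaan follows, together with a convergence check on the chains. It makes several assumptions: it uses the `HolzingerSwineford1939` example data that ship with lavaan (not data from this presentation), the helper names `run_bcfa` and `max_rhat` are hypothetical, and the `burnin` and `sample` values are arbitrary. The helpers are only defined here, not called, because fitting can take a while.

```r
# Three-factor CFA in lavaan syntax (the classic Holzinger-Swineford example)
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Hypothetical helper: fits the model with blavaan using 3 chains
run_bcfa <- function(model = hs_model) {
  # Attach blavaan before calling its model-fitting wrapper
  stopifnot(require("blavaan", quietly = TRUE))
  bcfa(
    model,
    data     = lavaan::HolzingerSwineford1939,
    n.chains = 3,
    burnin   = 1000,
    sample   = 1000
  )
}

# Hypothetical helper: largest Rhat across parameters.
# Values close to 1 suggest that the chains have converged.
max_rhat <- function(fit) {
  max(blavaan::blavInspect(fit, "rhat"))
}
```

A typical session would run `fit <- run_bcfa()`, then `summary(fit)` and `max_rhat(fit)`. If the largest Rhat is clearly above 1, that is a sign that some chains have not converged, which is one symptom of the identification problems described above.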