# Bayesian structural equation modeling II: Hierarchical latent variables

**Tags**: bayesian | python

Previously, I presented code to perform Bayesian confirmatory factor analysis (CFA; Jöreskog, 1969)
using PyMC3 (Salvatier *et al.*, 2016). In the preamble, I mentioned that CFA is
a special case of structural equation modeling (SEM; Bollen, 1989). In practice, we use the term
CFA when the models try to explain the covariance between a collection of observed variables (items) using a smaller number
of latent variables (factors). The term SEM is used when the models contain a more elaborate structure of latent variables.

A classic example of SEM is when a CFA model of cognitive data is modified so that it contains an additional latent variable
that influences the other factors but not the items directly. The additional latent variable is interpreted as general
cognitive ability and denoted by (Spearman, 1904). Such a model may be called *hierarchical* because
it contains two levels of latent variables. Here, I implement a hierarchical model here using
the Holzinger and Swineford (1939) data set.

*Relationships between variables in the hierarchical model.*

## Building the model

This particular model requires three inputs. The first is a matrix of data denoted by with shape , where is the number of cases and is the number of items, just as in the previous CFA.

The second is a latent variable/factor design matrix denoted by with shape . This is similar but not identical to from the previous CFA. Specifically, it contains an extra column for the higher latent variable . Since this latent variable isn’t directly connected to any items, its column contains only 0s.

The third, entirely new input is a matrix of paths between latent variables denoted by with shape . If the value of the cell in row and column is 1, the th latent variable influences the th latent variable. This matrix can be asymmetric and must contain zeros down the diagonal.

As in the previous CFA model, we assume the data are multivariate normal,

where is a -length vector of means and is an covariance matrix. For former is given by

where is -length vector of item intercepts, is a matrix of factor loadings, and is a -length vector of factor intercepts.

The covariance matrix of our new model is more complex than the one from the previous CFA model:

where is an factor covariance matrix, is an identity matrix of the same shape, is a matrix of path weights, and is a item covariance matrix.

Under the new model, is different than it was previously. Notice that SEM is flexible
enough to estimate both covariances between all factors/latent variables via *and* paths between
the same factors/latent variables via . We are interested in explaining the covariance between
factors in terms of the new latent variable, so it doesn’t make sense to estimate any factor covariances. Moreover, the
variances of all factors/latent variables should be 1. (Recall that we need to fix something in the model.) So, for the
present purposes, will be an identity matrix.

As under the previous CFA model, our factor design matrix dictates which items load on which factors, and the matrix contains the actual loadings. Previously we used the trick described by Muthén and Asparouhov (2012) to estimate both loadings and minor (or cross-) loadings via

where is a matrix of unscaled factor loadings, is the cross-loading standard deviation, and denotes the Hadamard product. We assigned univariate standard normal priors to all elements in and a beta prior to . Under this model, this trick would allow the additional latent variable to influence the items directly—we don’t want this to happen. We can prevent all cross-loadings without changing the model drastically by fixing .

The matrix of path weights is the same as the input matrix except values of 1 are replaced with values from new random vector, . This vector will have a standard normal prior distribution. I’m not sure how to represent this operation using traditional notation. It should be clear that the length of is the elementwise sum of .

Now all that’s left to do is to place prior distributions on , , and . I’ll use the same priors for and , but simplify the model a bit by making a diagonal matrix.

## Code and results

Below is code to implement the model.

```
"""Example of Bayesian confirmatory factor analysis in PyMC3.
"""
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt
import matplotlib.pyplot as plt
from os.path import exists
from matplotlib import rcParams
from pymc3.math import matrix_dot, matrix_inverse
from tabulate import tabulate
def bsem(
items,
factors,
paths,
beta=0,
nu_sd=2.5,
alpha_sd=2.5,
d_beta=2.5,
corr_items=False,
corr_factors=False,
g_eta=100,
l_eta=1,
beta_beta=1,
):
r"""Constructs Bayesian SEM.
Args:
items (np.array): Array of item data.
factors (np.array): Factor design.
paths (np.array): Array of directed factor paths.
beta (:obj:`float` or `'estimate'`, optional): Standard deviation of normal
prior on cross loadings. If `'estimate'`, beta is estimated from the data.
nu_sd (:obj:`float`, optional): Standard deviation of normal prior on item
intercepts.
alpha_sd (:obj:`float`, optional): Standard deviation of normal prior on factor
intercepts.
d_beta (:obj:`float`, optional): Scale parameter of half-Cauchy prior on factor
standard deviation.
corr_factors (:obj:`bool`, optional): Allow correlated factors.
corr_items (:obj:`bool`, optional): Allow correlated items.
g_eta (:obj:`float`, optional): Shape parameter of LKJ prior on residual item
correlation matrix.
l_eta (:obj:`float`, optional): Shape parameter of LKJ prior on factor
correlation matrix.
beta_beta (:obj:`float`, optional): Beta parameter of beta prior on beta.
Returns:
None: Places model in context.
"""
# get numbers of cases, items, and factors
n, p = items.shape
p_, m = factors.shape
assert p == p_, "Mismatch between data and factor-loading matrices"
assert paths.shape == (m, m), "Paths matrix has wrong shape"
I = tt.eye(m, m)
# place priors on item and factor intercepts
nu = pm.Normal(name=r"$\nu$", mu=0, sd=nu_sd, shape=p, testval=items.mean(axis=0))
alpha = pm.Normal(name=r"$\alpha$", mu=0, sd=alpha_sd, shape=m, testval=np.zeros(m))
# place priors on unscaled factor loadings
Phi = pm.Normal(name=r"$\Phi$", mu=0, sd=1, shape=factors.shape, testval=factors)
# place priors on paths
B = tt.zeros(paths.shape)
npths = np.sum(paths, axis=None)
print(npths)
if npths > 0:
b = pm.Normal(name=r"$b$", mu=0, sd=1, shape=npths, testval=np.ones(npths))
# create the paths matrix
k = 0
for i in range(m):
for j in range(m):
if paths[i, j] == 1:
B = tt.set_subtensor(B[i, j], b[k])
k += 1
Gamma = pm.Deterministic("$\Gamma$", B)
# create masking matrix for factor loadings
if isinstance(beta, str):
assert beta == "estimate", f"Don't know what to do with '{beta}'"
beta = pm.Beta(name=r"$\beta$", alpha=1, beta=beta_beta, testval=0.1)
M = (1 - np.asarray(factors)) * beta + np.asarray(factors)
# create scaled factor loadings
Lambda = pm.Deterministic(r"$\Lambda$", Phi * M)
# determine item means
mu = nu + matrix_dot(Lambda, alpha)
# place priors on item standard deviations
D = pm.HalfCauchy(name=r"$D$", beta=d_beta, shape=p, testval=items.std(axis=0))
# place priors on item correlations
f = pm.Lognormal.dist(sd=0.25)
if not corr_items:
Omega = np.eye(p)
else:
G = pm.LKJCholeskyCov(name=r"$G$", eta=g_eta, n=p, sd_dist=f)
ch1 = pm.expand_packed_triangular(p, G, lower=True)
K = tt.dot(ch1, ch1.T)
sd1 = tt.sqrt(tt.diag(K))
Omega = pm.Deterministic(r"$\Omega$", K / sd1[:, None] / sd1[None, :])
# determine residual item variances and covariances
Theta = pm.Deterministic(r"$\Theta$", D[None, :] * Omega * D[:, None])
# place priors on factor correlations
if not corr_factors:
Psi = np.eye(m)
else:
L = pm.LKJCholeskyCov(name=r"$L$", eta=l_eta, n=m, sd_dist=f)
ch = pm.expand_packed_triangular(m, L, lower=True)
Gamma = tt.dot(ch, ch.T)
sd = tt.sqrt(tt.diag(Gamma))
Psi = pm.Deterministic(r"$\Psi$", Gamma / sd[:, None] / sd[None, :])
# determine variances and covariances of items
A = matrix_inverse(I - Gamma)
C = matrix_inverse(I - Gamma.T)
Sigma = matrix_dot(Lambda, A, Psi, C, Lambda.T) + Theta
# place priors on observations
pm.MvNormal(name="$Y$", mu=mu, cov=Sigma, observed=items, shape=items.shape)
def main():
# load the data
df = pd.read_csv("../../assets/data/HS.csv", index_col=0)
# define items to keep
item_names = [
"visual",
"cubes",
"paper",
"flags",
"general",
"paragrap",
"sentence",
"wordc",
"wordm",
"addition",
"code",
"counting",
"straight",
"wordr",
"numberr",
"figurer",
"object",
"numberf",
"figurew",
]
# define the factor structure
factors = np.array(
[
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
]
)
paths = np.array(
[
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
]
)
# iterate over the two schools
for school, sdf in df.groupby("school"):
# define the path to save results
f = f"../data/BSEM examples/{school}"
# select the 19 commonly used variables
items = sdf[item_names]
# for numerical convenience, standardize the data
items = (items - items.mean()) / items.std()
with pm.Model():
# construct the model
bsem(items, factors, paths)
if not exists(f):
# sample and save
trace = pm.sample(chains=2) # 19000, tune=1000,
pm.save_trace(trace, f)
else:
trace = pm.load_trace(f)
pm.traceplot(trace, compact=True)
rcParams["font.size"] = 14
plt.savefig(f"{f}/traceplot.png")
# create a nice summary table
loadings = pd.DataFrame(
trace[r"$\Lambda$"].mean(axis=0).round(3),
index=[v.title() for v in item_names],
columns=["Spatial", "Verbal", "Speed", "Memory", "g"],
)
loadings.to_csv(f"{f}/loadings.csv")
print(tabulate(loadings, tablefmt="pipe", headers="keys"))
#
# # correlations = pd.DataFrame(
# # trace[r"$\Psi$"].mean(axis=0).round(3),
# # index=["Spatial", "Verbal", "Speed", "Memory", "g"],
# # columns=["Spatial", "Verbal", "Speed", "Memory", "g"],
# # )
# # correlations.to_csv(f"{f}/factor_correlations.csv")
#
_paths = pd.DataFrame(
trace[r"$\Gamma$"].mean(axis=0).round(3),
index=["Spatial", "Verbal", "Speed", "Memory", "g"],
columns=["Spatial", "Verbal", "Speed", "Memory", "g"],
)
_paths.to_csv(f"{f}/factor_paths.csv")
print(tabulate(_paths, tablefmt="pipe", headers="keys"))
#
# correlations = pd.DataFrame(
# trace[r"$\Omega$"].mean(axis=0).round(3),
# index=item_names,
# columns=item_names,
# )
# correlations.to_csv(f"{f}/item_correlations.csv")
if __name__ == "__main__":
main()
```

Here are factor loadings for the Grant-White school:

Spatial | Verbal | Speed | Memory | g | |
---|---|---|---|---|---|

Visual | 0.43 | 0 | 0 | 0 | 0 |

Cubes | 0.301 | 0 | 0 | 0 | 0 |

Paper | 0.331 | 0 | 0 | 0 | 0 |

Flags | 0.42 | 0 | 0 | 0 | 0 |

General | 0 | 0.606 | 0 | 0 | 0 |

Paragrap | 0 | 0.617 | 0 | 0 | 0 |

Sentence | 0 | 0.632 | 0 | 0 | 0 |

Wordc | 0 | 0.522 | 0 | 0 | 0 |

Wordm | 0 | 0.636 | 0 | 0 | 0 |

Addition | 0 | 0 | 0.455 | 0 | 0 |

Code | 0 | 0 | 0.482 | 0 | 0 |

Counting | 0 | 0 | 0.49 | 0 | 0 |

Straight | 0 | 0 | 0.526 | 0 | 0 |

Wordr | 0 | 0 | 0 | 0.354 | 0 |

Numberr | 0 | 0 | 0 | 0.356 | 0 |

Figurer | 0 | 0 | 0 | 0.403 | 0 |

Object | 0 | 0 | 0 | 0.431 | 0 |

Numberf | 0 | 0 | 0 | 0.431 | 0 |

Figurew | 0 | 0 | 0 | 0.33 | 0 |

And here are the paths:

Spatial | Verbal | Speed | Memory | g | |
---|---|---|---|---|---|

Spatial | 0 | 0 | 0 | 0 | 1.293 |

Verbal | 0 | 0 | 0 | 0 | 0.888 |

Speed | 0 | 0 | 0 | 0 | 1.032 |

Memory | 0 | 0 | 0 | 0 | 1.1 |

g | 0 | 0 | 0 | 0 | 0 |

## References

Bollen, K. A. (1989).
*Structural equations with latent variables*.
Wiley.

Holzinger, K. J., & Swineford, F. (1939).
A study in factor analysis: The stability of a bi-factor solution.
*Supplementary Educational Monographs*,
*48*,
91.

Jöreskog, K. G. (1969).
A general approach to confirmatory maximum likelihood factor analysis.
*Psychometrika*,
*34*(2),
183–202.
10.1007/BF02289343

Muthén, B., & Asparouhov, T. (2012).
Bayesian structural equation modeling: A more flexible representation of substantive theory.
*Psychological Methods*,
*17*(3),
313–335.
10.1037/a0026802

Salvatier, J., Wiecki, T. V., & Fonnesbeck, C. (2016).
Probabilistic programming in Python using PyMC3.
*PeerJ Comput Science*,
*2*,
e55.
10.7717/peerj-cs.55

Spearman, C. (1904).
“General intelligence,” objectively determined and measured.
*The American Journal of Psychology*,
*15*(2),
201.
10.2307/1412107

## Version history

- Originally posted May 15, 2020.