# Bayesian structural equation modeling II: Hierarchical latent variables

Tags: bayesian | python

Previously, I presented code to perform Bayesian confirmatory factor analysis (CFA; Jöreskog, 1969) using PyMC3 (Salvatier et al., 2016). In the preamble, I mentioned that CFA is a special case of structural equation modeling (SEM; Bollen, 1989). In practice, we use the term CFA when the models try to explain the covariance between a collection of observed variables (items) using a smaller number of latent variables (factors). The term SEM is used when the models contain a more elaborate structure of latent variables.

A classic example of SEM is when a CFA model of cognitive data is modified so that it contains an additional latent variable that influences the other factors but not the items directly. The additional latent variable is interpreted as general cognitive ability and denoted by $g$ (Spearman, 1904). Such a model may be called hierarchical because it contains two levels of latent variables. Here, I implement a hierarchical model here using the Holzinger and Swineford (1939) data set.

Relationships between variables in the hierarchical model.

## Building the model

This particular model requires three inputs. The first is a matrix of data denoted by $\boldsymbol{Y}$ with shape $n \times p$, where $n$ is the number of cases and $p$ is the number of items, just as in the previous CFA.

The second is a latent variable/factor design matrix denoted by $\boldsymbol{M}$ with shape $p \times m$. This is similar but not identical to $\boldsymbol{M}$ from the previous CFA. Specifically, it contains an extra column for the higher latent variable $g$. Since this latent variable isn’t directly connected to any items, its column contains only 0s.

The third, entirely new input is a matrix of paths between latent variables denoted by $\boldsymbol{B}$ with shape $m \times m$. If the value of the cell in row $i$ and column $j$ is 1, the $j$th latent variable influences the $i$th latent variable. This matrix can be asymmetric and must contain zeros down the diagonal.

As in the previous CFA model, we assume the data are multivariate normal,

where $\boldsymbol{\mu}$ is a $p$-length vector of means and $\boldsymbol{\Sigma}$ is an $n \times p$ covariance matrix. For former is given by

where $\boldsymbol{\nu}$ is $p$-length vector of item intercepts, $\boldsymbol{\Lambda}$ is a $p \times m$ matrix of factor loadings, and $\boldsymbol{\alpha}$ is a $m$-length vector of factor intercepts.

The covariance matrix of our new model is more complex than the one from the previous CFA model:

where $\boldsymbol{\Psi}$ is an $m \times m$ factor covariance matrix, $\boldsymbol{I}$ is an identity matrix of the same shape, $\boldsymbol{\Gamma}$ is a matrix of path weights, and $\boldsymbol{\Theta}$ is a $p \times p$ item covariance matrix.

Under the new model, $\boldsymbol{\Psi}$ is different than it was previously. Notice that SEM is flexible enough to estimate both covariances between all factors/latent variables via $\boldsymbol{\Psi}$ and paths between the same factors/latent variables via $\boldsymbol{\Gamma}$. We are interested in explaining the covariance between factors in terms of the new latent variable, so it doesn’t make sense to estimate any factor covariances. Moreover, the variances of all factors/latent variables should be 1. (Recall that we need to fix something in the model.) So, for the present purposes, $\boldsymbol{\Psi}$ will be an identity matrix.

As under the previous CFA model, our factor design matrix $\boldsymbol{M}$ dictates which items load on which factors, and the matrix $\boldsymbol{\Lambda}$ contains the actual loadings. Previously we used the trick described by Muthén and Asparouhov (2012) to estimate both loadings and minor (or cross-) loadings via

where $\boldsymbol{\Phi}$ is a matrix of unscaled factor loadings, $\beta$ is the cross-loading standard deviation, and $\circ$ denotes the Hadamard product. We assigned univariate standard normal priors to all elements in $\boldsymbol{\Phi}$ and a beta prior to $\beta$. Under this model, this trick would allow the additional latent variable to influence the items directly—we don’t want this to happen. We can prevent all cross-loadings without changing the model drastically by fixing $\beta=0$.

The matrix of path weights $\boldsymbol{\Gamma}$ is the same as the input matrix $\boldsymbol{B}$ except values of 1 are replaced with values from new random vector, $\boldsymbol{b}$. This vector will have a standard normal prior distribution. I’m not sure how to represent this operation using traditional notation. It should be clear that the length of $\boldsymbol{b}$ is the elementwise sum of $\boldsymbol{B}$.

Now all that’s left to do is to place prior distributions on $\boldsymbol{\nu}$, $\boldsymbol{\alpha}$, and $\boldsymbol{\Theta}$. I’ll use the same priors for $\boldsymbol{\nu}$ and $\boldsymbol{\alpha}$, but simplify the model a bit by making $\boldsymbol{\Theta}$ a diagonal matrix.

## Code and results

Below is code to implement the model.

"""Example of Bayesian confirmatory factor analysis in PyMC3.

"""
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt
import matplotlib.pyplot as plt
from os.path import exists

from matplotlib import rcParams
from pymc3.math import matrix_dot, matrix_inverse
from tabulate import tabulate

def bsem(
items,
factors,
paths,
beta=0,
nu_sd=2.5,
alpha_sd=2.5,
d_beta=2.5,
corr_items=False,
corr_factors=False,
g_eta=100,
l_eta=1,
beta_beta=1,
):
r"""Constructs Bayesian SEM.

Args:
items (np.array): Array of item data.
factors (np.array): Factor design.
paths (np.array): Array of directed factor paths.
beta (:obj:float or 'estimate', optional): Standard deviation of normal
prior on cross loadings. If 'estimate',  beta is estimated from the data.
nu_sd (:obj:float, optional): Standard deviation of normal prior on item
intercepts.
alpha_sd (:obj:float, optional): Standard deviation of normal prior on factor
intercepts.
d_beta (:obj:float, optional): Scale parameter of half-Cauchy prior on factor
standard deviation.
corr_factors (:obj:bool, optional): Allow correlated factors.
corr_items (:obj:bool, optional): Allow correlated items.
g_eta (:obj:float, optional): Shape parameter of LKJ prior on residual item
correlation matrix.
l_eta (:obj:float, optional): Shape parameter of LKJ prior on factor
correlation matrix.
beta_beta (:obj:float, optional): Beta parameter of beta prior on beta.

Returns:

None: Places model in context.

"""
# get numbers of cases, items, and factors
n, p = items.shape
p_, m = factors.shape
assert paths.shape == (m, m), "Paths matrix has wrong shape"
I = tt.eye(m, m)

# place priors on item and factor intercepts
nu = pm.Normal(name=r"$\nu$", mu=0, sd=nu_sd, shape=p, testval=items.mean(axis=0))
alpha = pm.Normal(name=r"$\alpha$", mu=0, sd=alpha_sd, shape=m, testval=np.zeros(m))

Phi = pm.Normal(name=r"$\Phi$", mu=0, sd=1, shape=factors.shape, testval=factors)

# place priors on paths
B = tt.zeros(paths.shape)
npths = np.sum(paths, axis=None)
print(npths)
if npths > 0:
b = pm.Normal(name=r"$b$", mu=0, sd=1, shape=npths, testval=np.ones(npths))
# create the paths matrix
k = 0
for i in range(m):
for j in range(m):
if paths[i, j] == 1:
B = tt.set_subtensor(B[i, j], b[k])
k += 1
Gamma = pm.Deterministic("$\Gamma$", B)

if isinstance(beta, str):
assert beta == "estimate", f"Don't know what to do with '{beta}'"
beta = pm.Beta(name=r"$\beta$", alpha=1, beta=beta_beta, testval=0.1)
M = (1 - np.asarray(factors)) * beta + np.asarray(factors)

Lambda = pm.Deterministic(r"$\Lambda$", Phi * M)

# determine item means
mu = nu + matrix_dot(Lambda, alpha)

# place priors on item standard deviations
D = pm.HalfCauchy(name=r"$D$", beta=d_beta, shape=p, testval=items.std(axis=0))

# place priors on item correlations
f = pm.Lognormal.dist(sd=0.25)
if not corr_items:
Omega = np.eye(p)
else:
G = pm.LKJCholeskyCov(name=r"$G$", eta=g_eta, n=p, sd_dist=f)
ch1 = pm.expand_packed_triangular(p, G, lower=True)
K = tt.dot(ch1, ch1.T)
sd1 = tt.sqrt(tt.diag(K))
Omega = pm.Deterministic(r"$\Omega$", K / sd1[:, None] / sd1[None, :])

# determine residual item variances and covariances
Theta = pm.Deterministic(r"$\Theta$", D[None, :] * Omega * D[:, None])

# place priors on factor correlations
if not corr_factors:
Psi = np.eye(m)
else:
L = pm.LKJCholeskyCov(name=r"$L$", eta=l_eta, n=m, sd_dist=f)
ch = pm.expand_packed_triangular(m, L, lower=True)
Gamma = tt.dot(ch, ch.T)
sd = tt.sqrt(tt.diag(Gamma))
Psi = pm.Deterministic(r"$\Psi$", Gamma / sd[:, None] / sd[None, :])

# determine variances and covariances of items
A = matrix_inverse(I - Gamma)
C = matrix_inverse(I - Gamma.T)
Sigma = matrix_dot(Lambda, A, Psi, C, Lambda.T) + Theta

# place priors on observations
pm.MvNormal(name="$Y$", mu=mu, cov=Sigma, observed=items, shape=items.shape)

def main():

# define items to keep
item_names = [
"visual",
"cubes",
"paper",
"flags",
"general",
"paragrap",
"sentence",
"wordc",
"wordm",
"code",
"counting",
"straight",
"wordr",
"numberr",
"figurer",
"object",
"numberf",
"figurew",
]

# define the factor structure
factors = np.array(
[
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
]
)

paths = np.array(
[
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
]
)

# iterate over the two schools
for school, sdf in df.groupby("school"):

# define the path to save results
f = f"../data/BSEM examples/{school}"

# select the 19 commonly used variables
items = sdf[item_names]

# for numerical convenience, standardize the data
items = (items - items.mean()) / items.std()

with pm.Model():

# construct the model
bsem(items, factors, paths)

if not exists(f):

# sample and save
trace = pm.sample(chains=2)  # 19000, tune=1000,
pm.save_trace(trace, f)

else:

pm.traceplot(trace, compact=True)
rcParams["font.size"] = 14
plt.savefig(f"{f}/traceplot.png")

# create a nice summary table
trace[r"$\Lambda$"].mean(axis=0).round(3),
index=[v.title() for v in item_names],
columns=["Spatial", "Verbal", "Speed", "Memory", "g"],
)
#
# # correlations = pd.DataFrame(
# #     trace[r"$\Psi$"].mean(axis=0).round(3),
# #     index=["Spatial", "Verbal", "Speed", "Memory", "g"],
# #     columns=["Spatial", "Verbal", "Speed", "Memory", "g"],
# # )
# # correlations.to_csv(f"{f}/factor_correlations.csv")
#
_paths = pd.DataFrame(
trace[r"$\Gamma$"].mean(axis=0).round(3),
index=["Spatial", "Verbal", "Speed", "Memory", "g"],
columns=["Spatial", "Verbal", "Speed", "Memory", "g"],
)
_paths.to_csv(f"{f}/factor_paths.csv")
#
# correlations = pd.DataFrame(
#     trace[r"$\Omega$"].mean(axis=0).round(3),
#     index=item_names,
#     columns=item_names,
# )
# correlations.to_csv(f"{f}/item_correlations.csv")

if __name__ == "__main__":
main()



Spatial Verbal Speed Memory g
Visual 0.43 0 0 0 0
Cubes 0.301 0 0 0 0
Paper 0.331 0 0 0 0
Flags 0.42 0 0 0 0
General 0 0.606 0 0 0
Paragrap 0 0.617 0 0 0
Sentence 0 0.632 0 0 0
Wordc 0 0.522 0 0 0
Wordm 0 0.636 0 0 0
Addition 0 0 0.455 0 0
Code 0 0 0.482 0 0
Counting 0 0 0.49 0 0
Straight 0 0 0.526 0 0
Wordr 0 0 0 0.354 0
Numberr 0 0 0 0.356 0
Figurer 0 0 0 0.403 0
Object 0 0 0 0.431 0
Numberf 0 0 0 0.431 0
Figurew 0 0 0 0.33 0

And here are the paths:

Spatial Verbal Speed Memory g
Spatial 0 0 0 0 1.293
Verbal 0 0 0 0 0.888
Speed 0 0 0 0 1.032
Memory 0 0 0 0 1.1
g 0 0 0 0 0

## References

Bollen, K. A. (1989). Structural equations with latent variables. Wiley.

Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. Supplementary Educational Monographs, 48, 91.

Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2), 183–202. 10.1007/BF02289343

Muthén, B., & Asparouhov, T. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17(3), 313–335. 10.1037/a0026802

Salvatier, J., Wiecki, T. V., & Fonnesbeck, C. (2016). Probabilistic programming in Python using PyMC3. PeerJ Comput Science, 2, e55. 10.7717/peerj-cs.55

Spearman, C. (1904). “General intelligence,” objectively determined and measured. The American Journal of Psychology, 15(2), 201. 10.2307/1412107

## Version history

• Originally posted May 15, 2020.