Introduction to path analysis with manifest variables using AMOS
Mike Crowson, Ph.D.
                                      The University of Oklahoma
                                            October 2020
                                       (last updated Dec 2020)
Video: https://youtu.be/9tm4YqTSM6M
This model was tested using the following article (and the data provided in its supporting information), shared under a Creative
Commons Attribution license:
Zhou, J., Yang, Y., Qiu, X., Yang, X., Pan, H., Ban, B., et al. (2016). Relationship between anxiety and burnout among Chinese
physicians: A moderated mediation model. PLoS ONE 11(8): e0157013. doi:10.1371/journal.pone.0157013.
Supplementary data can be obtained at: https://doi.org/10.1371/journal.pone.0157013.s001
For this demonstration, the model being tested involves data and variables used in this article. However, the model was not
tested by the authors. Additionally, several scale variables were computed somewhat differently from the manner the authors
computed them due to difficulties in reconstructing the coding/scoring for some of the variables.
‘extrav’ = extraversion (computed from 11 items; α=.793).
‘neurot’ = neuroticism (computed from 12 items; α=.844).
‘poscope’ = positive coping (computed from 10 items; α=.815).
‘negcope’ = negative coping (computed from 10 items; α=.803).
‘burnpos’ = burnout (computed from 10 positively worded items from the scale; α=.810).
‘anxpos’ = anxiety (computed from 15 positively worded items from the scale; α=.932).
[Note: Only the positively worded items from the anxiety and
burnout scales were used, given uncertainty about whether the
negatively worded items had been recoded in the dataset provided.
In any case, most of the negatively worded items exhibited very low
item-to-total correlations with the full scale, whether they were
included with or without recoding. Additionally, the extraversion and
neuroticism scales were computed based on the thematic fit of item
content with the theoretical constructs and on examination of results
from preliminary EFAs.]
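The α values listed above are Cronbach's alpha coefficients. As a rough illustration of how such a value is obtained from item-level data, here is a minimal sketch in Python. The function name `cronbach_alpha` and the small `demo` response matrix are made up for illustration and are not the study data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (no missing values)."""
    k = len(rows[0])  # number of items in the scale
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])  # variance of the summed scale score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents x 3 items (NOT the study data)
demo = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
print(round(cronbach_alpha(demo), 3))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances.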
Download SPSS data with the new scored variables here:
https://drive.google.com/file/d/1RLAoOJjrF4jHFXHZTQ_3u5lWaYovFX5L/view?usp=sharing
Link to final model I put together in AMOS:
https://drive.google.com/file/d/1Tqqt2kaU0CW0Cz-6e5IDHC1xhd_GLRq1/view?usp=sharing
Before we get started, a bit of terminology. In SEM, we have two general types of variables represented: endogenous and
exogenous.
Exogenous variables are those that do not have arrows pointing to them from other variables. Although extraversion and
neuroticism are correlated variables (see double-headed arrow), they are exogenous in this model (and function most like what
we think of as independent variables). Endogenous variables are those that have arrows pointing at them from other variables.
The ‘poscope’, ‘negcope’, ‘burnpos’, and ‘anxpos’ variables are all endogenous in the model. Although ‘poscope’, ‘negcope’, and
‘burnpos’ are predictors of ‘anxpos’, they are also specified as outcomes of extraversion or neuroticism. Moreover, ‘burnpos’ is
predicted by neuroticism and the coping variables. You can easily identify the endogenous variables in this model since they
have error terms (i.e., disturbance terms) pointing toward them (from the circles/ovals). So, as you can see, proposed mediators
in your model still fall under the designation of endogenous variables, along with the downstream variables they
predict. [By the way, the double-headed arrow between the disturbances for the coping variables reflects the likely remaining
association between positive and negative coping after accounting for their predictors.]
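The exogenous/endogenous distinction can also be expressed mechanically: a variable is endogenous if it is the target of at least one directed path. The sketch below assumes the path list as I read it from the model description in this handout (one predictor for each coping variable, three for burnout, four for anxiety). The variable names follow the dataset labels, and the Python structure itself is only illustrative.

```python
# Directed paths (predictor, outcome), as read from the model description
paths = [
    ("extrav", "poscope"), ("neurot", "negcope"),
    ("poscope", "burnpos"), ("negcope", "burnpos"), ("neurot", "burnpos"),
    ("poscope", "anxpos"), ("negcope", "anxpos"),
    ("neurot", "anxpos"), ("burnpos", "anxpos"),
]

variables = {v for path in paths for v in path}
endogenous = {outcome for _, outcome in paths}       # has at least one arrow pointing at it
exogenous = variables - endogenous                   # no arrows pointing at it
mediators = {pred for pred, _ in paths} & endogenous # both an outcome and a predictor

print("exogenous: ", sorted(exogenous))
print("endogenous:", sorted(endogenous))
print("mediators: ", sorted(mediators))
```

Note that the mediators are simply the endogenous variables that also send arrows to other variables.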
Step 1: Import data into AMOS. Click on the ‘Select data file(s)’ icon. When the box on the right opens up,
click ‘File name’ and then select the data file from a drive on your computer.
The icons shown in the left panel can be thought of as ‘buttons’
to activate various drawing options. For instance, the rectangle
button can be used to draw out observed (aka, manifest)
variables. The oval button can be used to draw latent variables.
This icon is handy for drawing in measurement error terms and
disturbance terms.
The ‘Analysis Properties’ menu allows you to specify the
estimation method, as well as a variety of other options
related to your analysis. The default estimation approach
is Maximum likelihood.
The ‘Estimate means and intercepts’ box should be checked when you have missing data in your dataset on your variables.
Our current dataset does not have missing data, so we will leave this unchecked. However, if there were missing data on
our variables, then we would need to check this box or else we would get an error message saying the box needs to be
checked.
In cases where the above button is clicked, the program will treat missing data using Full Information Maximum
Likelihood (FIML) estimation.
Clicking on these will result in you getting
standardized path coefficients and squared multiple
correlations in the output.
Click here to run the analysis
At the top of the screen is a toggle. When you click the right toggle, then…
…you get parameter estimates. Right now, the unstandardized estimates are shown. But we can click the ‘standardized
estimates’ (see option on the left) to get standardized estimates (see next screen).
Click on this icon to access the output files
This tab takes you to the Model fit indices.
The CMIN that appears in the output is the chi-square value, which is the traditional approach to
testing a model for goodness of fit. [The chi-square goodness of fit test is used to evaluate whether a
model departs significantly from one that fits the data exactly (Kline, 2016).] The DF is the
degrees of freedom, and the p-value is the significance level. Traditionally, if p≤.05, then we
reject the null of an exact-fitting model.
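As a quick check on where the DF value comes from, the degrees of freedom equal the number of distinct sample variances and covariances minus the number of freely estimated parameters. The tally below is a sketch based on my reading of the model in this handout (9 directed paths, 2 covariances, and 6 variances). The variable names in the code are illustrative only.

```python
n_observed = 6  # extrav, neurot, poscope, negcope, burnpos, anxpos
n_moments = n_observed * (n_observed + 1) // 2  # distinct variances and covariances

n_paths = 9          # directed paths in the model
n_covariances = 2    # extrav <-> neurot, plus the two coping disturbances
n_variances = 2 + 4  # 2 exogenous variables + 4 disturbance terms
n_free = n_paths + n_covariances + n_variances

df = n_moments - n_free
print(n_moments, n_free, df)  # 21 17 4
```

Under these assumptions the count gives 4 degrees of freedom, which agrees with the chi-square test reported for this model.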
The Normed fit index (NFI), Relative fit index (RFI), Incremental fit index (IFI), Comparative fit index (CFI), and Tucker-Lewis Index
(TLI; also referred to as Non-normed fit index, or NNFI) are all incremental or comparative fit indices (i.e., whereby they compare
the fit of a model against that of a null or independence model; Byrne, 2010; Schumacker & Lomax, 2016). The RFI, IFI, NNFI, and
CFI all account for model complexity/parsimony in their computations (to a greater or lesser degree). These indices generally
range between 0 and 1 (although it is possible to have values slightly exceed 1 on some). Values ≥ .90 for these indices are treated
as indicative of an acceptable fitting model (see Whittaker, 2016), although values ≥ .95 may be considered as evidence of more
‘superior fit’ (Byrne, 2010, p. 79). Two of the more commonly reported comparative fit indices are the TLI and CFI.
[Information in this slide and the next is heavily abstracted from my Powerpoint on CFA in AMOS:
https://drive.google.com/file/d/1JMpJh7iy1WqkDUy0Ik92rv32PwJOcjZA/view?usp=sharing]
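To make the idea of comparing against an independence model concrete, the sketch below computes the NFI, TLI, and CFI from their standard textbook formulas. The model chi-square (29.931, df = 4) is taken from this handout, but the independence-model chi-square of 1200 is a made-up placeholder, because the baseline value is not reported here. The results therefore will not match the AMOS output.

```python
def incremental_fit(chi2_m, df_m, chi2_b, df_b):
    """Textbook NFI, TLI, and CFI from model (m) and baseline (b) chi-squares."""
    nfi = (chi2_b - chi2_m) / chi2_b
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1)
    d_m = max(chi2_m - df_m, 0.0)  # estimated noncentrality, model
    d_b = max(chi2_b - df_b, 0.0)  # estimated noncentrality, baseline
    cfi = 1.0 if max(d_m, d_b) == 0 else 1 - d_m / max(d_m, d_b)
    return {"NFI": nfi, "TLI": tli, "CFI": cfi}

# chi2_b = 1200.0 is HYPOTHETICAL; df_b = 15 is the independence-model df for 6 variables
fit = incremental_fit(chi2_m=29.931, df_m=4, chi2_b=1200.0, df_b=15)
for name, value in fit.items():
    print(name, round(value, 3))
```

Because the TLI divides each chi-square by its degrees of freedom, it penalizes model complexity more visibly than the NFI, and it is the index most likely to fall outside the 0 to 1 range.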
The goodness of fit index (GFI) and adjusted goodness of fit index (AGFI) are absolute fit indices (Kline, 2016). According to Byrne
(2010), the GFI measures ‘the relative amount of variance and covariance’ in the sample covariance matrix that is ‘jointly
explained’ (p. 76) by the model-implied population covariance matrix. The AGFI represents an adjustment to the GFI for the number of
parameters estimated; therefore, this makes it a parsimony-adjusted index (Byrne, 2010). The GFI and AGFI range from 0 to 1,
where higher values indicate greater fit (Byrne, 2010). Values > .90 or .95 are typically regarded as indicating acceptable to good
model fit (see Schumacker & Lomax, 2016; Whittaker, 2016).
It is important to note that these indices will not appear in the AMOS output in cases where there is missing data, where you have
to click on ‘Estimate means and intercepts’ to get the model to run. The reason they appear here is because we are working with a
complete dataset.
The chi-square test here indicates the model
departs significantly from exact fit, χ²(4) = 29.931,
p<.001.
The GFI and AGFI both indicate a well-fitting
model (as both are > .90).
The TLI and CFI values are both > .90, where both
are indicating an acceptable to good fitting model.
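The p-value for the chi-square test above is the upper-tail probability of a central chi-square distribution. As a sketch of how it can be checked outside AMOS, the function below (a hypothetical helper named `chi2_sf`) evaluates that tail probability with a standard series for the incomplete gamma function, using only the Python standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a central chi-square with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z)
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if abs(term) <= 1e-16 * abs(total):
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

p_value = chi2_sf(29.931, 4)
print(p_value < 0.001, p_value)
```

For the values reported here the tail probability is far below .001, consistent with rejecting the null hypothesis of exact fit.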
The root mean square error of approximation (RMSEA) can also be considered an ‘absolute fit index’, with 0 indicating the ‘best
fit’ and values > 0 suggesting worse fit (Kline, 2016). Values of .05 or below on the RMSEA are generally considered indicative of a
close-fitting model. Values up to .08 (see Browne & Cudeck, 1993; as cited by Whittaker, 2016) or .10 (Hu & Bentler, 1995;
as cited by Whittaker, 2016) are considered acceptable. According to Kline (2016), Browne and Cudeck suggested that an RMSEA ≥ .10
indicates a model that may have more serious problems in its specification.
In our output, the RMSEA = .076, which falls between .05 (close fit) and .10 (poor fit). So the RMSEA based on our model suggests
the model does not represent a close fit to the data, but nevertheless indicates acceptable fit.
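For readers who want to see how the RMSEA relates to the chi-square, here is a sketch of one common formulation, sqrt(max(χ² - df, 0) / (df × (N - 1))). The sample size is not stated in this handout, so the N used below is a placeholder and the printed value is not expected to reproduce the .076 in the AMOS output.

```python
import math

def rmsea(chi2, df, n):
    """One common single-group formulation of the RMSEA point estimate."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

n_hypothetical = 1000  # PLACEHOLDER sample size, not taken from the study
print(round(rmsea(29.931, 4, n_hypothetical), 3))
```

The formula shows why the RMSEA is 0 whenever the chi-square does not exceed its degrees of freedom, and why the same chi-square yields a smaller RMSEA in a larger sample.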
The PCLOSE test provides another way of assessing fit of a model based on the RMSEA. If we assume that an RMSEA value ≤ .05
represents a close-fitting model (see above), then a p-close test result where p>.05 can be viewed as supporting the null
hypothesis of close model fit (Kline, 2016).
In our current analysis, PCLOSE is .039, which suggests rejection of the null hypothesis of close fit (i.e., it does not support our
model).
The 90% confidence interval provides one other way of assessing fit based on the RMSEA. Specifically, it is an interval estimate for the
RMSEA. If the upper bound (HI 90 in the table) falls below .05, then we may interpret this as supporting a close-fitting model. If
the upper bound is > .10 (threshold for ‘poor fit’), then we have weaker evidence in support of a well-fitting model (Kline,
2016).
If the lower bound of the interval is > .05 (close fit) and the upper bound is < .10 (poor fit), then although our model does not
pass the test of close fit, it nonetheless may represent an acceptable fit to the data (see Kline, 2016).
Additionally, the wider the confidence interval, the less confidence one should have in the point estimate for the RMSEA. Kline
(2016) provides an example where an RMSEA for a model is .057 and the 90% CI ranges from .003 to .103. The lower bound
suggests a close-fitting model, whereas the upper bound suggests a poor-fitting model. Kline (2016) resolved the apparent
contradiction by stating that the model is ‘just as consistent with the close-fit hypothesis as it is with the poor-fit hypothesis’
(p. 275).
In our output, the lower bound is .052 and the upper bound is .102. Since the lower bound falls in the ‘acceptable range’ and
the upper bound falls at the level of ‘poor fit’, then we have somewhat mixed results pertaining to acceptability of our model
fit using the confidence interval approach.
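The decision rules for the confidence interval can be summarized as a small helper. This is only a sketch of the logic described above, with the .05 and .10 thresholds as default arguments. The function name and the wording of the messages are my own.

```python
def interpret_rmsea_ci(lo90, hi90, close=0.05, poor=0.10):
    """Translate the RMSEA 90% CI bounds into the verbal rules described above."""
    notes = []
    if hi90 < close:
        notes.append("whole interval below .05: supports close fit")
    elif lo90 > close:
        notes.append("lower bound above .05: close fit not supported")
    else:
        notes.append("interval spans .05: close fit cannot be ruled out")
    if hi90 > poor:
        notes.append("upper bound above .10: poor fit cannot be ruled out")
    elif lo90 > close:
        notes.append("upper bound not above .10: fit may still be acceptable")
    return notes

print(interpret_rmsea_ci(0.052, 0.102))  # bounds from our output
print(interpret_rmsea_ci(0.003, 0.103))  # Kline's (2016) example
```

Both examples end up with a note that poor fit cannot be ruled out, which is the source of the ‘mixed’ conclusion.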
Click under the Estimates tab to get the unstandardized path coefficients and significance tests, as well as the standardized
estimates, etc. (scroll down).
Extraversion was a positive and significant (b=.158, s.e.=.065, p=.015) predictor of positive coping, whereas neuroticism was
a positive and significant (b=.299, s.e.=.061, p<.001) predictor of negative coping. Positive coping was a negative and
significant (b=-.036, s.e.=.004, p<.001) predictor of burnout, while negative coping was a positive and significant (b=.063,
s.e.=.004, p<.001) predictor of burnout. Positive coping was a negative and significant (b=-.008, s.e.=.002, p<.001) predictor
of anxiety. Neuroticism (b=.019, s.e.=.004, p<.001), negative coping (b=.020, p<.001), and burnout (b=.192, s.e.=.015, p<.001)
all were significant positive predictors of anxiety.
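The significance tests in this table are based on the critical ratio (C.R.), which is the estimate divided by its standard error and referred to the standard normal distribution. The sketch below reproduces that calculation for the extraversion to positive coping path using the rounded values reported above, so small rounding differences from the AMOS output are possible. The function name `wald_test` is illustrative.

```python
import math

def wald_test(b, se):
    """Critical ratio and two-sided p-value from the standard normal distribution."""
    cr = b / se
    p = math.erfc(abs(cr) / math.sqrt(2))  # equals 2 * (1 - Phi(|cr|))
    return cr, p

cr, p = wald_test(0.158, 0.065)
print(round(cr, 2), round(p, 3))
```

With these rounded inputs the p-value comes out at about .015, in line with the value reported for this path.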
These are standardized path coefficients which are interpreted the
same way as beta coefficients in the context of OLS regression.
The correlations table (below) includes correlations between our exogenous variables
(i.e., the variables with no arrows pointed toward them) and between the disturbance
terms (i.e., our prediction errors). The table of covariances (left) contains the
estimated covariances and the associated tests of those covariances.
Extraversion and neuroticism were correlated at r = -.367, whereas the correlation between the disturbance terms for the
endogenous positive and negative coping variables was r = -.315.
The test of the covariance between extraversion and neuroticism was significant (p<.001), as was the test of the
covariance between the disturbance terms for the positive and negative coping variables (p<.001).
The squared multiple correlation is equivalent to R-square. These values are provided for all endogenous variables within the
model (where endogenous variables are specified as outcomes of other variables in the model).
Since only extraversion is included as a predictor of positive coping, we can say that it accounted for approximately .005 X
100% = .5% of the variance in positive coping. Since only neuroticism is specified as a predictor of negative coping, we can
say that it accounted for approximately .011 X 100% = 1.1% of the variance in negative coping. Since burnout was specified
as an outcome of positive coping, negative coping, and neuroticism, we can say that its predictors accounted for
approximately .279 X 100% = 27.9% of the variance in burnout. Finally, since positive coping, negative coping, neuroticism,
and burnout were all predictors of anxiety, then we can say that these variables jointly accounted for .337 X 100% = 33.7% of
the variation in anxiety.
                                      Additional Analysis Properties in Amos
Note 1: You can only request these and have output generated if you have a complete set of data. These cannot be used
in cases where you have missing data.
Note 2: Much of the discussion and citations in this section are abstracted from my Powerpoint presentation on CFA in
AMOS: https://drive.google.com/file/d/1JMpJh7iy1WqkDUy0Ik92rv32PwJOcjZA/view?usp=sharing . Nevertheless, the
examples still follow those from our current path analysis example.
In those cases where you have complete data (and where you
didn’t click on ‘Estimate means and intercepts’ under the
Estimation tab), you can obtain the following additional output:
‘Residual moments’ -> provides the matrix of unstandardized
and standardized residuals, which may be useful in diagnosing
potential sources of model misspecification.
‘Modification indices’ -> provides modification indices, which
again can be useful in diagnosing potential sources of model
misspecification.
‘Tests for normality and outliers’ -> useful for identifying potential
violations of univariate and multivariate normality, as well as
identification of potential multivariate outliers.
In those cases where you have missing data, you can still obtain
the ‘Indirect, direct, and total effects’. When used in conjunction
with bootstrapping (see ‘Bootstrap’ tab), you can test indirect
and total effects in a model. But note, the bootstrap option is not
available if you have missing data!
Under the ‘bootstrap’ tab, you can request bootstrap results. This
is particularly useful for testing indirect effects within your
model.
These matrices reflect differences between the sample covariance matrix and the model-implied covariance matrix.
Standardized residuals can be handy for identifying potential areas of model misspecification. Byrne (2010) states that these
residuals can be considered ‘analogous to Z-scores’ (p. 86) and notes that larger standardized residuals can be considered an
indication of potential misspecification of the model between two variables. Whittaker (2016) suggests further investigation of
standardized residuals greater than 1.96 in absolute value, whereas Byrne (2010; citing Joreskog & Sorbom, 1993) described
standardized residuals greater than 2.58 in absolute value as ‘large’.
In the standardized residuals, we see the largest residuals are 2.738 and 4.577. Both residuals suggest model misspecification
that may have produced the observed discrepancies in the covariances of extraversion with burnout and with anxiety.
Unsurprisingly, based on our review of the standardized residuals, the modification indices suggest adding a path from extraversion
to burnout and from extraversion to anxiety.
*IMPORTANT: The residual matrices and modification indices are empirically based strategies for diagnosing sources of
model misspecification, which may be used when deciding if and how to re-specify your model. Be forewarned that relying
solely on empirical criteria to make decisions about model re-specification can lead you toward a model that fits your data
better than your original model, yet the same model fit may not be achieved in other samples from the population. This can occur
because the re-specifications capitalize on chance characteristics of your data. For this reason, it is extremely important to be
judicious in any model re-specifications based on empirical criteria and to make sure that any changes you make are reasonable given theory.
A key assumption when performing SEM with ML estimation is that the variables in your model (and, in particular, the endogenous
variables; see Kline, 2016) are multivariately normally distributed. When the assumption of multivariate
normality is violated, the chi-square value can be inflated, leading it (and other tests derived from it) to suggest worse
fit to the data. On the other hand, it can also result in standard errors that are biased downward, potentially inflating Type 1
error risk when testing your model’s parameters [see discussions by Byrne (2010), Kline (2016), and Schumacker & Lomax (2016)].
For these reasons, it is worth examining the normality of your data.
In this portion of the output, we have skewness and kurtosis statistics for each variable in your model. These are useful for
judging whether there is evidence of a departure from univariate normality. Keep in mind that univariate normality is a
necessary, but not sufficient condition for multivariate normality (Pituch & Stevens, 2016). Mardia’s index of multivariate
kurtosis, therefore, is provided at the bottom of this table to facilitate your judgment concerning potential multivariate non-
normality.
The ‘skew’ and ‘kurtosis’ values for each variable are used to evaluate our measures for univariate normality. The ‘c.r.’ columns
contain z-values for testing whether the univariate distribution for a variable departs significantly from normality with respect
to skew or kurtosis. [Note: These test results are impacted by sample size and may not be particularly useful in very large
samples]. Skewness values between -2 and +2 are reasonably consistent with normality (Lomax & Hahs-Vaughn, 2012; see also
Pituch & Stevens, 2016), whereas values > 3 (in absolute value) indicate more severe non-normality (see Kline, 2016). Rules of
thumb for evaluating kurtosis have been less consistent. Lomax and Hahs-Vaughn (2012) suggest values between -2 and +2 are
consistent with normality (see also Pituch & Stevens, 2016). However, values of 7 or 8 or greater have been suggested as
indicators of more severe non-normality with respect to kurtosis (see discussions by Byrne, 2010; Kline, 2016). [I would also
recommend examining the univariate distributions of your variables graphically, e.g., P-P plots or histograms]
The skewness and kurtosis values for the variables seemed to provide evidence for univariate normality.
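As a rough illustration of what the ‘skew’, ‘kurtosis’, and ‘c.r.’ columns represent, the sketch below computes moment-based skewness and excess kurtosis and divides each by its large-sample standard error (sqrt(6/N) and sqrt(24/N)). These are textbook formulas, and AMOS may differ in small computational details. The sample values are made up.

```python
import math

def skew_kurtosis(values):
    """Moment-based skewness, excess kurtosis, and their large-sample critical ratios."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return {
        "skew": skew, "skew_cr": skew / math.sqrt(6 / n),
        "kurtosis": kurt, "kurtosis_cr": kurt / math.sqrt(24 / n),
    }

scores = [1, 2, 3, 4, 5]  # HYPOTHETICAL scores, not the study data
print(skew_kurtosis(scores))
```

Because the standard errors shrink as N grows, even trivial departures from normality produce large critical ratios in big samples, which is why the rules of thumb for the raw skew and kurtosis values are usually more informative.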
Mardia’s index of multivariate kurtosis (see bottom of table):
Citing Bentler (2005), Byrne (2010) states that ‘in practice, multivariate kurtosis values > 5 are indicative of data that are
nonnormally distributed’ (p. 104). We see in our output that the multivariate kurtosis is 8.733, which exceeds that threshold.
Additionally, Byrne (2010) states, ‘When the sample size is very large and multivariately normal, Mardia’s normalized estimate is
distributed as a unit normal variate…’ (p. 104). This means that you can reference the z-distribution to test whether your data
depart significantly from normality [recall, + or -1.96 are thresholds when testing a null hypothesis assuming α=.05]. The c.r.
for Mardia’s index is z=14.975, once again suggesting that the data depart significantly from multivariate normality.
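To show what goes into Mardia's index, the sketch below computes the textbook version: the mean of the squared Mahalanobis distances squared, minus p(p + 2), with the normalized estimate obtained by dividing by sqrt(8p(p + 2)/N). The data are simulated, the helper names are my own, and AMOS may apply small-sample adjustments that this sketch does not attempt to reproduce.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def mardia_kurtosis(data):
    """Return (excess multivariate kurtosis, normalized z, squared Mahalanobis distances)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    dev = [[row[j] - means[j] for j in range(p)] for row in data]
    # Maximum likelihood (divide-by-N) covariance matrix
    cov = [[sum(d[a] * d[b] for d in dev) / n for b in range(p)] for a in range(p)]
    d2 = [sum(di * wi for di, wi in zip(d, solve(cov, d))) for d in dev]
    b2p = sum(v * v for v in d2) / n
    excess = b2p - p * (p + 2)  # expected value of b2p under normality is p(p + 2)
    z = excess / math.sqrt(8 * p * (p + 2) / n)
    return excess, z, d2

rng = random.Random(0)
demo = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]  # SIMULATED data
excess, z, d2 = mardia_kurtosis(demo)
print(round(excess, 3), round(z, 3))
```

The same squared distances (d2) are the Mahalanobis values used for outlier screening, so the two diagnostics are closely linked.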
Mahalanobis distance (D²) can be useful for identifying potential multivariate outliers, or cases that are extreme ‘on two or
more variables, or a pattern of scores that is atypical’ (Kline, 2016, p. 73). The p1 column in our output contains the p-values
from the cumulative distribution function (CDF) for a central chi-square distribution with df=number of variables
(see https://www.ibm.com/support/pages/how-are-probabilities-calculated-mahalanobis-distances-amos). If a computed
MD is statistically significant, then we consider the possibility that the case is a multivariate outlier. Kline (2016) and
Tabachnick and Fidell (2013) both suggest a conservative significance criterion of p=.001 for identifying potential outliers.
Pituch and Stevens (2016) also consider .001 a conservative criterion for identification of outliers.
It is also possible to copy the table of MD values (for those cases furthest from the centroid) and paste it into Excel.
Since the MD values are rank-ordered, it is possible to graph the MD values and identify potential ‘breaks’ or ‘gaps’ in MD
values that might suggest the presence of one or more multivariate outliers. For similar discussion, see Byrne (2010, p.
341).
Keep in mind that AMOS does not print out the Mahalanobis distance for all cases, but only for the subset of cases that are
candidate outliers. So, if you have a very large sample size, chances are not all cases will appear. But this process
still works since the focus is on potential outliers.
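The ‘gap’ approach can also be automated instead of graphed. The sketch below ranks a list of distances, finds the largest drop between neighbouring values, and separately flags values beyond the chi-square critical value at p = .001 for df = 6 (approximately 22.458). The distances are made up for illustration, and the function name is my own.

```python
def largest_gap(d2_values):
    """Return the cases above the largest break in the ranked distances, and the break size."""
    ranked = sorted(d2_values, reverse=True)
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return ranked[: i + 1], gaps[i]

# HYPOTHETICAL Mahalanobis d-squared values (not from the AMOS output)
d2 = [41.2, 38.9, 24.1, 23.7, 23.0, 22.4, 21.9]
above_break, gap = largest_gap(d2)

CRIT_DF6_P001 = 22.458  # chi-square critical value, df = 6, upper-tail p = .001
significant = [v for v in d2 if v > CRIT_DF6_P001]

print(above_break, round(gap, 1))
print(significant)
```

In this made-up example the significance criterion flags five cases, while the gap approach isolates only the two that sit clearly apart from the rest, which illustrates why it can be useful to look at both.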
This portion of the output (left) contains the computed unstandardized indirect effects in the model. The .000’s mark pairs of
variables between which no indirect effect is computed.
Importantly, the indirect effects in this output are total indirect effects (TIE). That is, if you have specified that a mediation
effect occurs via two or more mediators, then those indirect effects are summed to the TIE. If you want to obtain specific
indirect effects (SIE) – i.e., to separate out the indirect effects that make up a total indirect effect – you will need to use the
User-Defined Estimands. I do not cover that in this presentation; however, I cover this in two other videos:
Older video: https://www.youtube.com/watch?v=9jGL45NuVAA
Newest video (Jan 2021): https://youtu.be/gYE4yIjfFIA
We see that the indirect effect of neuroticism on burnout (which would have occurred via negative coping) was .015, whereas
the total indirect effect of neuroticism on anxiety (via the combination of burnout and negative coping) was .006. The indirect
effect of extraversion on burnout (via positive coping) was -.006, whereas the total indirect effect on anxiety (via the
combination of burnout and positive coping) was -.002. The indirect effect of negative coping on anxiety (via burnout)
was .012, whereas the indirect effect of positive coping on anxiety (via burnout) was -.007.
[Using the terminology laid out in the previous slide: The indirect effect of neuroticism on burnout is not only a specific
indirect effect, but also a total indirect effect. The same goes for the effect of extraversion on burnout.]
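An indirect effect is the product of the unstandardized coefficients along a pathway, and a total indirect effect is the sum of those products over all pathways. The sketch below illustrates this with the rounded coefficients reported earlier in this handout for the paths involving extraversion and the coping variables. Because the inputs are rounded, the results should be read as approximate. The variable names are my own.

```python
# Rounded unstandardized path coefficients reported earlier in this handout
ext_pos = 0.158    # extraversion -> positive coping
pos_burn = -0.036  # positive coping -> burnout
neg_burn = 0.063   # negative coping -> burnout
pos_anx = -0.008   # positive coping -> anxiety
burn_anx = 0.192   # burnout -> anxiety

# Specific (and total) indirect effect of extraversion on burnout, via positive coping
ext_to_burn = ext_pos * pos_burn

# Total indirect effect of extraversion on anxiety: sum over both pathways
ext_to_anx = ext_pos * pos_anx + ext_pos * pos_burn * burn_anx

# Indirect effects of the coping variables on anxiety, via burnout
neg_to_anx = neg_burn * burn_anx
pos_to_anx = pos_burn * burn_anx

for label, value in [("extrav -> burnpos", ext_to_burn), ("extrav -> anxpos", ext_to_anx),
                     ("negcope -> anxpos", neg_to_anx), ("poscope -> anxpos", pos_to_anx)]:
    print(label, round(value, 3))
```

To three decimals, these products agree with the corresponding indirect effects in the AMOS output described above.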
This portion of the output contains the bootstrap confidence interval results, which are used to test whether the indirect
effects (see previous slide) are significantly different from zero. As an example, the 90% bootstrap confidence interval for the
indirect effect of neuroticism on anxiety was (.001, .011). [Since the null of 0 does not fall between the lower and upper
bound of the interval, we reject the null and infer a non-zero population indirect effect]. As a second example, the 90%
confidence interval for the indirect effect of extraversion on anxiety was (-.004, -.001). [Again, since the null of 0 does not fall
between the lower and upper bound, we reject the null and infer a non-zero population indirect effect].
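For readers curious about what the bootstrap is doing behind the scenes, here is a minimal sketch of a percentile bootstrap confidence interval for a single indirect effect (X -> M -> Y). It uses simulated data and ordinary least squares slopes rather than the full path model, and all names and population values are made up for illustration. It is not the AMOS implementation, which also offers bias-corrected intervals.

```python
import random

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    """a*b: slope of M on X times the partial slope of Y on M (controlling for X)."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    a = sxm / sxx
    b = (cov(m, y) * sxx - cov(x, y) * sxm) / (smm * sxx - sxm ** 2)
    return a * b

def percentile_ci(x, m, y, n_boot=500, level=0.90, seed=1):
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[int(round(tail * n_boot))]
    upper = estimates[int(round((1 - tail) * n_boot)) - 1]
    return lower, upper

# SIMULATED data with a true indirect effect of 0.5 * 0.5 = 0.25
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

lo, hi = percentile_ci(x, m, y)
print(round(indirect_effect(x, m, y), 3), (round(lo, 3), round(hi, 3)))
```

As in the AMOS output, the decision rule is simply whether 0 falls inside the interval.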
                                                      References
Byrne, B. M. (2010). Structural equation modeling with AMOS: Basic concepts, applications, and programming (2nd ed.).
     New York: Routledge.
International Personality Item Pool: A Scientific Collaboratory for the Development of Advanced Measures of Personality
      Traits and Other Individual Differences (http://ipip.ori.org/). Internet Web Site.
Kline, R. B. (2016). Principles and practice of structural equation modeling (4th ed.). New York: The Guilford Press.
Lomax, R. G., & Hahs-Vaughn, D. L. (2012). An introduction to statistical concepts (3rd ed.). New York: Routledge.
Pituch, K. A., & Stevens, J. P. (2016). Applied Multivariate Statistics for the Social Sciences (6th ed.). New York: Routledge.
Schumacker, R. E., & Lomax, R. G. (2016). A beginner’s guide to structural equation modeling (4th ed.). New York:
    Routledge.
Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Upper Saddle River, NJ: Pearson.
Whittaker, T. A. (2016). Structural equation modeling. In Applied multivariate statistics for the social sciences (6th ed.,
     pp. 639-746). New York: Routledge.