Department of Psychology and Child Development
At this point you probably remember the notion of linearity in the Classical Regression Model.
It is an important assumption in the Classical Regression Model, but it is less strict than it may seem: the model must be linear in the parameters, not necessarily in the predictors. This is a good characteristic of the linear model.
Many phenomena in the universe are not linear: growth curves or developmental trajectories in human beings, for example, are often not linear at all.
For example, consider a case where we select the wrong model for the data, like the one shown below. If the true relationship is quadratic, fitting the first (linear) model instead of the second is a misspecification:
\[Y = \beta_{0} + \beta_{1}X_{1} + \epsilon\]
\[Y = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{1}^2 + \epsilon\]
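Note that the quadratic model is still linear in the parameters, so it can be fit with lm(). A minimal sketch (myData, x, and y are hypothetical names, not the example data used later):
fitLinear    <- lm(y ~ x, data = myData)           ### misspecified if the truth is curved
fitQuadratic <- lm(y ~ x + I(x^2), data = myData)  ### adds the squared term
anova(fitLinear, fitQuadratic)                     ### does the squared term improve fit?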
Regression models are flexible in terms of interactions.
But what is an interaction? An interaction is what we call a moderated effect. Let's see what it looks like:
You can see in Figure 1 that there is a third variable \(Z\) affecting the relationship between \(X\) and \(Y\). When this happens, we say that the relationship between \(X\) and \(Y\) DEPENDS on \(Z\).
When we analyze an interaction, we are estimating the coefficient of a product term: the multiplication of \(X\) and \(Z\).
We can represent Figure 2 in its mathematical form:
\[Y= \alpha + \beta_{1}X + \beta_{2}Z + \beta_{3}XZ + \epsilon_{i}\]
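In R, the product term can be entered either by computing it by hand or with the * operator in the model formula. A generic sketch (d, X, Y, and Z are placeholder names):
### Two equivalent ways to specify the interaction model
fit_a <- lm(Y ~ X + Z + I(X * Z), data = d)  ### explicit product term
fit_b <- lm(Y ~ X * Z, data = d)             ### X*Z expands to X + Z + X:Z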
library(parameters)
#### continuous moderation
## stress:    stress score
## socsup:    social support
## dep:       depression
## stressXss: interaction stress * social support
url <- "https://raw.githubusercontent.com/blackhill86/mm2/refs/heads/main/dataSets/moderationExample.csv"
stress <- read.csv(url)
### Regression model without interaction
fitModel_1 <- lm(dep ~ socsup + stress,
                 data = stress)
model_parameters(fitModel_1,
                 verbose = FALSE)
Parameter | Coefficient | SE | 95% CI | t(138) | p
-------------------------------------------------------------------
(Intercept) | 12.75 | 2.52 | [ 7.77, 17.74] | 5.06 | < .001
socsup | -0.43 | 0.12 | [-0.66, -0.20] | -3.71 | < .001
stress | 0.13 | 0.03 | [ 0.08, 0.19] | 4.87 | < .001
In the model above, I'm testing whether social support and stress are related to depression.
The model looks like this:
\[Y= \alpha + \beta_{1}Stress + \beta_{2}SocialSupport + e_{i}\]
Now we add the interaction term to the model:
\[Y= \alpha + \beta_{1}Stress + \beta_{2}SocialSupport + \beta_{3}Stress*SocialSupport + e_{i}\]
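In R the model can be fit as shown below (a minimal sketch; the original chunk is not shown, but the plotting code later refers to this model as fitModel_2):
### Regression model with interaction
fitModel_2 <- lm(dep ~ socsup * stress,  ### socsup*stress expands to socsup + stress + socsup:stress
                 data = stress)
model_parameters(fitModel_2,
                 verbose = FALSE)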
Parameter | Coefficient | SE | 95% CI | t(137) | p
---------------------------------------------------------------------------
(Intercept) | 1.81 | 4.77 | [-7.63, 11.25] | 0.38 | 0.705
socsup | 0.11 | 0.23 | [-0.35, 0.57] | 0.48 | 0.635
stress | 0.48 | 0.13 | [ 0.22, 0.75] | 3.63 | < .001
socsup × stress | -0.02 | 6.47e-03 | [-0.03, 0.00] | -2.68 | 0.008
The interaction term is not explained by chance because \(p = 0.008\), which is less than \(0.05\).
If you find that an interaction term is not explained by chance, you CANNOT interpret the individual terms on their own because they are MODIFIED by, or DEPEND on, \(Z\).
The interpretation can be a little bit tricky. First, we need to understand the relationship between depression and stress at each level of our moderator.
In our example, I selected social support as our moderator:
library(marginaleffects)
library(ggplot2)
### We can use the package marginaleffects to
### generate slopes at different levels of
### the moderator
plot_predictions(fitModel_2,
                 condition = list("stress",
                                  "socsup" = "threenum"),
                 points = 0.5) +
  theme_classic() +
  ggtitle("Relationship between Stress and Depression at Levels of Social Support") +
  theme(plot.title = element_text(hjust = 0.5))
This approach is also called pick-a-point or spotlight analysis.
We can select any value of our moderator and test the conditional effect of \(X\) on \(Y\) at that value.
I apologize, but you will see some math here:
\[Y= \alpha + \beta_{1}X + \beta_{2}Z + \beta_{3}XZ + \epsilon_{i}\]
We can reorder terms to factor \(X\) out:
\[Y= \alpha + (\beta_{1} + \beta_{3}Z)X + \beta_{2}Z + \epsilon_{i}\]
\[f(Z)=\beta_{1} + \beta_{3}Z\] \(f(Z)\) is actually our simple slope. By plugging different values of \(Z\) into \(f(Z)\), we get the slope of the conditional effect of \(X\) on \(Y\) at the chosen value of \(Z\).
The most popular values to plug into \(f(Z)=\beta_{1} + \beta_{3}Z\) are the mean of our moderator \(Z\), one standard deviation above it (\(+1SD\)), and one standard deviation below it (\(-1SD\)).
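As a quick check, these simple slopes can be computed directly from the fitted coefficients (a sketch using fitModel_2 from above):
### f(Z) = b1 + b3*Z at -1SD, the mean, and +1SD of social support
b  <- coef(fitModel_2)
zs <- mean(stress$socsup) + c(-1, 0, 1) * sd(stress$socsup)
b["stress"] + b["socsup:stress"] * zs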
We don't have to estimate these simple slopes "by hand"; instead, we can estimate separate regression models after centering the moderator at the critical values we want to test.
Centering transforms a variable by subtracting a constant (e.g., the variable’s mean) from each observation of the variable.
The most familiar form of centering is mean centering, but we can center on any value.
When probing interactions, we can center \(Z\) on the interesting values we choose during the pick-a-point approach.
Running the model with \(Z\) centered on specific values automatically provides tests of the simple slope conditional on those values of \(Z\).
In the following R chunk, I'll create three new variables in the object stress, which contains my data. The three new variables are socialMean (social support centered at its mean), socialHigh (centered at one SD above the mean), and socialLow (centered at one SD below the mean).
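The computation itself is not shown here; a minimal sketch of how these variables can be created is:
### Center social support at the three critical values
stress$socialMean <- stress$socsup - mean(stress$socsup)
stress$socialHigh <- stress$socsup - (mean(stress$socsup) + sd(stress$socsup))
stress$socialLow  <- stress$socsup - (mean(stress$socsup) - sd(stress$socsup))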
library(patchwork)
### Histograms of the three centered variables
atMean <- ggplot(aes(x = socialMean), data = stress) +
  geom_histogram(colour = 1,
                 fill = "white",
                 binwidth = 1) +
  theme_classic()
atHigh <- ggplot(aes(x = socialHigh), data = stress) +
  geom_histogram(colour = 1,
                 fill = "white",
                 binwidth = 1) +
  theme_classic()
atLow <- ggplot(aes(x = socialLow), data = stress) +
  geom_histogram(colour = 1,
                 fill = "white",
                 binwidth = 1) +
  theme_classic()
### patchwork layout: atMean on top, atHigh and atLow below
atMean / (atHigh + atLow)
With our newly computed variables we can now test our simple slopes: we will estimate a regression model for each one. This will help us test whether each simple slope is explainable by chance.
First simple slope, at the mean:
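The chunk producing the output below is not shown; a sketch of the model is as follows (fitAtMean is a hypothetical object name; swapping in socialHigh or socialLow yields the other two models reported further down):
### Simple slope at the mean of social support
fitAtMean <- lm(dep ~ stress * socialMean,  ### hypothetical object name
                data = stress)
model_parameters(fitAtMean, verbose = FALSE)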
Parameter | Coefficient | SE | 95% CI | t(137) | p
-------------------------------------------------------------------------------
(Intercept) | 3.99 | 1.02 | [ 1.97, 6.01] | 3.91 | < .001
stress | 0.14 | 0.03 | [ 0.09, 0.19] | 5.24 | < .001
socialMean | 0.11 | 0.23 | [-0.35, 0.57] | 0.48 | 0.635
stress × socialMean | -0.02 | 6.47e-03 | [-0.03, 0.00] | -2.68 | 0.008
When estimating a model with a centered moderator, the slope of your predictor (in this case stress) becomes your estimated simple slope. In our example, the simple slope at the mean of social support is \(\beta=0.14\), and it is not explained by chance because the p-value is less than \(0.05\).
Second simple slope at high values of social support:
Parameter | Coefficient | SE | 95% CI | t(137) | p
------------------------------------------------------------------------------
(Intercept) | 4.48 | 1.39 | [ 1.74, 7.22] | 3.23 | 0.002
stress | 0.06 | 0.04 | [-0.01, 0.14] | 1.73 | 0.086
socialHigh | 0.11 | 0.23 | [-0.35, 0.57] | 0.48 | 0.635
stress × socialHigh | -0.02 | 6.47e-03 | [-0.03, 0.00] | -2.68 | 0.008
The simple slope at the high level of social support is \(\beta = 0.06\), and it is explained by chance because the p-value is greater than \(0.05\).
Third simple slope at low values of social support:
Parameter | Coefficient | SE | 95% CI | t(137) | p
------------------------------------------------------------------------------
(Intercept) | 3.50 | 1.51 | [ 0.50, 6.49] | 2.31 | 0.022
stress | 0.22 | 0.04 | [ 0.14, 0.30] | 5.26 | < .001
socialLow | 0.11 | 0.23 | [-0.35, 0.57] | 0.48 | 0.635
stress × socialLow | -0.02 | 6.47e-03 | [-0.03, 0.00] | -2.68 | 0.008
In our model depression is our outcome, stress our predictor, and social support our moderator. After estimating our simple slopes at different levels of our moderator we can conclude the following:
The effect of stress on depression is stronger when participants have a low level of social support: \(\beta = 0.22\).
The effect of stress on depression gets weaker when participants are close to the mean of social support: \(\beta= 0.14\).
The effect of stress on depression gets much weaker when participants have a high score in social support: \(\beta= 0.06\). This simple slope is explained by chance, which means that the relationship between depression and stress is explained by chance when participants have high social support.
You may be thinking: what happens if I want to test all the possible values of my moderator?
The answer is the Johnson-Neyman technique.
The Johnson-Neyman technique is an alternative approach that removes the arbitrary choices necessary for pick-a-point.
Johnson-Neyman finds the region of significance wherein the conditional effect of \(X\) on \(Y\) is statistically significant.
It inverts the pick-a-point approach to find which cut-points on the moderator correspond to a critical \(t\) value for the conditional \(\beta_{1}\).
This requires solving a set of equations that I'm not explaining in this lecture, but the simple message is that the Johnson-Neyman strategy gives you a region of significance across the values of your moderator.
The Johnson-Neyman technique is implemented in R, for example in the rockchalk package; other packages, such as interactions, also perform this strategy.
First, estimate your moderated regression model, that is, a model with at least one interaction term.
Second, estimate the simple slopes and the region of significance.
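The output below matches the format of the sim_slopes() function from the interactions package (an assumption; the original chunk is not shown):
library(interactions)
### Simple slopes and Johnson-Neyman interval for the
### effect of stress moderated by social support
sim_slopes(fitModel_2,
           pred = stress,
           modx = socsup,
           johnson_neyman = TRUE)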
JOHNSON-NEYMAN INTERVAL
When socsup is OUTSIDE the interval [23.78, 50.44], the slope of stress is
p < .05.
Note: The range of observed values of socsup is [8.00, 28.00]
SIMPLE SLOPES ANALYSIS
Slope of stress when socsup = 15.24283 (- 1 SD):
Est. S.E. t val. p
------ ------ -------- ------
0.22 0.04 5.26 0.00
Slope of stress when socsup = 19.68794 (Mean):
Est. S.E. t val. p
------ ------ -------- ------
0.14 0.03 5.24 0.00
Slope of stress when socsup = 24.13306 (+ 1 SD):
Est. S.E. t val. p
------ ------ -------- ------
0.06 0.04 1.73 0.09
This lecture is based on materials created by Kyle M. Lang (2016) and Mauricio Garnier-Villareal (2020).