Part 1

Esteban Montenegro-Montenegro, PhD

Psychology and Child Development

To introduce correlation to estimate relationships between two variables.

To introduce the notion of covariance.

To study scatter plots to visualize correlations.

A *correlation coefficient* is a numerical index that reflects the relationship between two variables. The value of this descriptive statistic ranges between -1.00 and +1.00. A correlation between two variables is sometimes referred to as a bivariate (for two variables) correlation.

- At the beginning we will study the correlation named *Pearson product-moment*.
- There are other types of correlation estimation, depending on the data generating process of each variable.
- *Pearson product-moment* deals with *continuous* DATA.

Salkind & Shaw (2020):

A correlation can range in value from \(-1.00\) to \(+1.00\).

A correlation equal to 0 means there is no linear relationship between the two variables.

The absolute value of the coefficient reflects the strength of the correlation. So, a correlation of \(-.70\) is stronger than a correlation of \(+.50\). One frequently made mistake regarding correlation coefficients occurs when students assume that a direct or positive correlation is always stronger (i.e., “better”) than an indirect or negative correlation because of the sign and nothing else.

A negative correlation is not a “bad” correlation.

We will use the letter *r* to represent correlation. For example, \(r = .06\).

\(r_{xy}\) is the correlation coefficient.

\(n\) is the sample size.

\(X\) represents variable \(X\).

\(Y\) represents variable \(Y\).

\(\Sigma\) means summation or addition.
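The symbols above are the ones used in the raw-score (computational) formula for the Pearson correlation. As a minimal sketch, assuming that formula is \(r_{xy} = \frac{n\Sigma XY - \Sigma X \Sigma Y}{\sqrt{[n\Sigma X^2 - (\Sigma X)^2][n\Sigma Y^2 - (\Sigma Y)^2]}}\), the following `R` code computes it step by step and compares the result with the built-in `cor()`. The vectors `x` and `y` are made-up values for illustration only.

```r
# Hypothetical data, invented for illustration only
x <- c(2, 4, 5, 6, 4, 7, 8, 5, 6, 7)
y <- c(3, 2, 6, 5, 3, 6, 5, 4, 4, 5)

n <- length(x)  # sample size

# Numerator: n * sum(XY) - sum(X) * sum(Y)
numerator <- n * sum(x * y) - sum(x) * sum(y)

# Denominator: square root of the product of the two bracketed terms
denominator <- sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))

r_xy <- numerator / denominator

# The built-in function should agree with the by-hand result
r_builtin <- cor(x, y)
print(c(by_hand = r_xy, builtin = r_builtin))
```

Working through the sums once by hand makes it easier to see that `cor()` is doing nothing more than this arithmetic.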

Salkind & Shaw (2020):

- You will find a correlation matrix in publications.
- It is the best way to represent several correlations between different pairs of variables.

- You will notice that a correlation matrix has 1.00 on the diagonal and two “triangles” with the same information.
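A correlation matrix with these properties can be produced by passing several numeric columns to `cor()`. The sketch below uses the `mtcars` data set that ships with `R` purely as a stand-in; the three columns are an arbitrary choice and are not the variables discussed in this lecture.

```r
# mtcars ships with base R; these three columns are an arbitrary choice
vars <- mtcars[, c("mpg", "hp", "wt")]

# cor() on a data frame returns all pairwise Pearson correlations
cor_matrix <- cor(vars)
print(round(cor_matrix, 2))

# Diagonal: each variable correlates perfectly with itself
print(diag(cor_matrix))

# Both triangles hold the same values, so the matrix is symmetric
print(isSymmetric(cor_matrix))
```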

- There is a useful trick: you can square your \(r\) to get a measure of correlation in terms of the percentage of shared variance, known as the coefficient of determination:

What is the coefficient of determination in this case?

We just need to estimate \(r^2 = (-0.22)^2 \approx 0.05\). Attention and depression share only about \(5\%\) of their variability (variance).
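The same calculation can be checked in `R`. This is a small sketch using the value \(r = -0.22\) from the example; the variable names are chosen here for illustration.

```r
r <- -0.22

# Square the stored value; typing -0.22^2 directly would give -0.0484,
# because R applies ^ before the unary minus
r_squared <- r^2
print(r_squared)

# Expressed as a percentage of shared variance
print(round(r_squared * 100, 1))
```

Storing the coefficient in a variable first (or writing `(-0.22)^2`) avoids the sign mistake, since a squared correlation can never be negative.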

I have shown you several plots; these plots are called *scatter plots*. These plots are useful for visually exploring possible correlations.

When you create these plots, you only need to represent one of the variables on the x-axis and the other variable on the y-axis.
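A minimal sketch of a scatter plot in base `R` is shown below. The data are simulated for illustration only, and the names `x` and `y` are placeholders rather than variables from the lecture data.

```r
set.seed(123)  # makes the simulated values reproducible

# Simulated data, for illustration only
x <- rnorm(100, mean = 50, sd = 10)
y <- 0.5 * x + rnorm(100, mean = 0, sd = 8)

# One variable goes on the x-axis, the other on the y-axis
plot(x, y,
     xlab = "Variable X",
     ylab = "Variable Y",
     main = "Scatter plot of X and Y",
     pch = 19)
```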

- We can check some values and see what is happening, like case #78, in the next plot:

- Maybe if we add the line of best fit we will see it better:

Can you spot the direction of this correlation?

These data come from a questionnaire that asks you to rate how emotional you feel. For instance, it asks: Rate how GREAT you feel, from 1 = “not feeling” to 6 = “I strongly feel it”.

- Let’s again add the line of best linear fit:

- Let’s add the line of best linear fit:

- When the correlation is high (in absolute value), it means there is a large portion of shared variance between \(x\) and \(y\).
- When the correlation is high, the values cluster close to the line of best linear fit.
- When the correlation is low, the values are scattered and lie far from the line of best fit.
- A flat line means that there is no correlation between \(x\) and \(y\), or that the correlation is remarkably low. This means \(r = 0\) or close to zero.
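The points above can be illustrated with simulated data. In this sketch, `y_high` and `y_low` are hypothetical variables built to have a strong and a weak linear relationship with `x`; the line of best linear fit is drawn with `lm()` and `abline()`.

```r
set.seed(2024)  # makes the simulated values reproducible

n <- 200
x <- rnorm(n)
y_high <- 0.9 * x + rnorm(n, sd = 0.3)  # little noise around the line
y_low  <- 0.1 * x + rnorm(n, sd = 1)    # mostly noise, nearly flat line

r_high <- cor(x, y_high)
r_low  <- cor(x, y_low)
print(c(high = r_high, low = r_low))

# Side-by-side scatter plots with the least-squares line
old_par <- par(mfrow = c(1, 2))
plot(x, y_high, main = "High correlation", pch = 19)
abline(lm(y_high ~ x), col = "red", lwd = 2)
plot(x, y_low, main = "Low correlation", pch = 19)
abline(lm(y_low ~ x), col = "red", lwd = 2)
par(old_par)  # restore the previous plotting layout
```

In the first panel the points hug the line; in the second they are spread widely around a line that is close to flat.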

- In `R` you can estimate Pearson correlations using the function `cor()`, as shown here:

In this estimation, I’m calculating the correlation between the emotion DOWN and the emotion GREAT. The Pearson correlation was \(r= -0.36\). Is this a strong correlation?
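A minimal sketch of such a call is given below. It assumes a hypothetical data frame named `emotions` with columns `DOWN` and `GREAT`; the values are simulated placeholders, not the questionnaire responses, so the resulting coefficient will not match the one reported above.

```r
set.seed(1)  # makes the simulated values reproducible

# Placeholder data: simulated 1-to-6 ratings, NOT the questionnaire responses
emotions <- data.frame(
  DOWN  = sample(1:6, 150, replace = TRUE),
  GREAT = sample(1:6, 150, replace = TRUE)
)

# Pearson is the default method; it is spelled out here for clarity
r_down_great <- cor(emotions$DOWN, emotions$GREAT, method = "pearson")
print(r_down_great)

# With real data that contain missing values, one option is
# use = "complete.obs", which drops incomplete pairs before computing r
r_complete <- cor(emotions$DOWN, emotions$GREAT, use = "complete.obs")
print(r_complete)
```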

- We could follow an ugly rule of thumb, but be careful, these are not rules cast in stone (Salkind & Shaw, 2020):
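As a sketch of how such a rule of thumb could be applied, the hypothetical helper `describe_r()` below labels a coefficient by its absolute value. The cutoffs are an assumption (a commonly quoted version of this kind of rule) and should be checked against Salkind & Shaw (2020) before use.

```r
# Assumed cutoffs (a commonly quoted version of the rule of thumb);
# they are a convention, not a statistical law
describe_r <- function(r) {
  size <- abs(r)  # strength ignores the sign
  if (size >= 0.8) {
    "very strong"
  } else if (size >= 0.6) {
    "strong"
  } else if (size >= 0.4) {
    "moderate"
  } else if (size >= 0.2) {
    "weak"
  } else {
    "weak or no relationship"
  }
}

print(describe_r(-0.36))
print(describe_r(-0.70))
print(describe_r(0.50))
```

Because the label depends only on `abs(r)`, a coefficient of \(-.70\) is classified as stronger than one of \(+.50\), in line with the earlier point about the sign.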

Salkind, N. J., & Shaw, L. A. (2020). *Statistics for people who (think they) hate statistics: Using R*. SAGE Publications.