Gentle introduction

Esteban Montenegro-Montenegro, PhD

Department of Psychology and Child Development

Explain relevant statistical concepts.

Describe and explain what *“norms”* are.

Perhaps more memes…

- There are many reasons to implement statistical models in measurement theory.
- We need to create statistical models that help us understand Nature.
- In order to understand Nature we need to create instruments and measurements.
- The best way to create measurements is to create a model where observation and theory are linked together. In this step, statistics will help us evaluate one or several measurement models.
- Statistics will help us make inferences based on our observed data.
- You can describe the performance of groups, or describe traits with aggregated data.
- You can compare individuals and groups based on scores.

To study and understand *Nature* we must construct a *model* for how *Nature* works.

A model helps you to understand Nature and also allows you to make predictions about Nature. There is no right or wrong model; they are all wrong! But some are better than others.

- A *parameter* is a numerical characteristic of the data-generating process, one that is usually unknown but often can be estimated using data.

For instance, in the model shown above, we have **two unknown parameters**, and this model produces data \(Y\). The unknown parameters are represented with Greek letters, for instance the letter

**Mantra**

Model produces data.

Model has unknown parameters.

Data reduce the uncertainty about the unknown parameters.
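The mantra can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the original example: a normal model with two unknown parameters, here called `mu` and `sigma`, whose "true" values (100 and 15) are picked arbitrarily. The function names `simulate` and `standard_error` are hypothetical helpers. The point is that the standard error of the estimated mean tends to shrink as more data are collected.

```python
import random
import statistics


def simulate(n, mu=100.0, sigma=15.0, seed=1):
    """Model produces data: draw n values from a normal model.

    mu and sigma are illustrative values, chosen arbitrarily for this sketch.
    """
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def standard_error(sample):
    """Uncertainty about the unknown mean, estimated as SD / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


# Data reduce the uncertainty about the unknown parameters:
# the standard error gets smaller as the sample size grows.
for n in (10, 100, 1000, 10000):
    y = simulate(n)
    print(n, round(statistics.mean(y), 2), round(standard_error(y), 2))
```

In practice we never see `mu` and `sigma`; we only see the data, and the estimates get closer to the unknown values as the sample grows.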

- Nominal: numerical values representing labels or groups. For example: marital status, job position, gender.
- Ordinal: This scale allows you to rank individuals or objects but not to say anything about the meaning of the differences between the ranks. Likert scales are a good example.
- Ratio: it is a scale with a meaningful zero. For example the Kelvin scale to measure temperature.

You are probably familiar with frequencies. When reporting frequencies, we count how many times each value occurs in the observed data.

In the next example I’ll show descriptive information related to a validation I did in Costa Rica.

My aim was to know how people felt when reading different adjectives and nouns. I asked my participants to rate the perceived “arousal” and perceived “valence” of each word.

I used manikins to evaluate valence:

- I used the following scale to measure arousal:

I didn’t create these scales. The original authors are Bradley & Lang (1994), and the instrument is called the Self-Assessment Manikin. I thought it could be a good option to measure words’ positive and negative valence along with the “arousal”.

Each little individual in each scale had a number in my data set: the first little guy in each scale was coded as “9” and the last one was coded as “1”.

This means I used an *ordinal* scale where 9 represents “very positive word”, and 1 represents “very negative word”. Likewise, in the arousal scale 9 represents “strongly activated” whereas 1 represents “calm or not activated”.

- In this example, a person rated “Anxious” as “1” in valence and “9” in arousal.

Frequencies of arousal and valence: rating of “Anxious”

Response Options | Arousal | Valence |
---|---|---|
1 | 41 | 29 |
2 | 4 | 7 |
3 | 13 | 13 |
4 | 10 | 8 |
5 | 11 | 23 |
6 | 2 | 3 |
7 | 3 | 6 |
8 | 1 | 3 |
9 | 8 | 4 |
NA | 7 | 4 |
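As a rough sketch, the counts in the table can be turned into percentages of the valid (non-missing) responses. The counts are copied from the table; the helper names `relative_frequencies` and `most_frequent` are hypothetical, and excluding the NA responses from the denominator is a choice made here for illustration.

```python
# Counts copied from the frequency table (ratings of "Anxious").
# NA responses (7 for arousal, 4 for valence) are left out on purpose.
arousal = {1: 41, 2: 4, 3: 13, 4: 10, 5: 11, 6: 2, 7: 3, 8: 1, 9: 8}
valence = {1: 29, 2: 7, 3: 13, 4: 8, 5: 23, 6: 3, 7: 6, 8: 3, 9: 4}


def relative_frequencies(counts):
    """Convert counts to percentages of the valid responses."""
    total = sum(counts.values())
    return {option: 100 * count / total for option, count in counts.items()}


def most_frequent(counts):
    """Return the response option with the highest count."""
    return max(counts, key=counts.get)


for name, counts in (("Arousal", arousal), ("Valence", valence)):
    pct = relative_frequencies(counts)
    print(name, "valid n =", sum(counts.values()),
          "most frequent option =", most_frequent(counts))
    for option in sorted(pct):
        print(f"  {option}: {pct[option]:.1f}%")
```

Percentages make the two columns easier to compare because the number of valid responses differs between arousal and valence.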

**Question**

- Did the participants understand what “anxiety” is?
- Is it trivial to ask ourselves this question?

You can describe a distribution in terms of percentiles.

Let’s imagine the following numbers represent household incomes:

$25,500 | $32,456 | $37,668 | $54,365 | $135,456 |
---|---|---|---|---|

- Now, we can follow this formula to estimate our percentiles (Westfall & Henning, 2013):

\[\hat{y}_{(i-0.5)/n} = y(i)\]

- The little hat \(\hat{}\) on top of \(y\) means “estimate of”. It is used in statistics to communicate that you are estimating a value from “data” (lowercase), that is, from your observed, fixed data. The right-hand side is the \(i\)th ordered value of the data. All together, we can read the formula as:

*The* \((i - 0.5)/n\) *quantile of the distribution is estimated by the* \(i\)*th ordered value of the data.*

- We can see an example:

\(i\) | \(y(i)\) | \((i - 0.5)/n\) | \(\hat{y}_{(i-0.5)/n} = y(i)\) |
---|---|---|---|
1 | 25500 | (1 - 0.5)/5 = 0.10 | 25500 |
2 | 32456 | (2 - 0.5)/5 = 0.30 | 32456 |
3 | 37668 | (3 - 0.5)/5 = 0.50 | 37668 |
4 | 54365 | (4 - 0.5)/5 = 0.70 | 54365 |
5 | 135456 | (5 - 0.5)/5 = 0.90 | 135456 |
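The same computation can be sketched in a few lines of code. The income values come from the example; the function name `quantile_estimates` is a hypothetical helper, and the code simply pairs each ordered value with the quantile level \((i - 0.5)/n\) that it estimates.

```python
# Household incomes from the example, deliberately left unordered.
incomes = [54365, 25500, 135456, 37668, 32456]


def quantile_estimates(data):
    """Pair each ordered value y(i) with the quantile level (i - 0.5) / n."""
    ordered = sorted(data)
    n = len(ordered)
    return [((i - 0.5) / n, y) for i, y in enumerate(ordered, start=1)]


for level, value in quantile_estimates(incomes):
    print(f"The {level:.2f} quantile is estimated by {value}")
```

Sorting comes first because \(i\) is the rank of the value in the ordered data, not its position in the original list.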

Then, we can say in plain English: “The 70th percentile of the distribution is estimated by $54,365”.

Now notice something: why don’t we have data representing the 75th percentile?

Given that these are *estimates*, these numbers are approximations to the true value. If you collect more data, you’ll have data at different percentiles and more precision to capture the real value.


The mean or average is the most frequently used estimate in testing and statistics, along with the standard deviation.

Let’s see their anatomy

**Mean estimation:**

*Variance:*

*Standard Deviation:*

**Variance and SD**

The standard deviation and variance are similar: they both measure how far the scores spread around the mean. The standard deviation is the square root of the variance, so it is expressed in the same units as the original scores, whereas the variance is expressed in squared units.
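A minimal sketch of these three estimates, under an assumption that should be stated loudly: the variance here uses the common sample estimator with an \(n - 1\) denominator, \(\sum (X_{i} - \bar{X})^2 / (n - 1)\), which may or may not be the version intended in these slides. The mean is \(\bar{X} = \sum X_{i} / n\) and the standard deviation is the square root of the variance. The data are the household incomes used earlier, and the function names are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(values) / len(values)


def variance(values):
    """Sample variance with an n - 1 denominator (an assumption of this sketch)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))


incomes = [25500, 32456, 37668, 54365, 135456]

print("mean:", mean(incomes))                       # dollars
print("variance:", round(variance(incomes), 1))     # squared dollars
print("SD:", round(standard_deviation(incomes), 1)) # dollars
```

Because the variance is in squared dollars, the standard deviation is usually the easier number to interpret next to the mean.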

*Z* scores are used to compare observed scores versus a *normative* score, or to compare scores from different scales with different metrics.

Let’s see an example where we need to compare two scores, but the scores have different ranges of values:

Variable | N = 175 |
---|---|
depressionTotal | |
Mean (SD) | 10 (6) |
Min, Max | 0, 34 |
anxietyTotal | |
Mean (SD) | 4.1 (3.3) |
Min, Max | 0.0, 17.0 |
ruminationTotal | |
Mean (SD) | 30 (8) |
Min, Max | 14, 52 |

We’ll need to compute *Z* scores in these cases using the following transformation:

\[Z = \frac{X_{i}-\bar{X}}{SD}\]

In this transformation we estimate the distance from the mean (\(\bar{X}\)) for each observed value \(X_{i}\), and then we divide it by the score’s standard deviation (\(SD\)).
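The transformation can be sketched as follows. The means and standard deviations are the rounded values reported in the descriptive table for the three scales; the raw scores for the single person are hypothetical and are not taken from the data set, and `z_score` is an illustrative helper name.

```python
# Rounded means and SDs as reported in the descriptive table (N = 175).
norms = {
    "depressionTotal": (10.0, 6.0),
    "anxietyTotal": (4.1, 3.3),
    "ruminationTotal": (30.0, 8.0),
}


def z_score(x, mean, sd):
    """Distance of x from the mean, expressed in standard deviation units."""
    return (x - mean) / sd


# Hypothetical raw scores for one person (not real data).
person = {"depressionTotal": 16, "anxietyTotal": 9, "ruminationTotal": 30}

for scale, raw in person.items():
    mean, sd = norms[scale]
    print(scale, "raw =", raw, "z =", round(z_score(raw, mean, sd), 2))
```

Once every score is in standard deviation units, a raw 16 in depression and a raw 9 in anxiety can be compared directly even though the scales have different ranges.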

- After transforming the observed values, we can now compare distributions:

Variable | N = 175 |
---|---|
depreZ | |
Mean (SD) | -0.01 (1.00) |
Min, Max | -1.65, 3.72 |
anxietyZ | |
Mean (SD) | 0.00 (1.02) |
Min, Max | -1.26, 3.97 |
rumiZ | |
Mean (SD) | 0.00 (1.00) |
Min, Max | -1.93, 2.59 |

Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: The self-assessment manikin and the semantic differential. *Journal of Behavior Therapy and Experimental Psychiatry*, *25*(1), 49–59.

Westfall, P. H., & Henning, K. S. (2013). *Understanding advanced statistical methods*. CRC Press.