Gentle introduction
Department of Psychology and Child Development
Explain relevant statistical concepts.
Describe and explain what “norms” are.
Perhaps more memes…
To study and understand Nature we must construct a model for how Nature works.
A model helps you to understand Nature and also allows you to make predictions about Nature. There is no right or wrong model; they are all wrong! But some are better than others.
For instance, the model shown above has two unknown parameters, and it produces data \(Y\). The unknown parameters are represented with Greek letters, for instance the letter beta in the example above.
Mantra
Model produces data.
Model has unknown parameters.
Data reduce the uncertainty about the unknown parameters.
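The mantra can be made concrete with a small simulation. This is only a sketch under assumptions that are not in the original: a simple linear model \(Y = \beta_0 + \beta_1 x + \text{error}\) with made-up parameter values, which may differ from the model on the slide. The helper names `simulate_data`, `estimate_slope`, and `slope_spread` are hypothetical.

```python
import random
import statistics

def simulate_data(n, beta0, beta1, sigma, rng):
    """The model produces data: y = beta0 + beta1 * x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

def estimate_slope(xs, ys):
    """Least-squares estimate of the unknown parameter beta1."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def slope_spread(n, replications=200, seed=1):
    """Spread of the slope estimates across repeated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        # Assumed (made-up) true values: beta0 = 2.0, beta1 = 0.5, sigma = 3.0
        xs, ys = simulate_data(n, 2.0, 0.5, 3.0, rng)
        estimates.append(estimate_slope(xs, ys))
    return statistics.stdev(estimates)

spread_small = slope_spread(n=10)
spread_large = slope_spread(n=200)
print(f"Spread of slope estimates with n = 10:  {spread_small:.3f}")
print(f"Spread of slope estimates with n = 200: {spread_large:.3f}")
```

With more data the estimates cluster more tightly around the true slope, which is the sense in which data reduce the uncertainty about the unknown parameters.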
You are probably familiar with frequencies. When reporting frequencies, we count how many times each value appears in the observed data.
In the next example I’ll show descriptive information related to a validation I did in Costa Rica.
My aim was to know how people felt when reading different adjectives and nouns. I asked my participants to rate the perceived “arousal” and perceived “valence” of each word.
I used manikins to evaluate valence:
I didn’t create these scales; the original authors are Bradley & Lang (1994), and the instrument is called the Self-Assessment Manikin. I thought it would be a good option for measuring each word’s positive or negative valence along with its “arousal”.
Each little figure in a scale had a number in my data set: the first figure in each scale was coded as “9” and the last figure was coded as “1”.
This means I used an ordinal scale where 9 represents “very positive word” and 1 represents “very negative word”. Likewise, in the arousal scale 9 represents “strongly activated” whereas 1 represents “calm or not activated”.
Frequencies of arousal and valence: ratings of “Anxious”
Response Options | Arousal | Valence |
---|---|---|
1 | 41 | 29 |
2 | 4 | 7 |
3 | 13 | 13 |
4 | 10 | 8 |
5 | 11 | 23 |
6 | 2 | 3 |
7 | 3 | 6 |
8 | 1 | 3 |
9 | 8 | 4 |
NA | 7 | 4 |
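As a rough sketch of how such a frequency table can be produced, the snippet below rebuilds one arousal response per participant from the counts in the table and counts them again. The variable names are hypothetical, and `None` is used here to stand for NA.

```python
from collections import Counter

# Arousal counts copied from the table above; None stands for NA.
arousal_counts = {1: 41, 2: 4, 3: 13, 4: 10, 5: 11, 6: 2, 7: 3, 8: 1, 9: 8, None: 7}

# Rebuild one response per participant so the counting step is visible.
responses = [rating for rating, count in arousal_counts.items() for _ in range(count)]

# A frequency is the number of times each value appears in the data.
frequencies = Counter(responses)
n_total = len(responses)
n_valid = n_total - frequencies[None]

for rating in range(1, 10):
    count = frequencies[rating]
    percent = 100 * count / n_valid
    print(f"{rating}: {count:3d} ({percent:.1f}% of valid responses)")
print(f"NA: {frequencies[None]}")
```

Reporting percentages of valid responses next to the raw counts makes the two columns easier to compare when the number of missing answers differs.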
Question
You can describe a distribution in terms of percentiles.
Let’s imagine the following numbers represent household incomes:
$25,500 | $32,456 | $37,668 | $54,365 | $135,456 |
---|---|---|---|---|
The \((i - 0.5)/n\) quantile of the distribution is estimated by the \(i\)th ordered value of the data, \(y(i)\):
\(i\) | \(y(i)\) | \((i - 0.5)/n\) | \(\hat{y}_{(i-0.5)/n} = y(i)\) |
---|---|---|---|
1 | 25500 | (1-0.5)/5 =0.10 | 25500 |
2 | 32456 | (2-0.5)/5 =0.30 | 32456 |
3 | 37668 | (3-0.5)/5 =0.50 | 37668 |
4 | 54365 | (4-0.5)/5 =0.70 | 54365 |
5 | 135456 | (5-0.5)/5 =0.90 | 135456 |
Then, we can say in plain English: “The 70th percentile of the distribution is estimated to be $54,365”.
Now notice something: why don’t we have a value for the 75th percentile?
Because these are estimates, the numbers only approximate the true values. With five observations we can estimate only five quantiles. If you collect more data, you’ll have observations at more percentiles and more precision to capture the true value.
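The quantile table can be reproduced in a few lines. This is a minimal sketch using only the five incomes listed above; the variable names are hypothetical.

```python
# Household incomes from the example above.
incomes = [25500, 32456, 37668, 54365, 135456]

ordered = sorted(incomes)  # y(1) <= y(2) <= ... <= y(n)
n = len(ordered)

# The i-th ordered value estimates the (i - 0.5) / n quantile.
quantile_table = [((i - 0.5) / n, y) for i, y in enumerate(ordered, start=1)]

for level, value in quantile_table:
    print(f"{level:.2f} quantile is estimated by ${value:,}")
```

Only the levels 0.10, 0.30, 0.50, 0.70, and 0.90 appear, because each of the five observations contributes exactly one estimated quantile.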
The mean, or average, is the most frequently used estimate in testing and statistics, along with the standard deviation.
Let’s see their anatomy
Mean estimation:
\[\begin{align} \bar{X} &= \frac{\sum X_{i}}{n}\\ &= \frac{18+21+24+23+22+24+25}{7}\\ &= \frac{157}{7}\\ &\approx 22.43 \end{align}\]
Variance:
\[\begin{equation} s^2 = \frac{\sum (X_{i} - \bar{X})^2}{n-1} \end{equation}\]
Standard Deviation:
\[\begin{equation} s = \sqrt{\frac{\sum (X_{i} - \bar{X})^2}{n-1}} \end{equation}\]
Variance and SD
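The standard deviation and the variance are closely related: both measure how far scores spread around the mean. The variance is expressed in squared units of the original scale, whereas the standard deviation, being its square root, is expressed in the same units as the scores and is therefore easier to interpret.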
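The three formulas can be checked on the seven scores from the mean example. This sketch relies on Python's standard `statistics` module, whose `variance` and `stdev` functions use the \(n-1\) denominator shown above; the variable names are hypothetical.

```python
import statistics

# The seven scores used in the mean example above.
scores = [18, 21, 24, 23, 22, 24, 25]

mean_score = statistics.mean(scores)    # sum of the scores divided by n
variance = statistics.variance(scores)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(scores)           # square root of the variance

print(f"Mean:     {mean_score:.2f}")  # 22.43
print(f"Variance: {variance:.2f}")    # 5.62
print(f"SD:       {sd:.2f}")          # 2.37
```

The standard deviation (about 2.37) is in the same units as the scores, while the variance (about 5.62) is in squared units.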
Z scores are used to compare observed scores against a normative score, or to compare scores from scales that have different metrics.
Let’s see an example where we need to compare scores that have different ranges of values:
Variable | N = 175 |
---|---|
depressionTotal | |
Mean (SD) | 10 (6) |
Min, Max | 0, 34 |
anxietyTotal | |
Mean (SD) | 4.1 (3.3) |
Min, Max | 0.0, 17.0 |
ruminationTotal | |
Mean (SD) | 30 (8) |
Min, Max | 14, 52 |
We’ll need to compute Z scores in these cases using the following transformation:
\[Z = \frac{X_{i}-\bar{X}}{SD}\]
In this transformation we compute the distance of each observed value \(X_{i}\) from the mean (\(\bar{X}\)), and then we divide that distance by the standard deviation of the scores (\(SD\)). The result expresses each score in standard deviation units.
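A minimal sketch of the transformation follows. The means and standard deviations are the rounded values from the descriptive table above; the raw scores of the single respondent are hypothetical and invented only for illustration, as are the names `z_score`, `norms`, and `person`.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

# (mean, SD) pairs taken from the descriptive table above (rounded values).
norms = {
    "depressionTotal": (10, 6),
    "anxietyTotal": (4.1, 3.3),
    "ruminationTotal": (30, 8),
}

# Hypothetical raw scores for one respondent (not from the original data).
person = {"depressionTotal": 16, "anxietyTotal": 9, "ruminationTotal": 30}

z = {name: z_score(person[name], *norms[name]) for name in person}

for name, value in z.items():
    print(f"{name}: z = {value:.2f}")
```

On the raw metric the rumination score (30) looks like the largest of the three, yet it sits exactly at the mean (z = 0), while the anxiety score of 9 is about 1.5 standard deviations above its mean.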
Variable | N = 175 |
---|---|
depreZ | |
Mean (SD) | -0.01 (1.00) |
Min, Max | -1.65, 3.72 |
anxietyZ | |
Mean (SD) | 0.00 (1.02) |
Min, Max | -1.26, 3.97 |
rumiZ | |
Mean (SD) | 0.00 (1.00) |
Min, Max | -1.93, 2.59 |
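The table shows that each standardized variable has a mean close to 0 and a standard deviation close to 1. The sketch below illustrates why, using the seven scores from the earlier mean example rather than the study data, which are not available here.

```python
import statistics

# The seven scores from the mean example earlier in this section.
scores = [18, 21, 24, 23, 22, 24, 25]

mean_score = statistics.mean(scores)
sd = statistics.stdev(scores)

# Standardize every score with the sample's own mean and SD.
z_scores = [(x - mean_score) / sd for x in scores]

print([round(value, 2) for value in z_scores])
print(f"Mean of Z scores: {statistics.mean(z_scores):.2f}")
print(f"SD of Z scores:   {statistics.stdev(z_scores):.2f}")
```

Because every variable ends up on this common metric, Z scores from depression, anxiety, and rumination can be compared directly.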