Final Exam Summer 2025

Description

This exam is more like a practice exercise: you’ll solve several applied tasks. In some parts, you’ll use jamovi to analyze data sets; in other parts, you’ll answer theoretical questions based on the slides provided in class.

If you find a more creative way to answer an exercise, please use it; don’t limit yourself! There are multiple ways to approach a statistical dilemma, and your creativity is encouraged!

You may also use R to answer this “exam”.

Part 1: Let’s check some important concepts

Please copy each question along with your answer into a Word document. You may also use Quarto to earn extra points. Each question is worth 4 points unless another value is given.

Note

You may watch this video to complete the second part.

A true experiment selects participants randomly and assigns participants randomly to different conditions:

True
False

Qualitative research collects numerical values to reject the null hypothesis:

True
False

Mario needs to know if there is a causal relationship between sugar intake and high BMI (Body Mass Index). What type of design should Mario conduct?

Interview
Experiment
Qualitative design
All the above are true

Select the option that will help you to collect counterfactual evidence in an experiment:

Control group
Waiting list
Recruit participants online
Options 1 and 2 are correct

If you estimate a correlation between two variables, you can also assume causation:

True
False

Write the three requirements of a causal relationship. HINT: Check this lecture.

The following figure is an example of:

Error variance
Moderation
Correlation
Mediation

In the lecture titled “Introduction to probability and statistics,” I showed the picture below. Following slide 3, what is the meaning of this picture?

Select only the TRUE statement:

Models are perfect representations of Nature
Data produces models
Models produce data, data does not produce models
All statistical models are perfectly correct

Parameters in a statistical model are unknown information; we collect data to reduce the uncertainty in the parameters:

True
False

The following expression represents a probabilistic model: (25 points)

\[\begin{equation} Y \sim p(y) \end{equation}\]

True
False

The reduction in uncertainty about model parameters that you achieve when you collect data is called statistical inference:

True
False

The following histogram shows a continuous distribution:

library(palmerpenguins)


# Histogram of penguin body mass in grams (R treats breaks = 50 as a suggested number of bins)
hist(penguins$body_mass_g, 
     breaks = 50, 
     main = "Penguins' body mass",
     xlab = "grams")

True
False

Diane needs to estimate a Classical Regression Model. She believes that ethnicity has an effect on income, and her survey asked about four ethnicity categories: white, black, Hispanic/Latino, and other. How many dummy-coded variables does she need to create to add ethnicity as a predictor of income?

Eleven
Ten
Three
Diane does not need dummy coded variables in this case

We can consider the t-test and the Classical Regression Model as part of the General Linear Model:

True
False

In the following correlation matrix, is the correlation (\(r = -0.053\)) between attention and rumZ explained by chance alone? (25 points)

True
False

What type of plot is the following example?

Histogram
Box plot
Probability Density Function
Bar plot

According to the following table, is the mean difference between groups explained by chance alone? If not, how do you know?

Part 2: Hands-on real data… as always.

In this second part, you may use jamovi or R. I’ll provide the data sets, which you will open in jamovi or R to answer each question.

In this exercise, we will estimate a one-way ANOVA. “One-way” means that we use only one predictor, or grouping variable, to test the null hypothesis. We will use the data file named ruminationClean.csv, which you may download from this link: CLICK HERE
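As a sketch in standard textbook notation (these symbols are not taken from the class slides), a one-way ANOVA models each score as a grand mean plus the effect of the single grouping variable plus random error:

\[\begin{equation} y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2) \end{equation}\]

Here \(y_{ij}\) is the score of participant \(i\) in group \(j\), \(\mu\) is the grand mean, \(\alpha_j\) is the effect of group \(j\), and \(\varepsilon_{ij}\) is the error term. There is only one group index, \(j\), which is what “one-way” refers to.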

You can also copy the following code to open the file directly from my personal repository, without downloading anything. This method only works if you are using R:

url <- "https://raw.githubusercontent.com/blackhill86/mm2/main/dataSets/ruminationClean.csv"
ruminate <- read.csv(url)

In our one-way ANOVA, we will use the variable “Booklet” as the grouping variable. Booklet takes three values: 1, 2, and 3. These numbers correspond to different versions of the same survey; each version presented the depression items in a different location. In booklet 1, the depression items were presented at the beginning of the survey; in booklet 2, in the middle; and in booklet 3, at the end.
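If you work in R, note that Booklet is stored as the numbers 1, 2, and 3, so R reads it as a numeric variable. Passed as-is to `aov()` or `lm()`, it would be treated as a continuous predictor (one slope, 1 degree of freedom) instead of three groups (2 degrees of freedom). A minimal sketch of the conversion follows; the helper name `as_booklet_factor()` and the labels “Beginning”, “Middle”, and “End” are illustrative choices, not values stored in ruminationClean.csv.

```r
# Hypothetical helper: turn the numeric Booklet codes 1, 2, 3 into a factor.
# The labels describe where the depression items appeared in each version;
# they are illustrative names, not values in ruminationClean.csv.
as_booklet_factor <- function(codes) {
  factor(codes,
         levels = c(1, 2, 3),
         labels = c("Beginning", "Middle", "End"))
}

# Quick check with a few example codes
as_booklet_factor(c(1, 3, 2, 3))
```

With the data loaded as above, `ruminate$Booklet <- as_booklet_factor(ruminate$Booklet)` followed by `table(ruminate$Booklet)` shows how many participants received each version.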

Why did I place the depression items in three different positions? This design is appropriate when your survey is very long and you expect participants to be exhausted by the end of it. Tired participants tend to give quick answers without thinking carefully. To control for fatigue, you can create different versions of the same survey, assign the versions randomly, and then analyze whether fatigue affected the depression score.
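The random assignment step can be sketched in R. This is a minimal illustration, assuming you know in advance how many participants you will recruit; the function name `assign_booklets()` and the example sample size of 12 are hypothetical. It shuffles a balanced list of version numbers, so each booklet is used (almost) equally often and chance alone decides who receives which version.

```r
# Hypothetical helper: randomly assign n participants to survey versions
# 1..n_versions with (nearly) equal group sizes.
assign_booklets <- function(n, n_versions = 3, seed = NULL) {
  # Setting a seed makes the assignment reproducible
  # (note: it also resets R's global random number generator)
  if (!is.null(seed)) set.seed(seed)
  # Repeat 1, 2, 3, 1, 2, 3, ... until there are n slots
  slots <- rep(seq_len(n_versions), length.out = n)
  # Shuffle the slots so the order of assignment is random
  slots[sample.int(length(slots))]
}

# Example: 12 participants, so each booklet is used exactly 4 times
booklet <- assign_booklets(n = 12, seed = 2025)
table(booklet)
```

Balancing before shuffling keeps the group sizes equal, which simple coin-flip style assignment (`sample(1:3, n, replace = TRUE)`) would not guarantee.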

Important

I used the word “booklet” to name the column because this study was done a long time ago. At that time, online surveys were not often used in Costa Rica, so each participant answered a paper-and-pencil version of the survey.

Follow these steps:

Open the file ruminationClean.csv in jamovi.
Click on ANOVA in the top bar and choose One-Way ANOVA.
Select the “Booklet” variable as your grouping variable.
Select “depreZ” (depression) as your dependent variable.

After following the previous steps, answer the following questions:

What is the null hypothesis in this ANOVA analysis? (10 points)
Create a box plot of depreZ by Booklet and interpret the plot. (10 points)

# Box plot of the standardized depression score for each booklet version
boxplot(depreZ ~ Booklet,
        data = ruminate,
        main = "Standardized depression score by booklet",
        xlab = "Booklet Number",
        ylab = "Depression Score (standardized)")

Report and interpret the assumption checks: the homogeneity of variance test, the normality test, and the Q-Q plot. (40 points)
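If you answer in R, the three checks can be sketched with base R only. This is an illustration, not the exact jamovi procedure: Levene’s test is computed here as a one-way ANOVA on the absolute residuals (each score’s absolute distance from its group mean), the mean-centred version of the test, so jamovi’s value may differ slightly if it centres differently. The helper name `check_anova_assumptions()` is hypothetical.

```r
# Hypothetical helper: base-R sketch of the three assumption checks for a
# one-way ANOVA. Assumes a data frame with columns depreZ and Booklet.
check_anova_assumptions <- function(data, plot = TRUE) {
  data$Booklet <- factor(data$Booklet)       # treat booklets as groups
  fit <- aov(depreZ ~ Booklet, data = data)
  res <- residuals(fit)                      # score minus its group mean
  mf  <- model.frame(fit)                    # rows actually used (NAs dropped)

  # Homogeneity of variance: Levene's test (mean-centred), i.e. an ANOVA
  # on the absolute residuals
  lev_data <- data.frame(abs_dev = abs(res), group = mf$Booklet)
  levene <- anova(lm(abs_dev ~ group, data = lev_data))

  # Normality: Shapiro-Wilk test on the residuals (needs 3 to 5000 values)
  shapiro <- shapiro.test(res)

  # Q-Q plot: points close to the line suggest approximately normal residuals
  if (plot) {
    qqnorm(res, main = "Q-Q plot of ANOVA residuals")
    qqline(res)
  }

  list(levene_F  = levene[["F value"]][1],
       levene_p  = levene[["Pr(>F)"]][1],
       shapiro_W = unname(shapiro$statistic),
       shapiro_p = shapiro$p.value)
}
```

With the data loaded as above, `check_anova_assumptions(ruminate)` draws the Q-Q plot and returns the test statistics and p-values. A small Levene p-value suggests unequal variances across booklets; a small Shapiro-Wilk p-value suggests non-normal residuals.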
Report your ANOVA results in a table. (40 points)
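In R, a sketch of the ANOVA table comes from `summary()` of an `aov()` fit, which reports Df, Sum Sq, Mean Sq, F value, and Pr(>F). The hypothetical helper `anova_table()` below also computes eta squared (between-groups sum of squares divided by the total sum of squares) and Welch’s ANOVA via `oneway.test()`, which does not assume equal variances; both are optional additions you may or may not need.

```r
# Hypothetical helper: one-way ANOVA table for depreZ by Booklet
anova_table <- function(data) {
  data$Booklet <- factor(data$Booklet)       # three groups, not a number
  fit <- aov(depreZ ~ Booklet, data = data)
  tab <- summary(fit)[[1]]                   # Df, Sum Sq, Mean Sq, F value, Pr(>F)

  # Eta squared: share of total variability explained by Booklet
  eta_sq <- tab[["Sum Sq"]][1] / sum(tab[["Sum Sq"]])

  # Welch's ANOVA: an alternative if the variances look unequal
  welch <- oneway.test(depreZ ~ Booklet, data = data, var.equal = FALSE)

  list(table = tab, eta_squared = eta_sq, welch = welch)
}
```

With the data loaded as above, `anova_table(ruminate)$table` prints the classic ANOVA table.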
Were the mean differences in depression by booklet version explained by chance? (38 points)
Perform a post-hoc analysis to determine which means were different beyond chance, and copy and paste the table into this answer. (38 points)
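In R, one common post-hoc choice is Tukey’s HSD, available in base R as `TukeyHSD()`; it compares every pair of booklets while controlling the family-wise error rate and assumes roughly equal variances (Games-Howell is a common alternative when they are not, not shown here). The helper name `posthoc_tukey()` is hypothetical, and the results may differ slightly from jamovi depending on the method you pick there.

```r
# Hypothetical helper: Tukey HSD pairwise comparisons of depreZ by Booklet
posthoc_tukey <- function(data, conf_level = 0.95) {
  data$Booklet <- factor(data$Booklet)
  fit <- aov(depreZ ~ Booklet, data = data)
  # One row per pair (2-1, 3-1, 3-2): mean difference, CI bounds, adjusted p
  TukeyHSD(fit, which = "Booklet", conf.level = conf_level)
}
```

With the data loaded as above, `posthoc_tukey(ruminate)` prints the pairwise table; pairs whose adjusted p-value is below .05 (and whose interval excludes zero) differ beyond chance.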
What is the implication of the result? Was the depression score different depending on the presentation order of the items? (22 points)