Assignment

240 points

Author

Esteban Montenegro-Montenegro

Published

May 21, 2025

Description

In this third assignment, you’ll solve several applied tasks. In some parts, you’ll use R to analyze data sets; in other parts, you’ll answer theoretical questions based on the slides provided in class. I will also provide the book chapters on Canvas. When I need you to pay attention to a specific detail in a book chapter, I’ll mention the chapter and page where you can find it.

Part 1: Review important concepts

In this section, you will find TRUE/FALSE and multiple-choice questions. Select only one option per question (10 points for each correct answer).

Please copy each question along with your answer into a Word document.


  1. A \(t\)-test may be used when my dependent variable is continuous and the independent variable is a nominal variable with only two levels:



  2. The correct labels for A and B are: (check the "Data Visualizations" module on Canvas)




  3. The figure above is a QQ-plot. Can you conclude that Life Expectancy in Costa Rica comes from a normally distributed process? (check the "Data Visualizations" module on Canvas)



  4. The figure above represents a:





  5. In the Classical Regression Model, the estimated slope of the regression is often represented with the Greek letter \(\beta\) (beta). Taking this into consideration, if my estimated \(\beta\) is greater than \(0\), I can conclude that there is a positive relationship between my outcome and my predictor:



  6. The Classical Regression Model does not make any distributional assumptions about the predictors:



  7. The Classical Regression Model assumes that the dependent variable or outcome must come from a normal distribution:



  8. The following model represents:

\[ Y = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \epsilon \]






  9. One of the assumptions of ANOVA is constant variance across all groups:



  10. Pedro performed an experiment aimed at delaying dementia symptoms in older adults. In this study, Pedro randomly assigned participants to three conditions: A) a group attending yoga classes, B) a group attending weight-lifting classes, and C) a group attending nutrition classes. Pedro measured working memory capacity at the end of the study as the dependent variable. The problem is that Pedro doesn’t know which statistical model he should estimate.

What model should Pedro estimate with his data to test for mean differences between the groups?






  11. According to the figure above, the Classical Regression Model assumes that the conditional probability distribution functions \(p(y|x)\) (y given x) are normal distributions:




  12. Again, based on the figure above, we can say that the dashed line represents the linearity assumption:




  13. We can include nominal variables as predictors in a Classical Regression Model, but first we need to create dummy-coded variables:



  14. The Classical Regression Model requires a continuous dependent variable:



  15. The following table represents:






Part 2: Short-answer questions

In this section, I expect short answers in which you explain relevant concepts in your own words:

  1. Explain in your own words how to transform a nominal variable into several dummy-coded variables. You may provide an example to make your explanation easier to follow. (20 points)

  2. Describe the assumptions of the Classical Regression Model; you may check the lecture "Correlation and Regression Models Part 2". (15 points)

Part 3: Playing with R and perhaps JAMOVI

In this part, you will use R to answer several questions. The questions guide you through the different steps of performing an ANOVA.

You will use the data file named anovaData.csv; you may download the file from here.

Alternatively, you can open the data file directly in R by running the following code:

```r
# Read anovaData.csv directly from GitHub into a data frame called camcog
url <- "https://raw.githubusercontent.com/blackhill86/mm2/refs/heads/main/dataSets/anovaData.csv"
camcog <- read.csv(url)
```

The data for this exercise are real data from an experimental intervention conducted in Costa Rica in 2011. In this study, we aimed to answer the following question:

Will an intervention focused on improving autobiographical memory delay dementia symptoms?

Based on this question, we designed an experiment in which we assigned aging adults to four different conditions:

  • Condition A: Participants with mild cognitive impairment received the intervention.

  • Condition B: Participants with mild cognitive impairment did not receive the intervention.

  • Condition C: Healthy participants without cognitive impairment received the intervention.

  • Condition D: Healthy participants without cognitive impairment did not receive the intervention.

We measured the participants’ cognitive performance as the dependent variable, once before the intervention started and again after the intervention finished. We used the Cambridge Cognition Examination (CAMCOG) to determine their cognitive performance. The CAMCOG is a long battery of tests, and at the end you can compute a total score that represents the participant’s cognitive status.

Higher CAMCOG scores indicate better cognitive performance, while lower scores indicate poorer cognitive performance.

After this explanation, I hope the content of the data set is starting to make sense. In anovaData.csv, you will find the following columns:

  • ID: the study identification number of each participant.

  • CAMCOG_pre: the total CAMCOG score before the intervention started.

  • CAMCOG_post: the total CAMCOG score after the intervention finished.

  • Group: the intervention group, where:

    • A: Participants with mild cognitive impairment received the intervention.
    • B: Participants with mild cognitive impairment did not receive the intervention.
    • C: Healthy participants without cognitive impairment received the intervention.
    • D: Healthy participants without cognitive impairment did not receive the intervention.
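Before running any analysis, it can help to check that the file really has the columns described above and that Group is stored as a nominal variable. The sketch below is a hypothetical helper, `prepare_camcog()`, which is not part of the course materials. It assumes the column names are exactly ID, CAMCOG_pre, CAMCOG_post, and Group, and that Group is coded with the letters A to D. To keep it runnable without downloading anything, it is demonstrated on a few made-up rows that are not study data.

```r
# Hypothetical helper (not from the course materials): check that the
# expected columns exist and store Group as a factor with levels A to D,
# so that functions like aov() and TukeyHSD() treat it as a nominal variable.
prepare_camcog <- function(df) {
  expected <- c("ID", "CAMCOG_pre", "CAMCOG_post", "Group")
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Any Group value other than A, B, C, or D becomes NA here
  df$Group <- factor(df$Group, levels = c("A", "B", "C", "D"))
  df
}

# Made-up rows for illustration only (NOT data from the study)
demo <- data.frame(ID = 1:4,
                   CAMCOG_pre = c(70, 72, 90, 91),
                   CAMCOG_post = c(74, 71, 92, 90),
                   Group = c("A", "B", "C", "D"))
demo <- prepare_camcog(demo)
str(demo)
table(demo$Group)  # number of rows per group
```

With the real file, `camcog <- prepare_camcog(read.csv(url))` followed by `table(camcog$Group)` would show how many participants are in each condition; an unexpected NA count would point to a coding problem in Group.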

  1. Report the means and standard deviations of CAMCOG_post by Group, and include the table produced by R. In addition, create a plot in R showing the mean of CAMCOG_post for each Group. (20 points)

  2. Interpret the estimated means. Do you see a large difference in the mean score when comparing healthy aging adults to aging adults with mild cognitive impairment? In the bar plot, do you observe a remarkable difference between groups A and B? (5 points)

  3. ANOVA has two important assumptions: constant variance and normality. Perform the Shapiro-Wilk test to assess the normality assumption, and Levene’s test to evaluate the constant-variance assumption. (10 points)

  4. Perform an omnibus ANOVA to determine whether there is any mean difference between groups that is not explained by chance alone. Paste the table from R into your answer. (10 points)

  5. Based on the \(p\)-value, what is your conclusion? (5 points)

  6. At this point, we have performed an omnibus test, but this test does not tell us which groups differ from the others beyond chance. Perform a post-hoc analysis to determine where the differences between groups lie and which differences are not explained by chance alone. Report the table generated in R, and then interpret the results. (5 points)
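The R steps in the tasks above can be sketched in base R roughly as follows. This is a minimal, non-authoritative sketch under several assumptions. It uses a small simulated data frame, `toy`, with arbitrary numbers (NOT the CAMCOG study data) so that it runs without downloading the file. It checks normality on the ANOVA residuals; testing each group separately is another common choice. It computes Levene’s test by hand from absolute deviations around the group medians, which is the statistic `car::leveneTest()` reports by default, so the car package is not required. It also uses Tukey’s HSD via `TukeyHSD()` as one common post-hoc procedure. Your course materials may prefer other options.

```r
# Simulated toy data for illustration only: NOT the CAMCOG study data.
# The group means and SDs below are arbitrary assumptions.
set.seed(2011)
toy <- data.frame(
  Group = factor(rep(c("A", "B", "C", "D"), each = 10)),
  CAMCOG_post = c(rnorm(10, mean = 80, sd = 8), rnorm(10, mean = 78, sd = 8),
                  rnorm(10, mean = 95, sd = 5), rnorm(10, mean = 94, sd = 5))
)

# Task 1: mean and standard deviation of CAMCOG_post for each group
group_means <- tapply(toy$CAMCOG_post, toy$Group, mean)
group_sds <- tapply(toy$CAMCOG_post, toy$Group, sd)
desc <- data.frame(Group = names(group_means),
                   Mean = as.numeric(group_means),
                   SD = as.numeric(group_sds))
print(desc)

# Bar plot of the group means
barplot(group_means, xlab = "Group", ylab = "Mean CAMCOG_post",
        main = "Mean CAMCOG_post by Group")

# Fit the one-way ANOVA model first; the residual check below needs it
fit <- aov(CAMCOG_post ~ Group, data = toy)

# Task 3a: Shapiro-Wilk test on the model residuals (normality)
sw <- shapiro.test(residuals(fit))
print(sw)

# Task 3b: Levene's test by hand, an ANOVA on absolute deviations from each
# group's median (the default center used by car::leveneTest)
abs_dev <- abs(toy$CAMCOG_post - ave(toy$CAMCOG_post, toy$Group, FUN = median))
levene <- anova(lm(abs_dev ~ toy$Group))
print(levene)

# Task 4: omnibus ANOVA table (F test for any difference among group means)
anova_table <- summary(fit)
print(anova_table)

# Task 6: Tukey HSD post-hoc comparisons for every pair of groups
tukey <- TukeyHSD(fit)
print(tukey)
```

To run the same steps on the assignment data, replace `toy` with `camcog` after converting Group to a factor, for example with `camcog$Group <- factor(camcog$Group)`. If the car package is installed, `car::leveneTest(CAMCOG_post ~ Group, data = camcog)` should report the same Levene statistic as the hand-computed version.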