Linear Statistical Modelling
A printout of your solutions (either word-processed or produced using LaTeX) should be handed in to the assignment slots in the statistics corridor by 15:00 on Thursday the 14th of March. Please include only your ID number on your submission to allow anonymous marking. Please leave adequate time for printing, and note that “the printer broke” is not a valid reason for late submission. If you have any queries about the coursework, please post them on the ST221 forum, but do not post any part of your solutions. This assignment counts for 15% of your final module mark.
Download the file courseworkData1.csv from the module webpage and read it into R. The dataset consists of measurements of 115 male participants in a study of body fat.
The variables are:
bodyfat The percentage of body fat.
agegrp Age group in three categories: 20-29 years; 30-39 years; ≥ 40 years.
height Height in metres (m).
weight Weight in kilograms (kg).
waist Waist circumference in centimetres (cm).
hip Hip circumference in cm.
The bodyfat variable is calculated using a technique called underwater weighing, in which the participants are weighed twice: once in air and once while fully submerged in water. This technique requires specialist equipment and is time-consuming. It would be useful if bodyfat could be predicted from more easily available measurements.
A medical doctor suggests that a useful predictor of % body fat is the body mass index (BMI):
BMI = weight / height²
with weight in kg and height in m, so that BMI is measured in kg/m².
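As a minimal sketch of the first data-handling step, the R code below reads a CSV file and adds a BMI column, taking BMI as weight in kilograms divided by the square of height in metres. The three rows are invented stand-ins for courseworkData1.csv, and the age-group labels are an assumed coding; the real file may code agegrp differently.

```r
# Toy stand-in for courseworkData1.csv: the column names follow the variable
# list in the assignment, but these three rows are invented for illustration.
toy <- data.frame(
  bodyfat = c(12.0, 21.5, 27.0),
  agegrp  = c("20-29", "30-39", "40+"),   # assumed labels, not the real coding
  height  = c(1.80, 1.75, 1.70),
  weight  = c(81.0, 70.0, 92.0),
  waist   = c(84, 88, 101),
  hip     = c(98, 97, 104)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# With the real file, replace `path` by "courseworkData1.csv".
fat <- read.csv(path)

# BMI = weight (kg) / height (m)^2, so its units are kg/m^2.
fat$BMI <- fat$weight / fat$height^2
print(fat[, c("height", "weight", "BMI")])
```

Storing BMI as a column of the data frame keeps later model formulas short, since they can refer to BMI directly.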
Working together, you propose a linear model:
bodyfat = α + β(BMI − 22) + ε
where 22 kg/m² represents an “ideal” BMI¹.
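One way such a centred model might be set up in R is sketched below. The data are simulated, with arbitrarily chosen true values α = 15 and β = 1.2, so the printed estimates say nothing about the coursework data. Because the predictor is BMI − 22, the fitted intercept is the expected % body fat at a BMI of 22 rather than at a BMI of 0.

```r
set.seed(1)
n   <- 115                               # same sample size as the study
BMI <- runif(n, min = 18, max = 35)      # simulated, not the coursework data
# alpha = 15 and beta = 1.2 are arbitrary values chosen for the simulation.
bodyfat <- 15 + 1.2 * (BMI - 22) + rnorm(n, sd = 4)
sim <- data.frame(bodyfat, BMI)

# I() protects the arithmetic so that the predictor is literally BMI - 22.
fit <- lm(bodyfat ~ I(BMI - 22), data = sim)
est <- coef(fit)
names(est) <- c("alpha", "beta")
print(est)

# Scatterplot with the fitted line drawn over the observed BMI range.
plot(bodyfat ~ BMI, data = sim,
     xlab = "BMI (kg/m^2)", ylab = "Body fat (%)")
grid_bmi <- data.frame(BMI = seq(min(sim$BMI), max(sim$BMI), length.out = 100))
lines(grid_bmi$BMI, predict(fit, newdata = grid_bmi))

# Prediction for a hypothetical new individual (1.75 m, 75 kg).
new_man <- data.frame(BMI = 75 / 1.75^2)
pred <- predict(fit, newdata = new_man)
print(pred)
```

Passing a data frame with a BMI column to predict() lets R apply the centring itself, which avoids subtracting 22 by hand in two places.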
(a) Give physical interpretations of the parameters α and β.
(b) Produce a suitable graphical illustration of the relationship between body mass index and % body fat in the data.
(c) Fit the linear model and give estimated values for the parameters α and β.
¹ The BMI range for a healthy weight is 18–25 kg/m².
(d) Predict the % body fat of a man with height 1.78 m and weight 82 kg.
(e) Add the fitted line from the model to your plot.
(f) The doctor suggests that, as body fat tends to accumulate with age, the parameter α may be different for different age groups. Write down the mathematical formula for this revised model.
(g) Produce another graphical illustration of the relationship between BMI and % body fat that distinguishes the three age groups.
(h) Add fitted lines for the three age groups to your new plot.
(i) Calculate an unbiased estimate of the variance of the errors.
(j) Another doctor suggests that you should not use BMI as a predictor, but should instead use the waist:hip ratio:
bodyfat = α∗ + β∗(waist/hip) + ε∗
Again you refine this model by allowing α∗ to depend on age. Does this alternative model give better predictions of % body fat? Justify your answer.
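The later parts of this question (group-specific intercepts, an unbiased error-variance estimate, and a comparison of two predictors) could be sketched in R along the following lines. Everything here is simulated: the age-group labels, the waist:hip values in the hypothetical column whr, and the true parameter values are all assumptions made for illustration. Comparing residual variances is only one possible way to judge predictive quality.

```r
set.seed(2)
n      <- 115
agegrp <- factor(sample(c("20-29", "30-39", "40+"), n, replace = TRUE))
BMI    <- runif(n, 18, 35)
whr    <- runif(n, 0.75, 1.05)           # hypothetical waist:hip ratios
shift  <- unname(c("20-29" = 0, "30-39" = 2, "40+" = 4)[as.character(agegrp)])
# Simulated response: the intercept depends on age group, the slope does not.
bodyfat <- 14 + shift + 1.2 * (BMI - 22) + rnorm(n, sd = 4)
sim <- data.frame(bodyfat, agegrp, BMI, whr)

# "0 + agegrp" removes the baseline so that each agegrp coefficient is that
# group's own intercept; the slope is shared by all three groups.
fit_bmi <- lm(bodyfat ~ 0 + agegrp + I(BMI - 22), data = sim)
fit_whr <- lm(bodyfat ~ 0 + agegrp + whr, data = sim)

# Unbiased error-variance estimate: RSS / (n - p), where p is the number of
# fitted coefficients and df.residual() returns n - p.
sigma2_hat <- function(fit) sum(resid(fit)^2) / df.residual(fit)
print(c(BMI = sigma2_hat(fit_bmi), WHR = sigma2_hat(fit_whr)))

# Scatterplot coloured by age group, with one parallel fitted line per group.
cols <- c("black", "red", "blue")
plot(bodyfat ~ BMI, data = sim, col = cols[as.integer(sim$agegrp)],
     xlab = "BMI (kg/m^2)", ylab = "Body fat (%)")
a <- coef(fit_bmi)
for (g in 1:3) {
  # alpha_g + beta * (BMI - 22) rewritten as (alpha_g - 22 * beta) + beta * BMI
  abline(a = a[g] - 22 * a[4], b = a[4], col = cols[g])
}
legend("topleft", legend = levels(sim$agegrp), col = cols, pch = 1, lty = 1)
```

In this simulation the BMI model has the smaller variance estimate by construction, since the response was generated from BMI; with the real data either model could come out ahead.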
Download the file courseworkData2.csv from the module webpage and read it into R.
(a) Fit a simple linear regression model in which y1 is the response and x1 is the explanatory variable (predictor). Produce a scatterplot of the data and add the fitted line to it.
(b) Produce a residual plot for your model and say whether it is acceptable or not. If it is not acceptable, identify which of the model assumptions has been violated.
(c) Suggest what can be done to improve the fit of the model. Fit the improved model and again use the residual plot to identify any inadequacies.
(d) Fit a simple linear regression model in which y2 is the response and x2 is the explanatory variable (predictor). Report your parameter estimates.
(e) Produce a residual plot for this second model and say whether it is acceptable or not. If it is not acceptable, identify which of the model assumptions has been violated.
(f) Suggest what can be done to improve the fit of this model. Fit the improved model and again use the residual plot to identify any inadequacies and improve the model.
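The residual-plot workflow asked for in this question might look roughly like the sketch below. The data are simulated with curvature built in on purpose, so the quadratic term used as the "improvement" is only an example; the real courseworkData2.csv may show a different problem (for instance non-constant variance) that calls for a different remedy.

```r
set.seed(3)
x1 <- runif(60, 0, 10)
# Curvature is built into the simulated response deliberately.
y1 <- 2 + 0.5 * x1^2 + rnorm(60, sd = 2)
sim <- data.frame(x1, y1)

# Straight-line fit and its residual plot (residuals against fitted values).
fit_lin <- lm(y1 ~ x1, data = sim)
plot(fitted(fit_lin), resid(fit_lin),
     xlab = "Fitted values", ylab = "Residuals", main = "Linear model")
abline(h = 0, lty = 2)

# One possible refinement: add a quadratic term, then re-check the residuals.
fit_quad <- lm(y1 ~ x1 + I(x1^2), data = sim)
plot(fitted(fit_quad), resid(fit_quad),
     xlab = "Fitted values", ylab = "Residuals", main = "Quadratic model")
abline(h = 0, lty = 2)

# F-test comparing the nested models.
print(anova(fit_lin, fit_quad))
```

A systematic pattern in the first plot (here a U shape) suggests the linear mean function is inadequate, whereas a patternless band around zero in the second plot is what an acceptable fit would be expected to show.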