Choosing Regression Models
Use the table provided for Question 1 in the Excel file and answer the following questions:
Question 1
Using Excel’s trendline feature, determine which type of model best fits the data, judging by the R² value Excel reports for each trendline.
Find the equation of the trendline and use it to predict the value of Y corresponding to an X-value of 2.5.
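For anyone who wants to cross-check the Excel trendline results outside Excel, the comparison can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the x and y arrays are made-up placeholder values (not the Question 1 table), the helper names r_squared and fit_models are hypothetical, and R² is computed on the original scale, so it may differ slightly from what Excel reports for its exponential and logarithmic trendlines.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination on the original scale of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_models(x, y):
    """Fit four candidate models; returns {name: (predict_fn, r2)}.

    Assumes every x and y is positive (needed for the log transforms).
    """
    a, b = np.polyfit(x, y, 1)                # y = a*x + b
    c2, c1, c0 = np.polyfit(x, y, 2)          # y = c2*x^2 + c1*x + c0
    k, ln_a = np.polyfit(x, np.log(y), 1)     # ln(y) = k*x + ln(A)
    m, n = np.polyfit(np.log(x), y, 1)        # y = m*ln(x) + n

    models = {
        "linear": lambda t: a * t + b,
        "quadratic": lambda t: c2 * t**2 + c1 * t + c0,
        "exponential": lambda t: np.exp(ln_a) * np.exp(k * t),
        "logarithmic": lambda t: m * np.log(t) + n,
    }
    return {name: (f, r_squared(y, f(x))) for name, f in models.items()}

# Placeholder data (NOT the assignment table): an exact exponential curve.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2.0 * np.exp(0.5 * x)

results = fit_models(x, y)
best = max(results, key=lambda name: results[name][1])  # highest R²
prediction = results[best][0](2.5)                      # predicted Y at X = 2.5

for name, (_, r2) in results.items():
    print(f"{name:12s} R^2 = {r2:.4f}")
print("best model:", best, "| prediction at x = 2.5:", prediction)
```

Replacing the placeholder arrays with the two columns from the Question 1 table would give a comparison to set beside the Excel trendlines.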
Question 2
Rhudy and France (2007)* studied the relation between obesity and the response to pain. Obesity was measured as the percentage over the ideal weight, and the response to pain as the nociceptive flexion reflex threshold. The results of the study appear in the table labeled Question 2 in the Excel file. Using those data, answer the following questions:
Based on the scatter plot, which model best describes how the response to pain depends on obesity: linear, exponential, quadratic, or logarithmic?
According to the best regression model, what is the expected response to pain for a person with obesity of 50%? Is this prediction reliable?
According to the best regression model, what is the expected response to pain for a person with obesity of 4.5%? Is this prediction reliable?
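One common consideration when judging reliability is whether the X-value lies inside the range of the observed data (interpolation) or outside it (extrapolation). A minimal Python sketch of that check follows. The function name within_observed_range is hypothetical, and the obesity values are placeholders rather than the study data, so the results shown say nothing about the actual table.

```python
def within_observed_range(x_new, x_observed):
    """True if x_new lies between the smallest and largest observed x."""
    return min(x_observed) <= x_new <= max(x_observed)

# Placeholder obesity percentages (NOT the Question 2 table).
obesity_pct = [10.0, 25.0, 40.0, 60.0, 85.0]

for x_new in (50.0, 4.5):
    kind = "interpolation" if within_observed_range(x_new, obesity_pct) else "extrapolation"
    print(f"x = {x_new}: {kind}")
```

Range is only one part of the judgment; the strength of the fit also matters.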
Question 3
In the second sheet of the Excel file, you will find a large data set consisting of two columns of data from the famous Framingham Heart Study*. The first column lists the systolic blood pressure (SBP) of the patients. The second shows a 1 if they developed coronary heart disease (CHD) within the next ten years and a 0 if they did not.
Create a scatter plot of the data in Excel. Based on the scatter plot alone, does the (logistic) correlation between SBP and 10-year CHD appear strong or weak?
Use the XLMiner app to determine the logistic correlation for the data. Does the p-value suggest a strong or weak correlation?
Why do you think that the correlation looks weak in the scatter plot, when in fact the correlation is strong?
Use the output from Excel to determine the equation of the best-fit logistic curve. You may want to revisit the video to see how to do this.
Use the equation to estimate the probability of a patient developing CHD in the next ten years if their SBP is 133. You may need to use e≈2.718.
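The final calculation can also be checked in Python. The sketch below assumes the best-fit logistic curve has the standard form p = 1 / (1 + e^-(b0 + b1·SBP)), where b0 is the intercept and b1 is the SBP coefficient from the regression output. The function name logistic_probability and the coefficient values are placeholders for illustration only; substitute the values from your own XLMiner output.

```python
import math

def logistic_probability(b0, b1, x):
    """Probability from a logistic model: p = 1 / (1 + e^-(b0 + b1*x))."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder coefficients (NOT fitted to the Framingham data).
b0 = -3.0   # hypothetical intercept
b1 = 0.02   # hypothetical SBP coefficient

p = logistic_probability(b0, b1, 133)
print(f"Estimated 10-year CHD probability at SBP = 133: {p:.3f}")
```

When both coefficients are zero the function returns exactly 0.5, which is a quick sanity check on the formula.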