What is Linear Regression?

Linear regression is a statistical technique for studying the relationship between a continuous response variable and one or more categorical or quantitative predictors.

Motivation

To motivate our use of linear regression, let’s consider an example and some research questions we can answer using this technique. About 610,000 people die of cardiovascular heart disease (CHD) every year in the United States (that’s 1 in every 4 deaths!). Heart disease is the leading cause of death for both men and women, so understanding the risk factors for heart disease and their relationships to each other is extremely important.

Body mass index (BMI) and low density lipoprotein (LDL) cholesterol are both known risk factors for CHD. It is reasonable to hypothesize that higher BMI leads to higher LDL, but this association may depend on other factors such as age, ethnicity, smoking, statin use (statins are a class of drug used to lower cholesterol levels) and alcohol use. These other factors might also have a relationship with LDL cholesterol themselves. Here are a few important questions that we might seek to address:

  1. Is there a linear relationship between LDL and the other variables? If not, can we transform them so that there is a linear relationship?

  2. Is there a relationship between LDL cholesterol and BMI or any of the other lifestyle and demographic variables (age, ethnicity, smoking, statin use or alcohol use)?

  3. How strong is the relationship?

  4. Is increased BMI associated with higher LDL cholesterol levels?

  5. How large is the effect of BMI on LDL cholesterol levels?

  6. What would be the LDL cholesterol level for someone with given values of BMI and the other lifestyle and demographic variables? How accurate is this prediction?

  7. Is the effect of BMI on LDL cholesterol different between subjects who take statins and those who do not?

It turns out that all of these questions can be answered with linear regression. We will first discuss linear regression, addressing how to answer these questions in general, and then return to these specific questions at the end.
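
As a rough preview, the questions above can be read against a model of the following general shape. This is only a sketch under assumed notation: the coefficient symbols, the 0/1 statin indicator and the choice to write out just the BMI and statin terms are illustrative assumptions, not the model fitted later.

```latex
% Illustrative sketch only; symbols and the subset of terms shown are assumptions.
% statin_i is taken to be 1 if subject i uses statins and 0 otherwise.
\[
  \mathrm{LDL}_i
    = \beta_0
    + \beta_1\,\mathrm{BMI}_i
    + \beta_2\,\mathrm{statin}_i
    + \beta_3\,(\mathrm{BMI}_i \times \mathrm{statin}_i)
    + \cdots
    + \varepsilon_i
\]
```

In this notation, questions 4 and 5 concern the sign and size of \beta_1, while question 7 asks whether \beta_3 differs from zero. The dots stand for the remaining lifestyle and demographic variables, and \varepsilon_i is the error term for subject i.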