The correlation coefficient is denoted by r, and –1 ≤ r ≤ 1. Consider a set of ordered pairs (x, y).
If all of the points defined by the ordered pairs lie on a line with positive slope, then r = 1.
If all of the points lie on a line with negative slope, then r = –1.
If the graph of the ordered pairs is a “cloud” with no discernible pattern, then r ≈ 0.
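The text above does not give a formula for r. The following is a minimal sketch that assumes r is Pearson's product-moment correlation coefficient, the usual meaning of "the correlation coefficient." It computes r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²) from each value's deviation from its mean. The function name `correlation_coefficient` is hypothetical and chosen for illustration.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for the ordered pairs (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two ordered pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when x and y move together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations, used to scale r into the range [-1, 1].
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Points on a line with positive slope (y = 2x + 1).
print(correlation_coefficient([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0
# Points on a line with negative slope (y = -2x + 10).
print(correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

The denominator divides out the spread of each variable, so r measures only how closely the points follow a line, not how steep that line is.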
Correlation measures the strength of the linear relationship
between two sets of data. If a correlation is positive, then as one
set of data increases, the other set of data tends to increase. If a
correlation is negative, then as one set of data increases, the other
set of data tends to decrease. If the correlation coefficient is fairly
close to 1 or –1, it is reasonable to perform a linear
regression in order to predict the second variable from the
first. There is no specific rule for when it is reasonable, but
the closer r is to 1 or –1, the better the regression will be
as a predictor.
To perform a simple linear regression is to find
the linear equation that best represents the
relationship between one variable (y) and another variable (x).
If the correlation is strong, the regression equation is
effective in predicting the y-value for a given x-value.
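The text does not name a fitting method. The following is a minimal sketch that assumes ordinary least squares, the standard method for simple linear regression. The slope is Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is ȳ − slope · x̄. The names `fit_line` and `predict` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted y-value for a given x-value."""
    return slope * x + intercept

# Hypothetical data that roughly follow a line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))          # 0.6 2.2
print(round(predict(slope, intercept, 6), 2))        # 5.8
```

Rounding is used only for display, because floating-point arithmetic leaves tiny errors such as 2.2000000000000002.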
Correlation vs. Causation
Causation: one event occurs because another event occurred. For
example, a person earns a paycheck of $400 because he worked 40
hours at a rate of $10 per hour.
Correlation: just because two events happen at the same time does
not mean that one event causes the other. For example, on a hot
and sunny summer day, the sales of sunglasses and ice cream increase
significantly. On cold, rainy fall days, purchases of these items
plummet.
Is it reasonable to think that sunglass sales cause ice cream
sales? Do people buy ice cream to celebrate their purchase of
sunglasses?
Often, as in this case, there is an additional factor that
impacts both variables. Here, that factor was a
hot and sunny day.
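The following sketch shows how a third factor can produce a strong correlation without causation. It builds hypothetical daily sales figures, invented for illustration and not real data. Both sunglass sales and ice cream sales depend only on the day's temperature, plus a small fixed adjustment. Neither sales list is computed from the other, yet r between them comes out close to 1. The temperatures, the multipliers, and the helper `correlation_coefficient` are all assumptions made for this example.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical daily high temperatures: the shared underlying factor.
temps = [10, 14, 18, 22, 26, 30, 34]
# Fixed day-to-day adjustments so that the data are not perfectly linear.
adjust_a = [2, -1, 0, 1, -2, 1, -1]
adjust_b = [-1, 2, -2, 0, 1, -1, 1]

# Each sales list is driven by temperature only, never by the other list.
sunglasses = [2 * t + a for t, a in zip(temps, adjust_a)]
ice_cream = [5 * t + b for t, b in zip(temps, adjust_b)]

r = correlation_coefficient(sunglasses, ice_cream)
print(round(r, 3))  # strongly positive, close to 1
```

A large r here reflects the shared dependence on temperature, not a cause-and-effect link between the two products.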
10 Correlations That Are Not Causations (read #7 and #6)