Saturday, October 25, 2014

Five things to know about Pearson r

The final topic for Chapter 11 is the Pearson r coefficient. You won’t be asked to calculate this statistic, but you will need a basic conceptual understanding of it. You’ll need to supplement this blog entry with the textbook because I can’t include diagrams here.

1. The general idea
The Pearson coefficient (often referred to as “r”) is a measure of bivariate correlation. This means it measures the strength of a relationship between two variables. It does NOT measure causation (remember there are three criteria for causation and correlation is only one).

For instance, it seems intuitive enough that there is a positive relationship between the variables annual income and years of education (the more money you make, the more education you are likely to have). Therefore we’d expect a Pearson coefficient to indicate a strong relationship between these two variables. By contrast, it’s hard to imagine that there’s a relationship between the variables eye color and income. It doesn’t make sense that the two have anything to do with one another. In this case, we’d expect our Pearson coefficient to indicate either a weak or nonexistent relationship.

Now let’s talk about specifics.

2. The coefficient
The Pearson coefficient ranges from -1 to +1. The closer the value is to -1 or to +1, the stronger the relationship between the variables. Values close to 0, whether negative or positive, indicate a weak relationship. You’ll recall that we have two kinds of relationships between variables, negative and positive. The sign of the Pearson coefficient tells you which kind you have, and the distance from 0 tells you how strong it is. That is why both -1 and +1 indicate a “strong” relationship.
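You won’t have to do this by hand, but a small sketch may make the range easier to picture. The function name `pearson_r` and all of the numbers below are made up for illustration; they are not from the textbook.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r for two equal-length lists of numbers (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together around their means
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(ss_x * ss_y)

# Hypothetical data: years of education and income in thousands of dollars
years_education = [10, 12, 14, 16, 18]
income_rising = [20, 30, 40, 50, 60]    # rises in lockstep with education
income_falling = [60, 50, 40, 30, 20]   # falls in lockstep with education
income_scattered = [40, 30, 50, 35, 40] # no clear pattern

r_positive = pearson_r(years_education, income_rising)    # perfect positive: +1
r_negative = pearson_r(years_education, income_falling)   # perfect negative: -1
r_weak = pearson_r(years_education, income_scattered)     # close to 0

print(r_positive, r_negative, round(r_weak, 2))
```

Real data almost never lands exactly on -1 or +1; the point is only that the sign gives the direction and the distance from 0 gives the strength.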

3. Type of variable
Pearson r is only appropriate for variables measured at the interval or ratio level. It cannot be used with nominal or ordinal variables. The short and sweet of it is that the Pearson coefficient relies on a calculation of the mean, and as you already know, the mean is only meaningful for interval and ratio level variables. So, a red flag should go up if you’re asked to interpret a Pearson coefficient for the variables age and gender. This would be an invalid use of Pearson r because gender is a nominal variable.
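Here is a rough sketch of why the mean breaks down for a nominal variable. The eye colors, ages, and number codes are all invented for the example. The codes assigned to a nominal variable are arbitrary labels, so the “mean” changes when you relabel the categories, even though the people haven’t changed.

```python
from statistics import mean

# Hypothetical sample of five people
eye_colors = ["brown", "blue", "green", "brown", "blue"]
ages = [21, 35, 48, 19, 62]

# Two equally arbitrary ways to turn eye color into numbers
coding_a = {"brown": 1, "blue": 2, "green": 3}
coding_b = {"brown": 3, "blue": 1, "green": 2}

mean_a = mean([coding_a[color] for color in eye_colors])
mean_b = mean([coding_b[color] for color in eye_colors])
mean_age = mean(ages)

# The two "mean eye colors" disagree, so neither one means anything
print(mean_a, mean_b)
# Mean age is a real quantity: it does not depend on a labeling choice
print(mean_age)
```

Because Pearson r is built on means, a coefficient computed from codes like these would depend on the labeling choice rather than on anything real.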

4. Type of Relationship
Pearson r can only be used to measure linear relationships. It is not valid for curvilinear relationships (the curvilinear relationship between the variables may be real; it’s just that the Pearson statistic cannot be used to measure or evaluate it). In lecture, I used income over a lifetime as an example of a curvilinear relationship. I said that income plotted against age is not a straight line; most people make no money as a child, lots of money in their prime, and then minimal income after retirement. If you can imagine plotting that relationship on an x/y graph, you’d have a curve. Another curvilinear relationship is between health and age; the health of children and the elderly tends to be poorer than the health of young and middle-aged adults. Again, you’d have a curved plot on a graph. You may be asking: How do I tell if a relationship is linear or curvilinear? The short answer is that you’d actually have to plot it and look for a visible pattern. But don’t worry about that; for our purposes you simply need to know that Pearson r is only appropriate for linear relationships.
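To see what goes wrong, here is a sketch using invented numbers: a hypothetical “health score” that rises until age 40 and then falls by the same amount. The helper `pearson_r` and the scoring formula are assumptions made up for this example, not real health data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r for two equal-length lists of numbers (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(ss_x * ss_y)

# Hypothetical ages from 0 to 80 in steps of 10
ages = [0, 10, 20, 30, 40, 50, 60, 70, 80]
# Made-up health score: lowest at ages 0 and 80, highest at age 40
health = [100 - ((age - 40) / 4) ** 2 for age in ages]

r_curve = pearson_r(ages, health)

# Health is completely determined by age here, yet r comes out
# essentially 0 because the rise and the fall cancel each other out.
print(round(abs(r_curve), 3))
```

A coefficient near 0 would suggest “no relationship,” which is clearly the wrong conclusion for this curve. That is the sense in which Pearson r cannot evaluate a curvilinear relationship.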

5. Interpreting examples
Interpreting a Pearson coefficient is easy as pie. A relationship between variables can be: a) weak, b) moderate, or c) strong. You’ll have to double-check this in the book (I don’t have mine handy) but the guideline is something like 0-.3 is weak, .31-.69 is moderate, and .7-1 is strong (same for negative numbers). So if age and years of education have a Pearson coefficient of .4, you’d conclude that a moderate positive relationship exists (the older you are, the more education you have). If amount of smoking and life expectancy in years have a Pearson coefficient of -.8, you’d conclude a strong negative relationship exists (the more you smoke, the fewer years you live). If income and IQ have a Pearson coefficient of .1, you’d say a weak positive relationship exists (so weak, in fact, you’d probably conclude that no relationship exists).
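If it helps, the guideline can be written out as a small decision rule. This is only a sketch: the function name `describe` is made up, and the cutoffs are the approximate ones given above, so check them against the book before relying on them.

```python
def describe(r):
    """Turn a Pearson coefficient into a plain-language label.

    Cutoffs are the approximate guideline from this post
    (up to .3 weak, below .7 moderate, .7 and up strong).
    """
    if not -1 <= r <= 1:
        raise ValueError("A Pearson coefficient must be between -1 and +1")
    if r == 0:
        return "no linear relationship"

    # Strength depends only on the distance from 0
    size = abs(r)
    if size <= 0.3:
        strength = "weak"
    elif size < 0.7:
        strength = "moderate"
    else:
        strength = "strong"

    # Direction depends only on the sign
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe(0.4))   # age and years of education
print(describe(-0.8))  # smoking and life expectancy
print(describe(0.1))   # income and IQ
```

Notice that the function checks strength and direction separately, which is exactly how you should read a coefficient: first the size, then the sign.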

That’s it. Those are the five key points to know about Pearson r. To close, here’s a quiz to test your understanding. Bring questions on Monday.

  1. Interpret a Pearson’s coefficient of .75 for the variables age and number of children.

  2. How would you draw a Pearson’s coefficient of “0” on a scatter plot?

  3. True or False: A coefficient of .5 is a stronger indicator that your hypothesis is correct than a -.5 coefficient.

  4. True or False: A coefficient of -.75 for age and religion indicates a strong negative relationship between age and religion.

  5. True or False: For the ratio level variables age and income, a coefficient of .8 means there is a strong causal relationship between age and income.