Anscombe Data Set for Linear Regression

The importance of plotting a regression line against the data it was fitted to cannot be overemphasized. The Anscombe data set, created by the statistician F. J. Anscombe ("Graphs in Statistical Analysis," The American Statistician, vol. 27, no. 1, February 1973, pp. 17-21), consists of four sets of eleven (x, y) pairs, each of which yields nearly identical regression results:

Fitted line: y = 3.00 + 0.500x
Correlation coefficient: r = 0.816
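The agreement is easy to check numerically. The short Python sketch below uses only the standard library. It fits an ordinary least-squares line and computes Pearson's r for each set. The values are typed in from Anscombe's 1973 table as it is commonly reproduced (for example, in R's built-in `anscombe` data frame). They are not read from Anscombe.xls. The names `QUARTET` and `fit_line` are illustrative.

```python
import math

# Anscombe's quartet (Anscombe 1973), as commonly reproduced
# (e.g. R's built-in `anscombe` data frame). Sets I-III share x.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                   # slope
    a = my - b * mx                 # intercept: line passes through the means
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r


for name, (x, y) in QUARTET.items():
    a, b, r = fit_line(x, y)
    print(f"{name:>3}: y = {a:.2f} + {b:.3f}x, r = {r:.3f}")
```

The published values have only two decimals, so the four fits agree closely but not exactly. The last printed digit of r can therefore differ slightly from one set to another.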

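A visual examination of the data sets and their regression lines shows that a linear model is appropriate for only one of them. In the second set the points follow a smooth curve. In the third they lie almost exactly on a different straight line, apart from a single outlier. In the fourth, ten of the eleven points share x = 8, and the lone point at x = 19 determines the slope. The Excel file attached to this post contains the four data sets, each on its own tab.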
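Plots remain the real test, but some of what they reveal also shows up in simple per-point diagnostics. The Python sketch below is a minimal illustration that assumes an ordinary least-squares fit. It computes two quantities for each point. The first is the leverage, h_i = 1/n + (x_i - mean(x))^2 / Sxx. The second is the residual from the set's own fitted line. In set IV, the point at x = 19 should have a leverage of 1, which means the fitted line is forced through it. In set III, the residual at x = 13 should be far larger than any other. Neither number reveals the curvature in set II, which is why plotting still matters. The data are typed in from the commonly reproduced table, and the function name `diagnostics` is hypothetical.

```python
# Anscombe's quartet (Anscombe 1973), as commonly reproduced
# (e.g. R's built-in `anscombe` data frame). Sets I-III share x.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def diagnostics(x, y):
    """Fit y = a + b*x by least squares; return (leverages, residuals) per point."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    # Leverage in simple linear regression: 1/n + (x_i - mean)^2 / Sxx
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    # Residual: observed y minus the fitted value a + b*x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return leverage, residuals


for name, (x, y) in QUARTET.items():
    h, e = diagnostics(x, y)
    i_h = max(range(len(h)), key=lambda i: h[i])       # highest-leverage point
    i_e = max(range(len(e)), key=lambda i: abs(e[i]))  # largest residual
    print(f"{name:>3}: max leverage {h[i_h]:.2f} at x={x[i_h]}, "
          f"largest |residual| {abs(e[i_e]):.2f} at x={x[i_e]}")
```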

File: Anscombe.xls

