Anscombe (1973) presented four data sets in a paper advocating more extensive use of statistical graphics, at a time when graphical tools were not generally available without some programming effort. Each data set consisted of values of an explanatory variable and a response variable, and fitting a simple linear regression model to each one led (to a close approximation) to the same results. Scatter plots of the data, however, were very different in appearance.
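
The near-identical regression results can be checked directly. The following is a minimal sketch in Python using only the standard library. The data values are Anscombe's published Sets 1-4, typed in here rather than read from the files linked on this page. The names ANSCOMBE, fit_line and summarize are illustrative choices, not part of any package.

```python
# Anscombe's (1973) four data sets. Sets 1-3 share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

ANSCOMBE = {
    "Set 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "Set 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                     6.13, 3.10, 9.13, 7.26, 4.74]),
    "Set 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                     6.08, 5.39, 8.15, 6.42, 5.73]),
    "Set 4": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                   5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Corrected sums of squares and products.
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


def summarize():
    """Fit the simple linear regression to each of the four sets."""
    return {name: fit_line(x, y) for name, (x, y) in ANSCOMBE.items()}


if __name__ == "__main__":
    for name, (a, b, r2) in summarize().items():
        print(f"{name}: intercept={a:.2f}  slope={b:.2f}  r^2={r2:.2f}")
```

To two decimal places the fitted line is the same for all four sets; the differences appear only in later decimal places and, above all, in the scatter plots.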

Some years ago I produced four data sets, based on the Anscombe data but with the property that fitting a simple linear regression model gave exactly the same results: these data sets are referred to as Sets A, B, C and E below. I also added a fifth data set, Set D. See Bassett et al. (1986) for descriptions of these data sets. (The idea was that, if members of a class were given different data sets selected from Sets A-D to work on, they would probably not notice that they had different data sets if they just carried out the standard regression calculations.)

Anscombe's data

[Sets 1-4]

My data

[Sets A-D] [Set A] [Set B] [Set C] [Set D] [Set E] [Plots (PostScript)]


Anscombe, F. J. (1973) Graphs in statistical analysis. The American Statistician, 27, 17-21.

Bassett, E. E., Bremner, J. M., Jolliffe, I. T., Jones, B., Morgan, B. J. T., and North, P. M. (1986) Statistics: Problems and Solutions. London: Edward Arnold. (Second edition: World Scientific, 2000.)

Some plots

PostScript version
GIF version (from

27 January 2001