Anscombe (1973) presented four data sets in a paper which advocated the more extensive use of statistical graphics in an era when graphical tools were not generally available without a degree of programming effort. The data were values of an explanatory variable and a response variable, and for each data set fitting a simple linear regression model led (to a close approximation) to the same results. Scatter plots of the data, however, were very different in appearance.
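As a rough illustration of the point about near-identical fits, the sketch below fits a least-squares line to each of Anscombe's four original data sets (not the modified Sets A-E described in this document). The names `DATA`, `fit_line` and `fits` are hypothetical helpers introduced only for this example, and the values are the published Anscombe (1973) figures as commonly reproduced.

```python
# Anscombe's (1973) original four data sets.
# Sets I-III share the same x values; set IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Corrected sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


fits = {name: fit_line(x, y) for name, (x, y) in DATA.items()}
for name, (intercept, slope) in fits.items():
    print(f"Set {name}: intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Each fit should come out with an intercept close to 3 and a slope close to 0.5, agreeing only approximately, which is the behaviour described above. A scatter plot of each set is still needed to see how different the four data sets really are.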
Some years ago I produced four data sets, based on the Anscombe data but with the property that fitting a simple linear regression model gave exactly the same results: these data sets are referred to as Sets A, B, C and E below. I also added a fifth data set, Set D. See Bassett et al. (1986) for descriptions of these data sets. (The idea was that members of a class who were each given a different data set selected from Sets A-D would probably not notice the difference if they carried out only the standard regression calculations.)