The Zone of Rejection: Negative Thoughts on Statistical Inference


The controversy over tests of significance is quite viable; it lies dormant for periods of time, but emerges now and then in a vigorous way. Pertaining to the controversy, the April 1971 issue of this journal contains three critiques (Acock, Libby, and Williams; Henkel and Morrison; and Pierce) of my article on the nonutility of significance tests (Labovitz, 1970). Answering these critiques provides an opportunity to further expose the problems associated with statistical inference.

Very few sociologists in print seem to support the usage of the tests as a ubiquitous and helpful technique in their own right (see especially Gold, 1969; Winch and Campbell, 1969; Kish, 1959; Camilleri, 1962). The position of these authors is that tests, used in conjunction with descriptive measures, serve as a limited but useful aid in handling random error, providing the necessary condition for substantive significance, or suggesting what needs to be explained in the data.


Although I do not agree with these critics, this should not be construed as a rejection of scientific inference. Inferring from a limited amount of data to general laws is a necessary part of the scientific process. I take exception, however, to the view that statistical inference is an important aspect of scientific inference. Replication and theory serve as useful guides to scientific inference and scientific laws. Statistical inference leads to trivial results and simply is not an aid in achieving scientific goals.

Science must involve some type of inference because it is impossible to study all things at all times. Consequently, we study parts or “samples.” On the basis of theoretical reasoning and the descriptive aspects of these parts, a “jump” is made to a broader base. We may observe for several years that apples fall rather than rise, and finally we may decide (using some astute theoretical reasoning) that we will generalize these limited findings and conclude that apples fall (under certain conditions), because of some force called gravity.

No mention is made of taking a random sample of apples in specific orchards or of randomizing several apples into experimental and control groups, and observing whether they rise or fall. To establish a scientific law, we do not want to know whether all the apples fall in a particular field, but whether apples (and other objects) fall under certain theoretically designated conditions. Replication of the study in different settings around the world provides a scientific basis from which we can “jump” or infer to the general law. Statistical inference from a random sample (a condition that is nearly impossible to meet) to a small population is useless for this purpose.

Simply stated, the target population is not the sampled population. Someone at this point might argue that the apples selected in any particular orchard may not represent its population of apples. It is possible that we are using a biased subset of apples for inferring to a limited population; and somehow we must account for nonrepresentativeness or at least establish its probable existence. The only way to handle this problem, a critic may argue, is to study random samples that are characterized by theoretically stipulated chance error.


Basically, Acock, Libby, and Williams (1971: 163, 166, 167, and 169) have seven criticisms of my rejection of significance tests: my position contains little new information; I go beyond just rejecting significance tests to rejecting all statistical inference; significance tests have been useful to the development of other sciences; sample size, degrees of freedom, and power efficiency do not obviate the use of the tests; all inference (statistical and scientific) must be based on random error; not knowing the extent of type II error is not too serious; and test misuse only indicates that sociologists need to have better training.

Their basic position is that statistical inference is useful when used in conjunction with descriptive measures by providing supporting evidence. I do not believe it necessary to confront all their criticisms in detail, but I would like to give a few illustrations of their often misguided and confused critique. For example, it is irrelevant whether my reasoning contains new or old information. It is important, however, whether my reasoning is correct. Acock and his associates would have performed a service to the field if they had limited their counterarguments to my reasoning and disregarded their evaluation of my novelty. With regard to their point that I go beyond significance tests to the rejection of all statistical inference (presumably confidence intervals as well as tests), I must agree.

All statistical inference is based on an underlying logic that yields trivial results (among other criticisms). Confidence intervals may “test” an infinite number of hypotheses, but they are still trivial. The width of an interval is largely dependent on sample size and the level of confidence selected; and the results, at best, apply to only the sampled population. To clear up the record, they accuse me of supporting Kish’s (1959) statement that confidence intervals are more important than tests by citing misuse number 18 in my article.
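The width claim is easily verified arithmetically. As a minimal sketch (this illustration is mine, not the article's; it uses the standard normal-approximation interval for a proportion, with hypothetical sample sizes), the width of the interval is driven entirely by the sample size and the chosen confidence level, quite apart from any substantive meaning of the estimate:

```python
import math

# Standard normal quantiles for common confidence levels.
Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def ci_width(p_hat, n, level):
    """Full width of the normal-approximation confidence interval
    for a proportion p_hat estimated from a sample of size n."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return 2 * Z[level] * se

# Width shrinks as n grows (holding the level fixed) ...
for n in (100, 1000, 10000):
    print("n =", n, " width =", round(ci_width(0.5, n, 0.95), 4))

# ... and grows with the confidence level (holding n fixed).
for level in (0.90, 0.95, 0.99):
    print("level =", level, " width =", round(ci_width(0.5, 1000, level), 4))
```

Whatever one's view of the interval's meaning, its width is a function of n and the chosen level, and the resulting statement still applies only to the sampled population.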

But I am only citing a point made by Kish (and others); I am not supporting the point. Their argument that other sciences have benefited from the tests is supported by rather peculiar reasoning. They state the following: analysis of variance “is at the heart of many scientific fields”; others use significance tests; if Labovitz is right, “the next generation of scientists is being incredibly misled”; and statistical texts present statistical inference.

These arguments do not establish the utility of the tests. They are merely statements that most people use them or advocate their use. Indeed, they all could be misled. May I remind Acock, Libby, and Williams that the majority could be wrong: majorities have defended racism on the grounds of intellectual inferiority, defended religion on the supposition of miracles, and believed the earth is flat because it looks that way. Perhaps their most confused thought concerns their discussion of inferring a correlation coefficient from a sample to a population (Acock et al., 1971: 165). They imply that statistical significance is a way of estimating the reliability of a coefficient and provides a degree of confidence for meaningful findings.

Tests do not provide a measure of reliability (which may be provided by a reliability coefficient in a test-retest situation); and they make the classic erroneous interpretation of confusing substantive significance (meaningful finding) with statistical significance. They also confuse the nature of random sampling with randomization (Acock et al., 1971: 166) when they stipulate that lack of a random sample of electrons has not diverted physicists from using the tests. This example is given in reference to inferring from a sample to a population. Randomization, however, provides an interpretation about causality and is not designed for inferences to a specified population.
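The confusion of substantive with statistical significance can be made concrete. The following sketch is mine, not the critics' or the article's; it uses the standard t statistic for testing a correlation coefficient against zero, with hypothetical values of r and n. A correlation of .02, which explains a trivial .04 percent of the variance, becomes "statistically significant" once the sample is large enough:

```python
import math

def t_for_r(r, n):
    """Standard t statistic for testing H0: rho = 0,
    given a sample correlation r based on n cases."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# The same substantively trivial correlation (r = .02) moves from
# "not significant" to "significant" purely as n increases.
for n in (100, 10_000, 1_000_000):
    t = t_for_r(0.02, n)
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print("n =", n, " t =", round(t, 2), " ->", verdict)
```

The test answers only whether r plausibly differs from zero in the sampled population; it says nothing about whether the relationship is strong enough to matter, which is the question the reliability and meaningfulness language wrongly suggests it answers.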

Acock, Libby, and Williams conclude their critique by claiming I have not shown the nonutility of statistical inference. I hope somebody else can alter their views. A careful reading of The Significance Test Controversy (Morrison and Henkel, 1970) may make them reflect a little. In contrast to this critique, the few remarks by Henkel and Morrison are somewhat favorable and comparatively mild. They take issue with my comment that they advocate the use of significance tests in a limited situation.

Author: Sanford Labovitz

