```
Subject: Re: Eight Features of an Ideal Intro Stat Course
(Response to comments by Karl L. Wuensch)

To: EdStat-L and sci.stat.edu

From: Donald B. Macnaughton <donmac@matstat.com>

Date: Sunday May 9, 1999

Cc: Karl L. Wuensch <PSWUENSC@ECUVM.CIS.ECU.EDU>
```

```
Referring to a November 25 post of mine, Karl Wuensch writes (on
November 26)

> Donald Macnaughton ... suggests:
>
>> ... I recommend that statistics teachers omit discussing uni-
>> variate distributions near the beginning of the introductory
>> course.  I recommend that teachers instead concentrate on dis-
>> cussing relationships between variables.
>
>    Don goes on to explain why (univariate is boring, bivariate
> not) and challenges us to describe univariate exercises that
> are not boring.  I do cover univariate distributions before I
> get on to bivariate and multivariate distributions, and do warn
> my students that these can be boring compared to what is to
> follow.

To illustrate why univariate distributions are boring compared to
relationships between variables, suppose that as statisticians we
are interested in some real-world variable, which I shall call Y.
We can empirically study Y in two different ways:

1. We can study the *relationship* between Y (as the response
variable) and other appropriate variables (as predictor vari-
ables).

2. We can study the univariate distribution of Y in isolation.

The second way of studying Y is simply a degenerate case of the
first.  That is, studying the univariate distribution of ANY
variable Y is rigorously equivalent (mathematically and empiri-
cally) to studying the relationship between Y and a set of pre-
dictor variables in the limiting case when the number of predic-
tor variables is reduced to zero.

Since the first way of studying Y subsumes the second, and since
the first way generally gives us much better ability to predict
and control the values of Y, the first (relationship-between-
variables) way of studying Y is more interesting than the second.
That is, studying univariate distributions is boring compared to
studying relationships between variables.

(I list the topics I include under the general topic of univari-
ate distributions in appendix A.  Note that my recommendation to
omit discussing univariate distributions near the beginning of
the introductory course applies only for students who are NOT ma-
joring in statistics -- students majoring in statistics do need
to understand univariate distributions early in their careers.)

> I try to choose variables which my students will find interest-
> ing because they want to compare their own score on that vari-
> able with those of other persons.  For example, I collect from
> my students the first week of class a crude measure of how
> frightened of statistics they are.  They always seem to be in-
> terested in where their score falls in that distribution.

Karl gives an example of an empirical univariate distribution
that is clearly interesting to students.  It is interesting be-
cause most students wish to know where they (as individuals) fall
in various distributions.  That is, they wish to know how "nor-
mal" or how "average" they are.

But although the study of the univariate distribution of "fear of
statistics" is clearly interesting, there are two reasons why I
believe this study fails to provide an effective example of the
practical use of statistics:

First, most statisticians view the field of statistics as a set
of techniques to help us generalize from findings (of patterns)
in a sample to correct statements about a population.  That is,
in almost every use of statistics in empirical research the re-
searcher is not merely interested in the entities in the sample.
Instead, he or she hopes to be able to make useful generaliza-
tions from the information gathered in the sample to the other
entities in the population.

However, Karl is focusing on where each individual student falls
in the univariate distribution of the fear scores.  Thus the
example is NOT generalizing from findings in a sample to
statements about a population.  Instead, it is generalizing
from findings in the sample to statements about *individuals in
the sample*, which is backwards.  Thus the example is not a
typical use of statistics in empirical research.

I believe that Karl's students are interested in the univariate
distribution in his example because it teaches them about *them-
selves*.  But (with apologies to Karl, who clearly has good in-
tentions) the example (because it is backwards) does not intro-
duce students to any important ideas about the (standard) practi-
cal use of statistics.

*   *   *

A second reason why study of the univariate distribution of "fear
of statistics" is not an effective example of the practical use
of statistics is that studying the univariate distribution of
some variable in isolation usually has no obvious significant so-
cial payoff.

By "social payoff", I mean that an effective example will provide
some clear *basis for action* on the part of some person or group
(Scheaffer 1992, p. 69).  I recommend that each example in an in-
troductory statistics course have an obvious social payoff be-
cause if we consistently demonstrate obvious payoffs in our exam-
ples, we are much more likely to impress students with the value
of statistics.

On the other hand, if our examples consistently LACK obvious pay-
offs, intelligent students will conclude that our field special-
izes in dealing with frivolous problems.

For the study of the univariate distribution of "fear of
statistics" we can ask:

What is the payoff (basis for action) for some (any) person
or group of the knowledge we obtain in this example?

I suggest that it is hard to see a clear payoff for any person or
group (beyond the students in the sample) from studying the uni-
variate distribution of "fear of statistics".  Admittedly, study
of the distribution does increase our general knowledge.  How-
ever, I prefer not to call this vague (although sometimes useful)
benefit a direct "payoff" because it provides no obvious basis
for action.

My experience suggests that examples that focus on univariate
distributions rarely demonstrate an obvious payoff.  Can you
think of an example of a study of a univariate distribution that
clearly shows an obvious direct payoff?

(I discuss a situation in which univariate distributions do pro-
vide a payoff in appendix B.  I discuss elsewhere some putative
examples of interesting univariate distributions [1998a, app. G;
1998b].)

On the other hand, many examples of relationships between vari-
ables clearly demonstrate significant obvious direct payoffs.
Such examples can be readily found in all fields of empirical re-
search across science, technology, business, industry, and gov-
ernment.  For example, all proper tests of new medical treatments
can be easily viewed as studies of relationships between vari-
ables.  All such studies, when they are successful, have clear
social payoffs in that they provide a basis for action to improve
human health.

>      But I also introduce, at the time we are studying univari-
> ate distributions, the notion of looking at the association be-
> tween variables.

By the "association" between variables Karl is referring to what
I call a "relationship" between variables.

> For example, we segregate the "fear of stats" scores of women
> from those of men and then compare those two univariate distri-
> butions.  Of course, we are really considering the relationship
> between a dichotomous variable (sex/gender) and a continuous
> one (admitted fear of stats),

Karl turns his fear-of-statistics example into an example of a
more typical empirical research project by introducing the notion
of a relationship between variables.  We can show students that if
we find a substantial relationship (in some meaningful popula-
tion) between "gender" and "fear of statistics", we have a clear
payoff or basis for action.  The payoff occurs in the sense that
we (as society) can take steps either to remove the cause of the
relationship (the cause may be sexism) or to treat the two groups
differently in order to reduce the fear in the more fearful
group.

> but [we] have not formally talked about point biserial correla-
> tions or independent samples t-tests and the like yet.

Karl highlights an important fact:  It is not necessary to bring
formal statistical procedures into the discussion to discuss re-
lationships between variables.  I recommend that teachers capi-
talize on this fact and give students a strong sense of the con-
cept of a relationship between variables before introducing ANY
formal statistical procedures.  If we show students a broad set
of practical examples of relationships, and if these examples are
not encumbered by the complicated procedures of statistics, the
students come to recognize that empirical study of relationships
between variables is the best objective method for accurate pre-
diction and control.

AFTER students properly understand and appreciate the usefulness
of relationships between variables as a means to prediction and
control, we can bring the field of statistics out onto the stage.
We can characterize the field as a set of optimal techniques for
studying variables and relationships between variables as a means
to accurate prediction and control.  When presented from this
unifying point of view, the complicated procedures of statistics
fall more easily into place.

I further describe the approach I discuss above in two papers
(1996, 1999).

> When we go to the lab for our first computing (Minitab) exer-
> cise, the data they have (from the Minitab handbook) is on
> pulse rates before and after exercise.  They compute change
> scores, and do some univariate descriptive statistics on the
> distribution of change scores.  Again here, we are dealing with
> a univariate distribution, but in a way that does address the
> relationship between two variables (exercise and pulse rate).

We can validly study the relationship between the variables
"amount of exercise" and "pulse rate" in the example in terms of
the univariate distribution of the change scores.  However, I
suggest that this approach has four disadvantages:

1. The change-score point of view cannot easily be extended to
more complicated situations.

2. The change-score point of view may lead us to lose sight of
the original variables in the research project.

3. The change-score point of view may lead us to lose sight of
the fact that the example is the simplest case of the impor-
tant statistical procedure of repeated measurements (repeated
measures).

4. The change-score point of view requires that students expend
an extra intellectual effort.

Appendix C discusses the four disadvantages in more detail and
notes that the relationship-between-variables point of view of
the pulse-rate example does not have any of the above disadvan-
tages.

The change-score point of view does not appear to have any
significant advantages over the relationship-between-variables
point of view.

>      Might it be possible to have our cake and eat it too, that
> is, to follow the logical order of univariate - bivariate -
> multivariate, AND to start our focus on relationships between/
> among variables at the beginning of the course?

Karl proposes a compromise approach in which relationships are
discussed together with univariate distributions near the begin-
ning of the introductory course.  This approach is clearly possi-
ble.  Furthermore, IF the compromise approach is the best way to
help students understand statistics, we should certainly follow
it.

But is the compromise approach the best way?  We can address this
question by considering another simpler question:

Do we need to discuss univariate distributions *at all* at the
beginning of the introductory course?

The discussion above of the fear-of-statistics and pulse-rate ex-
amples suggests that, for each example

- viewing the example in terms of a univariate distribution ap-
pears to have NO significant advantages over viewing the exam-
ple in terms of a relationship between variables

- viewing the example in terms of a univariate distribution has
significant DISadvantages over viewing the example in terms of
a relationship between variables.

Unless someone can propose significant advantages of discussing
univariate distributions near the beginning of the introductory
course, I suggest that we should not discuss them there.  We
should not discuss them because they are boring (because they
have no obvious practical uses).

*   *   *

If (as I suggest) discussion of univariate distributions at the
beginning of the introductory statistics course provides no
significant advantages, why then do some teachers discuss them?  I
see two main reasons:

One reason for discussing univariate distributions at the begin-
ning is to support an extinct need that has become a tradition.
I reason as follows:

In the past, before the arrival of good statistical computing
packages, a person performing a statistical analysis had to un-
derstand the mathematics of statistics in order to carry out the
(necessarily manual) computations.  (It is almost impossible to
perform statistical computations manually if one does not prop-
erly understand them.)  The mathematics of statistics is largely
based on the mathematics of univariate distributions.  Thus in
the past careful study of univariate distributions was clearly
necessary.

Nowadays, easy-to-use computer programs are available that can
perform all the standard statistical analyses.  Thus users of
these analyses need no longer manually perform statistical compu-
tations.  It follows that users (including students) need no
longer understand the mathematics of univariate distributions to
support their (now-computerized) computations.

It is hard to see important uses of univariate distributions
beyond the background support they give in the statistical
computations underlying the study of relationships between
variables.  In
particular, only rarely does an experienced empirical researcher
study the univariate distribution of some interesting empirical
variable in isolation.  Instead, researchers almost always study
their variables of interest together with some predictor vari-
ables -- that is, they study relationships between variables.

A second reason why some teachers begin with univariate distribu-
tions is that univariate distributions are (because they are a
degenerate case) clearly simpler than relationships between vari-
ables.  The greater simplicity leads some teachers to believe
that they should "start simple" and first cover univariate dis-
tributions before they cover relationships.

(Some teachers may further believe that students MUST first study
univariate distributions because [they believe] students cannot
understand the concept of a relationship between variables until
they have mastered the concept of a univariate distribution.
However, this belief is incorrect because students learn to un-
derstand various relationships between variables in high-school
science and mathematics classes, with no appeal to the concept of
a univariate distribution, as I discuss in an earlier post
[1998c].)

Teachers who choose to start simple with univariate distributions
are logically correct -- univariate distributions are simpler.
However, this approach has a serious psychological problem --
study of univariate distributions is boring because such study
usually has little or no obvious payoff.  Thus many beginning
students are alienated by univariate distributions.  This leads
to the question:

If univariate distributions are boring, provide little
obvious payoff, and are not necessary, what is to stop us
from completely bypassing discussion of univariate dis-
tributions at or near the beginning of the introductory
statistics course?

If we bypass univariate distributions, and if we focus on rela-
tionships between variables as a means to accurate prediction and
control, and if we emphasize the significant social payoffs that
come from knowledge of such relationships, we can (because the
material is more interesting) expect substantially greater suc-
cess at giving students a lasting appreciation of the vital role
of our field.

-------------------------------------------------------
Donald B. Macnaughton   MatStat Research Consulting Inc
-------------------------------------------------------

APPENDIX A: SUBTOPICS OF UNIVARIATE DISTRIBUTIONS

I include the following topics under the general topic of uni-
variate distributions:

- measures of the central tendency of univariate distributions
(e.g., mean, median)

- measures of the spread or variability of univariate distribu-
tions (e.g., standard deviation, mean absolute deviation)

- other measures used to characterize univariate distributions
(e.g., other moments)

- graphical representations of univariate distributions (e.g.
[when only reflecting a single variable], dot plots, box plots,
bar charts, histograms, stem and leaf plots, density plots)

- mathematical representations of univariate distributions (e.g.,
density functions, moment generating functions).

I believe that each of the above topics is important and belongs
at a certain point in all students' (extended) statistical ca-
reers.  However, I recommend against teaching ANY of these topics
at or near the beginning of the introductory course (for students
not majoring in statistics).  Instead, as I note above, I recom-
mend that teachers first carefully cover the more important con-
cept of a relationship between variables.  After students have a
good sense of the concept of a relationship between variables,
discussion of univariate distributions can be introduced at ap-
propriate points (mainly in support of the study of relationships
between variables).

I discuss the placement of discussion of univariate distributions
in statistics courses and I propose a syllabus for the introduc-
tory course in a paper (1999, sec. 6.4 and 6.9).

(Although I recommend against covering the graphical representa-
tion of univariate distributions, I strongly recommend that
teachers cover the graphical representation of relationships be-
tween variables early in the introductory statistics course.)

APPENDIX B: A SITUATION IN WHICH UNIVARIATE DISTRIBUTIONS DO
PROVIDE A PAYOFF

Earlier in this post I say that studying the univariate distribu-
tion of some variable in isolation *usually* has no obvious sig-
nificant social payoff.  However, univariate distributions obvi-
ously DO provide a social payoff (i.e., a basis for action) in
one important situation -- the situation in which an action deci-
sion is made on the basis of the univariate distribution of some
variable.

For example, suppose we carefully survey beginning students' fear
of statistics across a reasonable sample of institutions and stu-
dents.  And suppose our study reveals that a large proportion of
students have high fear.  This finding might motivate various in-
terested groups to increase resources directed at reducing stu-
dents' fear.  That is, the finding provides a clear basis for ac-
tion.

But I suggest that this use of a univariate distribution as a ba-
sis for action is much less important than the use of relation-
ships between variables as a basis for action.  This is because
in almost every situation in which we MIGHT make an action deci-
sion on the basis of a univariate distribution of some variable
(say the variable Y), we can make a BETTER action decision on the
basis of a closely related relationship between variables.  The
response variable in this relationship is Y, and the predictor
variables can be ANY other variables that we have reason to be-
lieve are related to Y.  If we empirically discover that one or
more of the predictor variables are related to Y, we will then be
able to use our new knowledge of the relationship to predict the
values of Y more accurately than we could possibly predict by an
equivalent study of the univariate distribution of Y in isola-
tion.  Assuming we have chosen the predictor variable(s) wisely,
this improved prediction ability will give us a better basis for
action.

For example, in studying students' fear of statistics we COULD
study the univariate distribution of "fear of statistics" in stu-
dents as a possible basis for action to reduce students' fear.
But we can obtain a better understanding of students' fear of
statistics by studying the *relationship* between "fear of sta-
tistics" as the response variable and other variables such as
"gender", "socioeconomic status", "prior training", and so on, as
predictor variables.  This better understanding will enable us to
make better action decisions for decreasing students' fear and
for improving the teaching of statistics.

Thus knowledgeable empirical researchers generally concentrate
NOT on univariate distributions, but on relationships between
variables.  Thus the study of relationships between variables is
much more important than the study of univariate distributions.

APPENDIX C: DISADVANTAGES OF THE CHANGE-SCORE POINT OF VIEW

In the earlier discussion of Karl's pulse-rate example I identify
four disadvantages that the "change-score" point of view of the
example has relative to the "relationship-between-variables"
point of view.  Here are more detailed explanations of the four
disadvantages:

1. If we adopt a change-score point of view, we run into diffi-
culties if we try to extend the point of view to more compli-
cated situations.  We can see this by noting that ALL situa-
tions that can be viewed as studying the univariate distribu-
tion of a set of change scores can also be easily viewed as
studying a relationship between two variables.  (Karl's pulse-
rate example is, as Karl notes, an instance of this fact.)  On
the other hand, many situations that can easily be viewed as
studying a relationship between variables CANNOT easily be
viewed as studying change scores.  (For example, while it is
easy to view a multi-way analysis of variance or a multi-way
regression in terms of the study of a relationship between
variables, it is hard to view either of these statistical pro-
cedures in terms of change scores.)  Thus although the use of
change scores in the pulse-rate example is *valid*, it seems
wise to avoid this use.  Instead, it seems more reasonable to
view the example in terms of the broader concept of the rela-
tionship between "amount of exercise" (as the predictor vari-
able) and "pulse rate" (as the response variable).

2. A second disadvantage of the change-score approach is that if
we focus on the derived variable "change score", we may lose
sight of the original three (main) variables in the example,
which are "amount of exercise" (called "RAN" in the Minitab
data), "pulse rate at time 1", and "pulse rate at time 2".  In
particular, in REAL empirical research (as opposed to in dis-
cussion at the beginning of the introductory statistics
course) it is useful to examine the univariate distributions
of the pulse rates at the two exercise levels because we may
find important features in these distributions.  But if we fo-
cus on the derived variable "change score", these univariate
distributions are hidden.

3. A third disadvantage of the change-score approach is that if
we focus on the derived variable "change score", we may lose
sight of the fact that the example represents the simplest
case of a set of powerful techniques for studying relation-
ships between variables called "repeated measurements" (or
"repeated measures").  Under the procedure of repeated meas-
urements we measure the values of the response variable and
predictor variable(s) more than once in each of the entities
in the research project -- typically under different condi-
tions each time we measure.  The procedure of repeated meas-
urements is frequently used to increase statistical power in
experiments in psychology and medicine when the cost of ac-
quiring experimental entities (usually people) is high and
when, in addition, it is possible for the entities to partici-
pate in more than one treatment condition without compromising
the integrity of the experiment (Winer 1971, ch. 4 and 7; SAS
Institute 1990, pp. 951 - 958).  It makes sense to introduce
the pulse-rate example in terms that we can easily extend when
it comes time to discuss other examples of repeated measure-
ments.  I suggest that the concepts of repeated measurements
are best explained in terms of relationships between vari-
ables, and it is difficult to explain any but the simplest use
of repeated measurements in terms of change scores.

4. Finally, (in a point that is related to the first point above)
if we teach the pulse-rate example in terms of the change-
score point of view, we force students to learn a separate
point of view.  Learning the change-score point of view re-
quires an extra intellectual effort from beginning students in
a situation that many teachers agree is already difficult for
them.  We can save students this extra effort by presenting
the example in terms of the more general relationship-between-
variables point of view.

REFERENCES

Macnaughton, D. B. 1996. "The entity-property-relationship ap-
proach to statistics:  An introduction for students." Avail-
able at http://www.matstat.com/teach/

Macnaughton, D. B. 1998a. "Eight features of an ideal introduc-
tory statistics course."  Available at
http://www.matstat.com/teach/

Macnaughton, D. B. 1998b. "Re: Eight features of an ideal intro
stat course (response to comments by Gary Smith)."  Posted to
sci.stat.edu and EdStat-L on November 23, 1998.  Available at
http://www.matstat.com/teach/p0036.htm

Macnaughton, D. B. 1998c. "Re: Eight features of an ideal intro
stat course (response to comments by Dennis Roberts)."  Posted
to sci.stat.edu and EdStat-L on July 23, 1998.  Available at
http://www.matstat.com/teach/p0033.htm

Macnaughton, D. B. 1999. "The introductory statistics course:
The entity-property-relationship approach." Available at
http://www.matstat.com/teach/

SAS Institute Inc. 1990. _SAS/STAT user's guide, version 6, vol-
ume 2_ 4th ed. Cary, NC: author.

Scheaffer, R. L. 1992. "Data, Discernment and Decisions: An Em-
pirical Approach to Introductory Statistics," in _Statistics
for the Twenty-First Century, MAA Notes No. 26,_ ed. by F.
Gordon and S. Gordon, Washington, DC:  Mathematical Associa-
tion of America, pp. 69-82.

Winer, B. J. 1971. _Statistical principles in experimental de-
sign._ 2d ed. New York: McGraw-Hill.

```
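
Two statistical claims in the post can be checked numerically: that studying a univariate distribution is the zero-predictor limiting case of studying a relationship, and that the change-score analysis of the pulse-rate example is a valid view of the relationship between exercise and pulse rate. The following is a minimal Python sketch, not part of the original post. The pulse rates are made up for illustration (they are not the Minitab handbook data), and all function names are hypothetical.

```python
from math import isclose, sqrt
from statistics import fmean, stdev


def intercept_only_fit(y):
    """Least-squares fit of y on zero predictors.

    Minimizing sum((y_i - c)**2) over the constant c gives c = mean(y),
    so the "model" is just a summary of the univariate distribution.
    Returns (intercept, residual standard deviation on n - 1 df).
    """
    c = fmean(y)
    rss = sum((yi - c) ** 2 for yi in y)
    return c, sqrt(rss / (len(y) - 1))


def change_score_t(before, after):
    """One-sample t statistic computed on the change scores."""
    d = [a - b for b, a in zip(before, after)]
    return fmean(d) / (stdev(d) / sqrt(len(d)))


def repeated_measures_f(before, after):
    """F statistic for 'condition' in a two-condition repeated-measures
    layout (each entity measured once under each condition)."""
    n = len(before)
    grand = fmean(before + after)
    cond_means = (fmean(before), fmean(after))
    subj_means = [(b + a) / 2 for b, a in zip(before, after)]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    # Residual: observation minus subject effect minus condition effect.
    ss_res = sum(
        (y - s - m + grand) ** 2
        for s, pair in zip(subj_means, zip(before, after))
        for y, m in zip(pair, cond_means)
    )
    return ss_cond / (ss_res / (n - 1))


# Made-up pulse rates (beats per minute) for five hypothetical students.
before = [62, 70, 74, 68, 80]
after = [90, 96, 101, 99, 110]

center, spread = intercept_only_fit(before)
t = change_score_t(before, after)
f = repeated_measures_f(before, after)

print("zero-predictor fit:", center, spread)
print("sample mean and sd:", fmean(before), stdev(before))
print("t squared:", t ** 2, "repeated-measures F:", f)
```

In this sketch the zero-predictor fit reproduces the sample mean and standard deviation, and the squared change-score t statistic agrees with the repeated-measures F statistic up to floating-point rounding. The repeated-measures function is written for exactly two conditions; extending it to more conditions is where the change-score view stops applying, which is the point made in appendix C.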