Subject: Re: Top 5 Notions
Date: Sun, 25 Aug 1996 21:42:44 -0400
From: "Donald Macnaughton" <donmac@matstat.com>
(formerly donmac@hookup.net)
To: Multiple recipients of list <edstat-l@jse.stat.ncsu.edu>
David Moore, 1997 president-elect of the American Statistical
Association, has played a lead role in statistical education.
His contribution to this debate is much appreciated.
On August 16, Dennis Roberts asked readers of this discussion
group to propose the "top 5 notions" they felt should be imparted
to students "at all costs" in the typical first statistics
course. I proposed five notions on August 16 and David proposed
five different notions on August 18.
In the present posting I shall comment briefly on two possible
interpretations of Dennis' original request. I shall then dis-
cuss David's five notions.
TWO INTERPRETATIONS OF DENNIS' ORIGINAL REQUEST
To clarify the goal of this discussion group thread, it is help-
ful to note that one can interpret Dennis Roberts' original re-
quest in two different ways:
Interpretation 1:
What are the four or five notions upon which we should build
the field of statistics for students? That is, what are the
notions that we should introduce early on in the introductory
course in order to lay down a *foundation* for the later
material?
Interpretation 2:
If we make a list of all the notions that are covered in an
introductory course, what are the four or five most important
notions in the list, without regard to where (early or late)
the notions appear in the course, and without regard to
whether the notions are foundational?
I suggest that we spend some time in this thread addressing the
first interpretation of Dennis' request--that is, discussing the
four or five top *foundational* notions that we should cover in
the introductory course. Such discussion is useful because the
choice and presentation of the foundational notions clearly play
a major role in determining the success of the introductory
course.
DISCUSSION OF DAVID'S FIVE NOTIONS
There is no doubt that David's five notions are important notions
for the introductory course. Thus David's notions are excellent
candidates for satisfying the *second* interpretation above of
Dennis' request. In fact, I suspect that David framed his five
notions with the second interpretation of Dennis' request in
mind. Therefore, formally, perhaps David's notions should only
be evaluated under the second interpretation.
However, with apologies to David, I shall discuss his notions in
light of the *first* interpretation of Dennis' request. That is,
I shall discuss whether David's notions are foundational notions
that we should discuss early on in the introductory course. I
shall do this because David's notions provide a useful counter-
point for discussion of candidates for the foundational notions.
David begins his discussion by appropriately noting the existence
of unstated qualifications. He then states his first notion as:
> 1. Always plot your data. Look for overall patterns and
> striking deviations from them.
This is clearly a very important notion for students to learn.
However, I maintain that it is foundationally much less important
than certain other notions.
I argue in reference 1 that the first goal of the introductory
course should be to give students a lasting appreciation of the
vital role of the field of statistics in empirical research. I
justify this goal by noting that unless students learn to
understand and appreciate the overall *role* of statistics, any
knowledge they gain of statistical topics will be of little
interest and of little use to them.
I also suggest in reference 1 that the role of the field of sta-
tistics in empirical research is to help researchers study vari-
ables and relationships between variables as a means to predict-
ing and controlling the values of variables.
I submit that the notion of the role of statistics in empirical
research is foundationally more important than David's notion
about always plotting data and looking for patterns and devia-
tions.
> 2. Faulty data production (e.g., voluntary response or con-
> founding) can make data worthless for the intended purpose.
Again, David makes a very important point. But note his refer-
ence to the "intended purpose". What *is* the intended purpose
of studying data? That is, what is the role of statistics? As
with David's first notion, I believe that we should discuss this
purpose or role with students *first*, before we discuss the
important but less fundamental topic of faulty data production.
That is, I submit that the notion of the role of statistics in
empirical research is foundationally more important than David's
notion about faulty data production.
> 3. Observed association does not imply causation; the strongest
> evidence for causation comes from randomized comparative exper-
> iments.
This notion is dear to my heart because I believe that the scien-
tific experiment (in the strict sense of the term) is a work of
great beauty, and the epitome of scientific (empirical) research.
However, I believe that the most efficient way to cover this no-
tion is through the notion of a relationship between variables--
one of my top five notions. In particular, after introducing the
concept of a relationship between variables to students, the
teacher can distinguish between non-causal and causal relation-
ships between variables. Then the teacher can introduce experi-
ments as the only known way of obtaining reliable information
about causal relationships.
In his statement of his third notion, David uses the word
"association", which implies that the statement takes the concept
of a relationship between variables as a given. However, al-
though we can take that concept as a given in stating the notion,
we must not take the notion of a relationship between variables
as a given in students' minds. (Some introductory approaches
seem to do so.) Instead, since most students come to the intro-
ductory statistics course lacking a sense of the concept of a re-
lationship between variables, it is very important that we prop-
erly establish this foundational concept in their minds early in
the course.
Thus I submit that the notion of a relationship between variables
is foundationally more important than David's notion about asso-
ciation not implying causation.
> 4. Formal inference is based on asking ``What would happen if
> we did this many times?'' That is, we have confidence in in-
> ference because we use methods that would usually give correct
> answers in repeated use.
This is another important notion. However, given David's refer-
ence to "formal inference", it is reasonable to ask: Formal in-
ference about what? If we can find a way of teaching students a
general language for what the inference is about, we should in-
troduce that language *first*, before we discuss the fact that
formal inference is based on certain frequency considerations.
I believe that the language that allows us to talk about infer-
ence in statistical situations is the language of variables (=
properties of entities) and relationships between variables.
Most (all?) inferences made using statistics in empirical re-
search are inferences about the values of variables or inferences
about the existence or nature of relationships between variables.
I invite readers to propose counterexamples that appear to refute
the preceding sentence. (Some inferences are inferences about
parameters, but parameters can be viewed as being just a rather
special class of variables.)
Thus I submit that the notions of variables and relationships be-
tween variables are foundationally more important than David's
notion about the frequency basis of formal inference.
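The repeated-use idea in David's fourth notion can be illustrated
with a small simulation. The Python sketch below is only an
illustration under assumed conditions: a normal population with
known standard deviation, a 95% z-interval for the mean, and
arbitrary values for the population mean, sample size, and number
of repetitions. The function name and all parameter values are
hypothetical choices made for the example.

```python
import random


def coverage_of_z_interval(mu=50.0, sigma=10.0, n=25, trials=2000, seed=1996):
    """Return the fraction of repeated samples whose 95% z-interval covers mu.

    All defaults are illustrative assumptions, not values from the posting.
    """
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    z = 1.96                           # approximate 97.5th percentile of N(0, 1)
    half_width = z * sigma / n ** 0.5  # sigma is treated as known
    hits = 0
    for _ in range(trials):
        # Draw one sample of n values of the variable from the population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Count the interval as a success if it contains the true mean.
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Observed coverage: {coverage_of_z_interval():.3f}")
```

If the method behaves as advertised, the printed proportion should
be close to 0.95. Note that the sketch still presupposes a variable
(the measured property of each sampled entity) whose mean is the
object of the inference.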
> 5. Routine formal inference methods are somewhat fragile:
> worry about data production (e.g. nonresponse) and data
> analysis (e.g. outliers), and visit a statistician if you
> see abnormalities.
This is another very important notion. But again, inferences are
usually about relationships between variables. Therefore, I
submit that students should have a clear understanding of the
concept of a relationship between variables before we teach them
about the fragility of formal inference methods.
In summary, for each of David's top five notions I have suggested
that another notion exists that is more fundamental.
MY TOP FIVE NOTIONS
For completeness, I restate my top five notions here in the order
in which I believe they should be taught:
1. entities
2. properties of entities (which are roughly equivalent to vari-
ables)
3. a fundamental goal of empirical research: to develop methods
for predicting and controlling the values of variables
4. relationships *between* variables as a means to prediction and
control
5. statistical methods for studying variables and relationships
between variables as a means to accurate prediction and con-
trol.
I believe that if these notions are carefully illustrated with
practical examples, they give students a lasting appreciation of
the vital role of the field of statistics in empirical research.
LINK
The above points are part of a broader discussion of an
approach to the introductory statistics course available at
http://www.matstat.com/teach/
--------------------------------------------------------
Donald B. Macnaughton MatStat Research Consulting Inc.
donmac@matstat.com Toronto, Canada
--------------------------------------------------------
REFERENCE
1. Macnaughton, D. B. (1996), "The Introductory Statistics
Course: A New Approach." This 8000-word draft paper is
available at http://www.matstat.com/teach/