Subject: Re: Top 5 Notions

   Date: Sun, 25 Aug 1996 21:42:44 -0400

   From: "Donald Macnaughton" <donmac@matstat.com>
                     (formerly donmac@hookup.net)

     To: Multiple recipients of list <edstat-l@jse.stat.ncsu.edu>

David Moore, 1997 president-elect of the American Statistical 
Association, has played a lead role in statistical education.  
His contribution to this debate is much appreciated.

On August 16, Dennis Roberts asked readers of this discussion 
group to propose the "top 5 notions" they felt should be imparted 
to students "at all costs" in the typical first statistics 
course.  I proposed five notions on August 16 and David proposed 
five different notions on August 18.

In the present posting I shall comment briefly on two possible 
interpretations of Dennis' original request.  I shall then dis-
cuss David's five notions.


TWO INTERPRETATIONS OF DENNIS' ORIGINAL REQUEST
To clarify the goal of this discussion group thread, it is help-
ful to note that one can interpret Dennis Roberts' original re-
quest in two different ways:

Interpretation 1: 
   What are the four or five notions upon which we should build 
   the field of statistics for students?  That is, what are the 
   notions that we should introduce early on in the introductory 
   course in order to lay down a *foundation* for the later 
   material?

Interpretation 2:
   If we make a list of all the notions that are covered in an 
   introductory course, what are the four or five most important 
   notions in the list, without regard to where (early or late) 
   the notions appear in the course, and without regard to 
   whether the notions are foundational?

I suggest that we spend some time in this thread addressing the 
first interpretation of Dennis' request--that is, discussing the 
four or five top *foundational* notions that we should cover in 
the introductory course.  Such discussion is useful because 
clearly the choice and presentation of the foundational notions 
play a major role in determining the success of the introductory 
course.


DISCUSSION OF DAVID'S FIVE NOTIONS
There is no doubt that David's five notions are important notions 
for the introductory course.  Thus David's notions are excellent 
candidates for satisfying the *second* interpretation above of 
Dennis' request.  In fact, I suspect that David framed his five 
notions with the second interpretation of Dennis' request in 
mind.  Therefore, formally, perhaps David's notions should only 
be evaluated under the second interpretation.

However, with apologies to David, I shall discuss his notions in 
light of the *first* interpretation of Dennis' request.  That is, 
I shall discuss whether David's notions are foundational notions 
that we should discuss early on in the introductory course.  I 
shall do this because David's notions provide a useful counter-
point for discussion of candidates for the foundational notions.


David begins his discussion by appropriately noting the existence 
of unstated qualifications.  He then states his first notion as

> 1. Always plot your data.  Look for overall patterns and
> striking deviations from them.

This is clearly a very important notion for students to learn.  
However, I maintain that it is foundationally much less important 
than certain other notions.

I argue in reference 1 that the first goal of the introductory 
course should be to give students a lasting appreciation of the 
vital role of the field of statistics in empirical research.  I 
justify this goal by noting that unless students learn to under-
stand and appreciate the overall *role* of statistics, any knowl-
edge they gain of statistical topics will be both of little in-
terest to them, and of little use.

I also suggest in reference 1 that the role of the field of sta-
tistics in empirical research is to help researchers study vari-
ables and relationships between variables as a means to predict-
ing and controlling the values of variables.

I submit that the notion of the role of statistics in empirical 
research is foundationally more important than David's notion 
about always plotting data and looking for patterns and devia-
tions.


> 2. Faulty data production (e.g., voluntary response or con-
> founding) can make data worthless for the intended purpose.

Again, David makes a very important point.  But note his refer-
ence to the "intended purpose".  What *is* the intended purpose 
of studying data?  That is, what is the role of statistics?  As 
with David's first notion, I believe that we should discuss this 
purpose or role with students *first*, before we discuss the 
important but less fundamental topic of faulty data production.

That is, I submit that the notion of the role of statistics in 
empirical research is foundationally more important than David's 
notion about faulty data production.


> 3. Observed association does not imply causation; the strongest
> evidence for causation comes from randomized comparative exper-
> iments.

This notion is dear to my heart because I believe that the scien-
tific experiment (in the strict sense of the term) is a work of 
great beauty, and the epitome of scientific (empirical) research.

However, I believe that the most efficient way to cover this no-
tion is through the notion of a relationship between variables--
one of my top five notions.  In particular, after introducing the 
concept of a relationship between variables to students, the 
teacher can distinguish between non-causal and causal relation-
ships between variables.  Then the teacher can introduce experi-
ments as the only known way of obtaining reliable information 
about causal relationships.   

In his statement of his third notion, David uses the word 
"association", which implies that the statement takes the concept 
of a relationship between variables as a given.  However, al-
though we can take that concept as a given in stating the notion, 
we must not take the notion of a relationship between variables 
as a given in students' minds.  (Some introductory approaches 
seem to do so.)  Instead, since most students come to the intro-
ductory statistics course lacking a sense of the concept of a re-
lationship between variables, it is very important that we prop-
erly establish this foundational concept in their minds early in 
the course.

Thus I submit that the notion of a relationship between variables 
is foundationally more important than David's notion about asso-
ciation not implying causation.
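
As an illustration only (the simulation below is not part of 
David's posting or of reference 1), the distinction between 
non-causal and causal relationships between variables can be 
sketched in a few lines of Python.  The sketch assumes a 
hypothetical lurking variable z that drives both x and y, with x 
having no effect on y.  All function and variable names are 
invented for the example.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_correlations(n=5000, seed=1):
    """Return (observational r, experimental r) for x and y.

    Assumption for the example: y depends only on the lurking
    variable z, never on x.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]

    # Observational study: x is also driven by z, so x and y
    # are associated although x does not cause y.
    x_obs = [zi + rng.gauss(0, 1) for zi in z]
    y_obs = [zi + rng.gauss(0, 1) for zi in z]

    # Randomized experiment: x is assigned independently of z.
    x_exp = [rng.gauss(0, 1) for _ in range(n)]
    y_exp = [zi + rng.gauss(0, 1) for zi in z]

    return correlation(x_obs, y_obs), correlation(x_exp, y_exp)


if __name__ == "__main__":
    r_obs, r_exp = simulate_correlations()
    print("observational r:", round(r_obs, 2))
    print("experimental r: ", round(r_exp, 2))
```

Under these assumptions the observational correlation should be 
near 0.5 even though x has no effect on y, while the correlation 
under random assignment should be near zero.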


> 4. Formal inference is based on asking ``What would happen if
> we did this many times?''  That is, we have confidence in in-
> ference because we use methods that would usually give correct
> answers in repeated use.

This is another important notion.  However, given David's refer-
ence to "formal inference", it is reasonable to ask:  Formal in-
ference about what?  If we can find a way of teaching students a 
general language for what the inference is about, we should in-
troduce that language *first*, before we discuss the fact that 
formal inference is based on certain frequency considerations.  

I believe that the language that allows us to talk about infer-
ence in statistical situations is the language of variables (= 
properties of entities) and relationships between variables.  
Most (all?) inferences made using statistics in empirical re-
search are inferences about the values of variables or inferences 
about the existence or nature of relationships between variables.

I invite readers to propose counterexamples that appear to refute 
the preceding sentence.  (Some inferences are inferences about 
parameters, but parameters can be viewed as being just a rather 
special class of variables.)

Thus I submit that the notions of variables and relationships be-
tween variables are foundationally more important than David's 
notion about the frequency basis of formal inference.
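
As an illustration only, David's question "What would happen if 
we did this many times?" can be acted out by simulation.  The 
sketch below assumes a normally distributed variable with a known 
true mean, draws many samples, and counts how often the usual 95% 
interval for the mean captures the true value.  All names and 
parameter values are assumptions made for the example.

```python
import random
import statistics


def coverage_of_95_percent_interval(true_mean=10.0, sd=2.0, n=50,
                                    repetitions=2000, seed=0):
    """Proportion of repeated samples whose interval covers true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # One "repetition": draw a fresh sample of n values.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        std_error = statistics.stdev(sample) / n ** 0.5
        # Approximate 95% interval using the normal multiplier 1.96.
        low = mean - 1.96 * std_error
        high = mean + 1.96 * std_error
        if low <= true_mean <= high:
            hits += 1
    return hits / repetitions


if __name__ == "__main__":
    print(coverage_of_95_percent_interval())
```

The proportion returned should be close to 0.95 (typically 
slightly below it, since the sketch uses the normal multiplier 
1.96 rather than the t multiplier).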


> 5. Routine formal inference methods are somewhat fragile:
> worry about data production (e.g. nonresponse) and data
> analysis (e.g. outliers), and visit a statistician if you 
> see abnormalities.

This is another very important notion.  But again, inferences are 
usually about relationships between variables.  Therefore, I 
submit that students should have a clear understanding of the 
concept of a relationship between variables before we teach them 
about the fragility of formal inference methods.


In summary, for each of David's top five notions I have suggested 
that another notion exists that is more fundamental.


MY TOP FIVE NOTIONS
For completeness, I restate my top five notions here in the order 
in which I believe they should be taught:
1. entities
2. properties of entities (which are roughly equivalent to vari-
   ables)
3. a fundamental goal of empirical research:  to develop methods 
   for predicting and controlling the values of variables
4. relationships *between* variables as a means to prediction and 
   control
5. statistical methods for studying variables and relationships 
   between variables as a means to accurate prediction and con-
   trol.

I believe that if these notions are carefully illustrated with 
practical examples, they give students a lasting appreciation of 
the vital role of the field of statistics in empirical research.


LINK
The above points are part of a broader discussion of an 
approach to the introductory statistics course available at

            http://www.matstat.com/teach/

--------------------------------------------------------
Donald B. Macnaughton   MatStat Research Consulting Inc.
donmac@matstat.com      Toronto, Canada
--------------------------------------------------------


REFERENCE
1. Macnaughton, D. B. (1996), "The Introductory Statistics 
   Course:  A New Approach."  This 8000-word draft paper is 
   available at http://www.matstat.com/teach/