Subject: Re: Top 5 Notions
Date: Sun, 25 Aug 1996 21:42:44 -0400
From: "Donald Macnaughton" <donmac@matstat.com> (formerly donmac@hookup.net)
To: Multiple recipients of list <edstat-l@jse.stat.ncsu.edu>

David Moore, 1997 president-elect of the American Statistical Association, has played a lead role in statistical education. His contribution to this debate is much appreciated.

On August 16, Dennis Roberts asked readers of this discussion group to propose the "top 5 notions" they felt should be imparted to students "at all costs" in the typical first statistics course. I proposed five notions on August 16 and David proposed five different notions on August 18.

In the present posting I shall comment briefly on two possible interpretations of Dennis' original request. I shall then discuss David's five notions.

TWO INTERPRETATIONS OF DENNIS' ORIGINAL REQUEST

To clarify the goal of this discussion group thread, it is helpful to note that one can interpret Dennis Roberts' original request in two different ways:

Interpretation 1: What are the four or five notions upon which we should build the field of statistics for students? That is, what are the notions that we should introduce early on in the introductory course in order to lay down a *foundation* for the later material?

Interpretation 2: If we make a list of all the notions that are covered in an introductory course, what are the four or five most important notions in the list, without regard to where (early or late) the notions appear in the course, and without regard to whether the notions are foundational?

I suggest that we spend some time in this thread addressing the first interpretation of Dennis' request--that is, discussing the four or five top *foundational* notions that we should cover in the introductory course. Such discussion is useful because clearly the choice and presentation of the foundational notions play a major role in determining the success of the introductory course.

DISCUSSION OF DAVID'S FIVE NOTIONS

There is no doubt that David's five notions are important notions for the introductory course.
Thus David's notions are excellent candidates for satisfying the *second* interpretation above of Dennis' request. In fact, I suspect that David framed his five notions with the second interpretation of Dennis' request in mind. Therefore, formally, perhaps David's notions should only be evaluated under the second interpretation.

However, with apologies to David, I shall discuss his notions in light of the *first* interpretation of Dennis' request. That is, I shall discuss whether David's notions are foundational notions that we should discuss early on in the introductory course. I shall do this because David's notions provide a useful counterpoint for discussion of candidates for the foundational notions.

David begins his discussion by appropriately noting the existence of unstated qualifications. He then states his first notion as

> 1. Always plot your data. Look for overall patterns and
> striking deviations from them.

This is clearly a very important notion for students to learn. However, I maintain that it is foundationally much less important than certain other notions.

I argue in reference 1 that the first goal of the introductory course should be to give students a lasting appreciation of the vital role of the field of statistics in empirical research. I justify this goal by noting that unless students learn to understand and appreciate the overall *role* of statistics, any knowledge they gain of statistical topics will be both of little interest to them, and of little use.

I also suggest in reference 1 that the role of the field of statistics in empirical research is to help researchers study variables and relationships between variables as a means to predicting and controlling the values of variables.

I submit that the notion of the role of statistics in empirical research is foundationally more important than David's notion about always plotting data and looking for patterns and deviations.

> 2. Faulty data production (e.g., voluntary response or
> confounding) can make data worthless for the intended purpose.

Again, David makes a very important point. But note his reference to the "intended purpose". What *is* the intended purpose of studying data? That is, what is the role of statistics? As with David's first notion, I believe that we should discuss this purpose or role for students *first* before we discuss the important but less fundamental topic of faulty data production.

That is, I submit that the notion of the role of statistics in empirical research is foundationally more important than David's notion about faulty data production.

> 3. Observed association does not imply causation; the strongest
> evidence for causation comes from randomized comparative
> experiments.

This notion is dear to my heart because I believe that the scientific experiment (in the strict sense of the term) is a work of great beauty, and the epitome of scientific (empirical) research.

However, I believe that the most efficient way to cover this notion is through the notion of a relationship between variables--one of my top five notions. In particular, after introducing the concept of a relationship between variables to students, the teacher can distinguish between non-causal and causal relationships between variables. Then the teacher can introduce experiments as the only known way of obtaining reliable information about causal relationships.

In his statement of his third notion, David uses the word "association", which implies that the statement takes the concept of a relationship between variables as a given. However, although we can take that concept as a given in stating the notion, we must not take the notion of a relationship between variables as a given in students' minds. (Some introductory approaches seem to do so.)
Instead, since most students come to the introductory statistics course lacking a sense of the concept of a relationship between variables, it is very important that we properly establish this foundational concept in their minds early in the course.

Thus I submit that the notion of a relationship between variables is foundationally more important than David's notion about association not implying causation.

> 4. Formal inference is based on asking ``What would happen if
> we did this many times?'' That is, we have confidence in
> inference because we use methods that would usually give correct
> answers in repeated use.

This is another important notion. However, given David's reference to "formal inference", it is reasonable to ask: Formal inference about what? If we can find a way of teaching students a general language for what the inference is about, we should introduce that language *first*, before we discuss the fact that formal inference is based on certain frequency considerations.

I believe that the language that allows us to talk about inference in statistical situations is the language of variables (= properties of entities) and relationships between variables. Most (all?) inferences made using statistics in empirical research are inferences about the values of variables or inferences about the existence or nature of relationships between variables. I invite readers to propose counterexamples that appear to refute the preceding sentence. (Some inferences are inferences about parameters, but parameters can be viewed as being just a rather special class of variables.)

Thus I submit that the notions of variables and relationships between variables are foundationally more important than David's notion about the frequency basis of formal inference.

> 5. Routine formal inference methods are somewhat fragile:
> worry about data production (e.g. nonresponse) and data
> analysis (e.g. outliers), and visit a statistician if you
> see abnormalities.

This is another very important notion. But again, inferences are usually about relationships between variables. Therefore, I submit that students should have a clear understanding of the concept of a relationship between variables before we teach them about the fragility of formal inference methods.

In summary, for each of David's top five notions I have suggested that another notion exists that is more fundamental.

MY TOP FIVE NOTIONS

For completeness, I restate my top five notions here in the order in which I believe they should be taught:

1. entities

2. properties of entities (which are roughly equivalent to variables)

3. a fundamental goal of empirical research: to develop methods for predicting and controlling the values of variables

4. relationships *between* variables as a means to prediction and control

5. statistical methods for studying variables and relationships between variables as a means to accurate prediction and control.

I believe that if these notions are carefully illustrated with practical examples, they give students a lasting appreciation of the vital role of the field of statistics in empirical research.

LINK

The above points are part of a broader discussion of an approach to the introductory statistics course available at

http://www.matstat.com/teach/

--------------------------------------------------------
Donald B. Macnaughton   MatStat Research Consulting Inc.
donmac@matstat.com      Toronto, Canada
--------------------------------------------------------

REFERENCE

1. Macnaughton, D. B. (1996), "The Introductory Statistics Course: A New Approach." This 8000-word draft paper is available at http://www.matstat.com/teach/