This paper proposes five concepts for discussion at the beginning of an introductory statistics course: (1) entities, (2) properties of entities (which are roughly equivalent to variables), (3) a major goal of empirical research: to predict and control the values of variables, (4) relationships between variables as a key to prediction and control, and (5) statistical techniques for studying relationships between variables as a means to accurate prediction and control. After students have learned the five concepts, they learn standard statistical topics in terms of the concepts. It is recommended that students learn the material through numerous practical examples. It is argued that the approach gives students a lasting appreciation of the vital role of the field of statistics in empirical research.
KEY WORDS: Statistical education; Prediction; Control; Role of statistics in empirical research.
NOTE 1. A later version of this essay is available at http://www.matstat.com/teach/eprt0130.htm and (in PDF) at http://www.matstat.com/teach/eprt0130.pdf
NOTE 2. The (HTML) version of this paper you are presently reading takes around 49 pages to print. A more compact (PDF) version that prints in 23 pages is available at http://www.matstat.com/teach/
Two former presidents of the American Statistical Association have stated that "students frequently view statistics as the worst course taken in college" (Hogg 1991, Iman 1994). A third former president has stated that the field of statistics is in a "crisis" and the subject has become "irrelevant to much of scientific enquiry" (Box 1995). Many statisticians reluctantly agree with these remarks. At the same time, many statisticians agree that the field of statistics is a fundamental tool of the scientific method which, in turn, plays a key role in modern society. Thus, rather than being a worst course and possibly irrelevant, the introductory statistics course ought to be a friendly introduction to the simplicity, beauty, and truth of the scientific method. Teachers must therefore reshape the introductory course. Many teachers have already contributed to the reshaping, as noted below. This paper proposes further changes. I focus on the introductory statistics course for students who are
Section 2 recommends two goals for the introductory statistics course. Section 3 proposes a sequence of five concepts for discussion at the beginning of an introductory course. Section 4 argues that the five concepts provide a broad and deep foundation on which to build the field of statistics. Section 5 reports on tests of the approach in three courses. Section 6 discusses considerations for teachers wishing to use the approach, and section 7 gives a summary.

Emphasizing the goals of any undertaking helps us to define and
focus on I have observed goal-setting exercises in which the goals were given
much attention for a brief period and then forgotten--a waste of a
valuable resource. Since a teacher's chosen goals are (by the
teacher's own definition) what is most important, I recommend that
teachers regularly revisit their course goals to ask (
In recommending goals for the introductory statistics course I refer to "empirical research", which thus deserves a definition:
Empirical research is a key step of the scientific method, which is crucial in many areas of human endeavor, such as in science, education, business, industry, and government. (No statement of fact in any branch of science is accepted until it has been verified through careful empirical research.)
Many introductory statistics courses have what can be called "topic-based" goals. A teacher using such goals does not specify general goals, but instead simply specifies a list of statistical topics to be covered in the course--that is, the teacher specifies a syllabus. For example, a teacher of a traditional introductory course might aim to cover (in specified amounts of detail) the topics of probability theory, distribution theory, point and interval estimation, and so on. Similarly, a teacher of an activity-based course might make a list of statistical topics and then assign various activities to the students in order to cover the topics. Unfortunately, topic-based goals have a serious drawback: By
emphasizing lower-level statistical
I recommend that the goals of an introductory statistics course be
- to give students a lasting appreciation of the vital role of the field of statistics in empirical research and
- to teach students to understand and use some useful statistical methods in empirical research.
These goals imply a commitment to the (We must also discuss generalizations that link statistical ideas together, but [to maximize appreciation] practical applications deserve strong emphasis. More on generalizations below.) The first goal refers to the Other writers who discuss goals for the introductory statistics course include Hogg (1990, 1991), Chromiak, Hoefler, Rossman, and Tesman (1992), Cobb (1992), Iversen (1992), Watkins, Burrill, Landwehr, and Scheaffer (1992), Hoerl and Snee (1995), Gal and Garfield (1997a, pp. 2 - 5), and Moore (1997a).
To help define the role of statistics, let us consider a sequence of five concepts I recommend for discussion at the beginning of an introductory statistics course. To make the approach easy for teachers to use, I present each concept mainly as a highly condensed version of how it might be presented to students. (In this paper I discuss the five concepts mostly in abstract terms because this gives readers a compact overview of the recommended approach. The abstractness of my discussion has led readers of an earlier version of the paper to criticize the approach as being too abstract for students to understand [Macnaughton 1998a, app. C]. Please note that I am not suggesting that we present the concepts to students in the compact form in which I describe them below. Instead, I recommend that the concepts be presented in terms of numerous practical examples. I further discuss presentation issues in section 6 and I illustrate how the ideas might be presented to students in a paper [1996a].) Let us begin with what may be the most fundamental concept of human reality.
If you stop and observe your train of thought at this moment, you
will probably agree that you think about
Many different types of entities exist, for example
- organisms (e.g., people)
- inanimate physical objects
- events
- processes
- ideas, emotions
- societal entities (e.g., educational institutions, governments, businesses)
- symbols
- forces
- waves
- mathematical entities (e.g., sets, numbers, vectors).
Entities are fundamental units of human reality because everything in human reality is an entity. People usually view entities as existing in two different places: in the external world and in our minds. We use the entities in our minds mainly to stand for entities in the external world, much as we use a map to stand for its territory. All sane people learn to use the concept of 'entity' when they are infants. We use the concept unconsciously as a way of organizing the multitude of stimuli that enter our minds minute by minute when we are awake. Infants recognize that entities can be grouped into "types". For example, they recognize that the entities "mother" and "father" and other similar entities in the external world (e.g., siblings) have heads with two eyes, a mouth, and usually hair on top. Infants group these entities into a type--the type we call "people". Similarly, infants learn to group all inanimate physical objects (beginning with small familiar objects) into a type. Infants (unconsciously) recognize that all the entities of a given type, although each is unique, have many things in common, as I discuss further in the next subsection. Since everything in human reality is an entity, the concept of
'entity' pervades human thought. However, the concept is rarely
present in our At those times when entities Because people use the concept of 'entity' almost entirely at an unconscious level, some people have difficulty grasping the fundamental role the concept plays in their thought. Appendix A further discusses the priority of the concept of 'entity' in human thought, statistics, and science.
Every entity has associated with it a set of attributes or For any particular entity, each of its properties has a If we need to I note above that people group entities into types, such as "human
beings" and "inanimate physical objects". We
(unconsciously) view all entities of any given type as having Empirical researchers and statisticians usually refer to properties
of entities as A Interestingly, the concept of 'variable' is so ubiquitous and fundamental in statistics that some introductory courses and textbooks take the concept almost completely for granted, with little formal discussion of it. By assuming students have a satisfactory understanding of the fundamental concept of 'variable', these approaches befuddle a significant proportion of students right from the start. I discuss the concept of 'variable' further in the paper for students (1996a) and in a Usenet post (1996b).
An important goal of empirical research is to discover how to It appears that Society supports empirical research in seeking the ability to predict and control the values of variables because instances of such ability often provide substantial social or commercial benefits. For example, if medical researchers can discover how to better predict and control a person's propensity to heart attacks, this discovery will provide the social benefit of saving lives. (From this point on in this paper I emphasize the concept of 'variable' and I refer less often to the closely related concept of 'property of an entity'. It is important to emphasize the concept of 'variable' because this concept is used throughout empirical research. This leads to an obvious question: "Why not discuss variables right from the start--is the extra concept of 'property' necessary if it is only to be supplanted by the concept of 'variable'?" My answer is that the concept of 'property' [or, equivalently, 'attribute' or 'characteristic'] is much more basic and intuitive in students' minds than the concept of 'variable'. Thus students understand the concept of 'variable' better if we carefully build it atop the intuitive foundational concept of 'property of an entity'.)
Given the goal of predicting and controlling the values of variables, an obvious question is How can we predict and control the values of variables? The main answer is We can predict and control the values of variables by studying By a "relationship between variables" I mean the standard
idea that one variable (called the
Almost all prediction and control in all areas of empirical research is done on the basis of relationships between variables. For example, medical researchers have discovered that a relationship exists between the amount of fat ingested by a person and the probability that a person will have a heart attack. This relationship enables doctors and patients to predict and control heart attacks.
We can help students to appreciate the pervasiveness of the concept of a relationship between variables by discussing numerous practical examples of relationships. For example, teachers and students can discuss whether a relationship exists between
- the weight of a car and the gas mileage of a car (engineering)
- the education of a person and the income of a person (sociology)
- the net amount of force applied to a physical object and the rate of acceleration of a physical object (physics)
- the concentration of alcohol in the bloodstream of a person driving a car and the probability that a person driving a car will be involved in an accident (physiology)
- the number of hours of homework completed by a student during a course and the grade obtained by a student in a course (educational psychology)
- the style of management of a company and the amount of profit of a company (business)
- having a black cat cross one's path and having bad luck (folk beliefs).
Each of the above examples identifies a possible relationship between variables. Each of these relationships (and relationships between any other pairs or larger sets of variables) can be studied in an empirical research project. Each research project can be viewed in terms of
- a population of entities
- a sample of entities that is selected from the population
- a response variable that is measured in each entity in the sample
- one or more predictor variables that are also measured or controlled in each entity in the sample (or possibly are measured or controlled in each entity's environment)
- (most importantly) a relationship in the entities in the population between the response variable and the predictor variable(s) that is sought or studied in the research project.
Typically we ask three main questions in an empirical research project, which are
- Is there a relationship between the response variable and the predictor variable(s) in the entities in the population?
- If there is a relationship, how can we best predict or control the values of the response variable in new entities from the population on the basis of the relationship?
- How accurate will the prediction or control be?
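The five points and three questions above can be sketched in code. The following is only a minimal illustration under stated assumptions: the population of cars, the numbers used to generate it, and the helper names (`make_population`, `pearson_r`, `fit_line`) are all hypothetical and are not taken from the paper. A correlation coefficient and a least-squares line stand in for the many statistical methods that could be used to answer the three questions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for predicting y from x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def make_population(size, rng):
    """Hypothetical population of entities (cars) with two properties:
    weight in kg (predictor) and km per litre (response)."""
    cars = []
    for _ in range(size):
        weight = rng.uniform(900, 2200)
        mileage = 25.0 - 0.008 * weight + rng.gauss(0, 1.0)
        cars.append((weight, mileage))
    return cars


rng = random.Random(0)
population = make_population(10_000, rng)

# Draw entities from the population: 50 form the sample, 200 more are
# held back to play the role of "new entities".
drawn = rng.sample(population, 250)
sample, new_cars = drawn[:50], drawn[50:]
weights = [w for w, _ in sample]
mileages = [m for _, m in sample]

# Question 1: is there a relationship between response and predictor?
r = pearson_r(weights, mileages)

# Question 2: how can we best predict the response in new entities?
intercept, slope = fit_line(weights, mileages)

# Question 3: how accurate is the prediction? (root-mean-square error)
errors = [m - (intercept + slope * w) for w, m in new_cars]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"correlation r = {r:.2f}")
print(f"predicted mileage = {intercept:.2f} + ({slope:.4f}) * weight")
print(f"prediction error (RMSE) on new cars = {rmse:.2f} km per litre")
```

The sample is kept separate from the new cars so that the accuracy figure describes prediction for entities that were not used to find the relationship.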
By considering a broad set of practical examples, we can show students that most empirical (including most scientific) research projects can be easily and usefully viewed in terms of the five points and three questions given above. That is, most empirical research projects can be viewed as studies of relationships between variables, with the aim being to develop the ability to predict or control the values of the response variable in new entities from the population. This approach provides a simple yet comprehensive model of most empirical research.
(I provide support for the claims in the preceding paragraph in appendix B.)
I recommend that students' initial sense of the concept of a relationship between two variables be in terms of
- the value of one variable in entities "depends" on the value of another variable and
- as the value of one variable changes in entities, the value of the other variable changes somewhat in step with the values of the first variable.
* * *
In addition to showing students that the concept of a relationship between variables plays a central role in empirical research, we can also show mathematically minded students how the concept plays a central role on the theoretical side of science. Specifically, mathematical equations are crucial on the theoretical side of many branches of science. But most (all?) mathematical equations in science (as opposed to equations in pure mathematics) are simply statements of known or hypothesized relationships between variables (relationships between properties of entities).
In the initial discussion of concepts 1 - 4 with students, I
recommend that a teacher Subsection 6.5 further discusses the use of practical examples in an introductory statistics course. The paper for students gives a formal definition of the concept of a relationship between variables (1996a, sec. 7.10).
Once students properly understand and appreciate the usefulness of relationships between variables as a means to prediction and control, we can then bring the field of statistics out onto the stage. We can characterize the field as a set of optimal techniques to help empirical researchers study variables and relationships between variables as a means to accurate prediction and control. When first introducing the field of statistics, it is helpful to classify statistical techniques into four functional groups
- techniques for *detecting* relationships between variables
- techniques for *illustrating* relationships between variables
- techniques for *predicting* and *controlling* the values of variables on the basis of relationships between variables, and
- miscellaneous techniques for the study of variables and relationships between variables.
The field of statistics provides many methods for carrying out these four groups of techniques, with the choice of the best method(s) depending on the particular empirical research situation at hand. I discuss some of these methods in subsection 4.3 and in appendix C. I further discuss the four groups of techniques in the paper for students (1996a, sec. 8 - 13). After introducing the four groups of techniques, I recommend that a teacher spend the rest of the course and subsequent courses introducing standard statistical principles and methods in terms of the techniques. I propose a syllabus for a course following this approach in subsection 6.4.
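As one hedged illustration of the first group (techniques for detecting relationships), the sketch below uses a permutation test. This particular method is a choice made for the example, not one the paper singles out, and the data are hypothetical: simulated course grades for students who did or did not complete the homework. The function name `permutation_p_value` is likewise an assumption of the sketch.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles, rng):
    """Approximate p-value for the hypothesis that there is no relationship
    between group membership (predictor) and the measured value (response).

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose difference in group means is at least as large as the
    difference actually observed."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_shuffles + 1)


rng = random.Random(42)

# Hypothetical data: the two groups of students differ in average grade.
did_homework = [rng.gauss(78, 8) for _ in range(30)]
no_homework = [rng.gauss(62, 8) for _ in range(30)]

p_value = permutation_p_value(did_homework, no_homework, 2000, rng)
print(f"approximate p-value = {p_value:.4f}")
```

A small p-value is evidence that a relationship exists between the predictor variable (homework completed or not) and the response variable (grade). It does not by itself show how to predict or control the grade, which is the job of the third group of techniques.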
The preceding subsections describe a sequence of five concepts for discussion at the beginning of an introductory statistics course. The concepts are
- entities
- properties of entities (variables)
- an important goal of empirical research: to predict and control the values of variables
- relationships between variables as a key to prediction and control
- statistical techniques for studying relationships between variables as a means to accurate prediction and control.
After introducing the five concepts, the teacher focuses the rest of the course on statistical techniques for studying variables and relationships between variables. I call this approach to the introductory statistics course the "entity-property-relationship" (EPR) approach. I discuss evaluation of the approach in section 4, tests of the approach in section 5, and implementation considerations in section 6.
The entity-property-relationship approach to the introductory statistics course has several important features and benefits, which I discuss in this section. (Some of the material in this section is abstract and is presented solely to assist in evaluating the EPR approach. I do not intend that this abstract material be discussed in an introductory course.)
The concepts of entities, properties (variables), and relationships are ubiquitous in students' (unconscious) thought. Therefore, if we carefully discuss these concepts (with sufficient practical examples), students find the concepts easy to understand. The ease of understanding leads me to conjecture that the concepts of entities, properties, variables, and relationships can be taught at all levels of teaching statistics from late elementary school up, with only the teaching time and depth of coverage of the concepts varying at different levels.
The second of the five concepts in section 3 introduces the fundamental concept of 'variable' in terms of the concepts of 'entity' and 'property'. The fourth concept introduces the fundamental idea of a relationship between properties (relationship between variables). It is easy to see that several other fundamental statistical concepts are built atop the concepts of 'entity', 'property', or 'relationship':
- *set:* a fundamental type of entity in human reality that consists of zero or more entities of a specified type
- *population:* a "large" set of entities upon which interest is focused
- *sample:* a set of entities selected from a population, or a set of values of properties of entities selected from a population
- *event:* a fundamental type of entity in human reality
- *probability:* a particular property of an event that reflects how often the event occurs or is thought likely to occur
- *distribution:* a mathematical entity that succinctly describes in probabilistic terms what the different values of a property are in the different entities in a population or sample
- *model (equation):* a mathematical statement of a relationship between properties of entities
- *parameter:* a general name for a property of a population, distribution, or model equation (or for the *value* of such a property)
- *estimation* (of the value of a property or the value of a parameter)
- *hypothesis* ([*a*] of the existence of a relationship between properties or [*b*] about the value of a parameter)
- *statistical test:* a technique for providing an objective measure of the weight of a body of evidence in support of a hypothesis
- *statistic:* any of various well-defined properties of a sample whose values are obtained by performing mathematical operations on the values of one or more properties of the entities in the sample; often used to estimate the values of parameters or in performing statistical tests.
Note that each of the fundamental statistical concepts in the above list is built atop the concepts of 'entity', 'property', or 'relationship', or is built atop concepts that are themselves built atop the three concepts. Furthermore, the concepts of entities, properties, and relationships appear to be among the most basic concepts of human reality. Thus the EPR approach provides a broad and deep foundation for discussion of statistical concepts.
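Several of the concepts in the list (population, sample, parameter, statistic, estimation) can be made concrete with a short sketch. The population below is hypothetical: 5,000 simulated students, each with one property, the number of hours of homework completed during a course. The variable names and numbers are assumptions made only for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)

# Population: a "large" set of entities (hypothetical students). Each entity
# is represented here by the value of one of its properties.
population = [rng.gauss(40, 10) for _ in range(5000)]

# Parameter: a property of the population (here, its mean). In real research
# this value is usually unknown, which is why we estimate it.
population_mean = statistics.mean(population)

# Sample: a set of entities selected from the population.
sample = rng.sample(population, 100)

# Statistic: a property of the sample, computed from the values of a property
# of the entities in the sample. Here it is used to estimate the parameter.
sample_mean = statistics.mean(sample)

# A second statistic: the standard error, a rough indication of how far the
# estimate may lie from the parameter.
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
print(f"standard error of estimate:  {std_error:.2f}")
```

Each quantity in the sketch is either an entity, a set of entities, or a property of one of these, which is the sense in which the statistical concepts are built atop 'entity' and 'property'.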
The discussion of concept 5 in section 3 identifies four groups of techniques that statistical methods can perform to help empirical researchers study relationships between variables, namely
- techniques for *detecting* relationships between variables
- techniques for *illustrating* relationships between variables
- techniques for *predicting* and *controlling* the values of variables on the basis of relationships between variables, and
- miscellaneous techniques for the study of variables and relationships between or among variables.
This raises the question: Which of the currently available statistical methods can actually perform these four groups of techniques? The following nineteen statistical methods can perform one or more of the foregoing four groups of techniques:
- general linear model (*t*-test, analysis of variance, linear regression, multiple comparison methods, hierarchical methods, variance components analysis, multivariate analysis of variance, multivariate linear regression)
- generalized linear model
- response surface methods
- exploratory data analysis
- time series analysis
- survey analysis
- survival analysis
- categorical analysis
- graphical methods
- Bayesian methods
- nonlinear regression
- neural networks
- discriminant analysis
- nonparametric methods
- probit analysis
- logistic regression
- correlation analysis
- structural and path analysis
- univariate analysis.
Upon consideration, many statisticians will agree that the above
list of statistical methods contains almost all of the currently
popular statistical methods. Many statisticians will also agree that
the (Furthermore, statistical methods can usually be characterized as
performing only the Since the nineteen statistical methods in the list are fully
explained (at a high level) by the four groups of statistical
techniques that are emphasized in the EPR approach, the approach
unifies the main statistical methods. That is, we can
I discuss the close relationship between the EPR approach and scientific "explanation" in a Usenet post (1997).
The preceding three subsections respectively suggest that the concepts of entities, properties, variables, and relationships between variables are
- a foundation for several of the fundamental statistical concepts
- central to many of the main statistical methods and
- central to scientific explanation.
This suggests that the concepts of the EPR approach are more fundamental than many of the other concepts that are traditionally discussed in statistics courses.
Concepts in a body of knowledge are usually easiest to understand
and remember if they are developed in a logical order beginning with
the most fundamental. This is especially true if the fundamental
concepts are intuitive, as are the concepts of entities, properties,
variables, and relationships. Therefore, I recommend that teachers
cover the concepts of entities, properties, variables, and
relationships
In subsection 2.4 I recommend that the first goal of the
introductory statistics course be to give students a lasting
appreciation of the vital role of the field of statistics in empirical
research. What The main
Consider:
- the EPR approach is aimed specifically at satisfying a main goal of empirical research--the goal of accurate prediction and control
- most students are directly interested in prediction and control (of variables of interest to them)
- the concept of prediction and control on the basis of relationships between variables is not difficult to understand
- the approach demonstrates that the field of statistics plays a broad role across all empirical research
- the approach is developed in a logical sequence from fundamental concepts (with numerous practical examples).
These points suggest that the EPR approach gives students a lasting appreciation of the field of statistics and its vital role in empirical research.
The concepts of entities, properties, and relationships are not new. Indeed, researchers and statisticians use these concepts implicitly throughout their thinking and discussion. Surprisingly, however, the fundamental concepts of entities, properties, variables, and relationships are almost never carefully discussed in a unified approach in introductory statistics courses. I believe that the unfortunate omission of careful unified discussion of these concepts is the main reason why the field of statistics is so widely misunderstood. (Some leaders in statistical education have already independently adopted an important aspect of the EPR approach in that they emphasize relationships between variables in their introductory courses. For example, using an idea developed by Gudmund Iversen, George Cobb teaches two introductory courses, both of which start with relationships--one devoted to experimental design and applied analysis of variance and the other devoted to applied regression [G. Cobb, personal communication, August 21, 1996]. Similarly, Robin Lock teaches an introductory course devoted to time series analysis--i.e., methods for studying relationships between variables when an important predictor variable is "time" [Cobb 1993, sec. 3.1].)
Several helpful new approaches to teaching the introductory
statistics course have recently been proposed. As suggested by Moore
(1997a), these approaches fall neatly into two distinct groups: Under the
- emphasis on data analysis and de-emphasis of theoretical concepts (especially de-emphasis of probability theory, distribution theory, and the theory of statistical tests) (Cobb 1992; Moore 1992a, 1992b)
- emphasis on exploratory data analysis and de-emphasis of confirmatory data analysis (Tukey 1977; Velleman and Hoaglin 1992)
- emphasis on statistical reasoning and de-emphasis of statistical methods (computations) (Ruberg 1990; Bradstreet 1996)
- emphasis on the Bayesian approach to statistics (usually restricted to more mathematically literate students) (Blackwell 1969; DeGroot 1986; Albert 1996, 1997; Berry 1996, 1997; Berry and Lindgren 1996; Antelman 1997; Moore 1997b, 1997c)
- emphasis on (*a*) process (which is a particular type of entity) and (*b*) the minimization of the variation in selected properties of a process, mainly through relationships between variables, generally with less emphasis on formal statistical methods (Snee 1993; Hoerl and Snee 1995; Britz, Emerling, Hare, Hoerl, and Shade 1997)
- emphasis on nonparametric statistical methods (Iman and Conover 1983; Noether 1992)
- emphasis on resampling (Boomsma and Molenaar 1991; Simon and Bruce 1991; Smith and Gelfand 1992; Wonnacott 1992; Albert 1993; Simon 1993, 1994; Willemain 1994; Pollack, Fireworker, and Borenstein 1995; Hesterberg 1998)
- emphasis on probability (Falk and Konold 1992).
In contrast to the conceptual approaches, the
- more interaction between the teacher and the students instead of straight lectures (Mosteller 1988; Zahn 1994)
- application of the principles of Total Quality Management to the design and management of the course (Hogg and Hogg 1995; Wild 1995)
- use of multimedia, film, or video, to teach the concepts (Moore 1993; Ottaviani 1996; Velleman and Moore 1996; Cobb 1997; Cryer and Cobb 1997; Doane, Mathieson, and Tracy 1997; Newton and Harvill 1997; Macnaughton 1998b; Velleman 1998)
- use of demonstrations, activities, or projects to teach the concepts (Jowett and Davies 1960; Hunter 1977; Hansen 1980; Carlson 1989; Bisgaard 1991; Halvorsen and Moore 1991; Bryce 1992; McKenzie 1992; Roberts 1992; Sylwester and Mee 1992; Zahn 1992; Gunter 1993, 1996; Garfield 1995; Magel 1996; Rossman 1996; Scheaffer, Gnanadesikan, Watkins, and Witmer 1996)
- use of students working in groups instead of working individually (Dietz 1993; Garfield 1995)
- use of cases to teach the concepts (Chatterjee, Handcock, and Simonoff 1995; Czitrom and Spagon 1997; Parr and Smith 1998; Peck, Haugh, and Goodman 1998)
- use of news stories to teach the concepts (Snell and Finn 1992; Snell 1999)
- use of improved methods for assessing students (Eltinge 1992; Cobb 1993; Garfield 1994; Gal and Garfield 1997b; Goldman, McKenzie, and Sevin 1997)
- use of concept maps to teach the concepts (Schau and Mattern 1997a, 1997b)
- use of a computerized library of examples that students can analyze to aid in teaching the concepts (Hand, Daly, Lunn, McConway, and Ostrowski 1994; Velleman, Hutcheson, Meyer, and Walker 1996; Pearl, Notz, and Stasny 1996; OzData 1999; StatLib 1999)
- use of real (or at least realistic) data in examples (as opposed to data that have no obvious practical implications) (Hunter 1977; Cobb 1987; Moore and Roberts 1989; Willett and Singer 1992; Macnaughton 1998a)
- emphasis on improving students' writing ability
- use of computers or calculators to generate data displays, to illustrate and simulate statistical ideas, and to perform statistical analyses.
Many introductory statistics teachers now use some combination of the above conceptual and pedagogical approaches. (The main disagreement among teachers is only about the relative emphasis that each approach deserves.) (It is possible to classify the use of multimedia, film, video, computers, and calculators as "technological" approaches, rather than as "pedagogical" approaches. However, it seems more reasonable to view technology as a means to better pedagogy rather than as an end in itself.) There is a simple relationship between the EPR conceptual approach to the introductory statistics course and the other approaches--the EPR approach can be effectively used in conjunction with any (or any group) of them. Moore (1997a) reviews several of the new approaches to statistical education. Cox (1998) comments on some general aspects of statistical education. Gordon and Gordon (1992) and Hoaglin and Moore (1992) give papers by leading statistics educators about teaching statistics. Hawkins, Jolliffe, and Glickman (1992) give a general discussion of teaching statistical concepts.
I discuss several insightful criticisms of the EPR approach in a paper (1998a, app. C - G) and in some Usenet posts (1998c).
Three teachers have tested the entity-property-relationship approach in their statistics courses using a draft textbook (Macnaughton 1986). They commented that
- ... students found the book enjoyable and easy to understand. Using a unique approach, Macnaughton has provided a comprehensive first-rate introduction to the material. I would highly recommend the book for use in introductory statistics courses ....
- ... students obtained a good understanding of the basic principles of statistical analysis. ... [the approach] substantially simplifies the material without sacrificing important concepts.
- The absence of overt mathematics enables the underlying principles of scientific research ... to be more directly apprehended by persons who have ... weak grounding in mathematics. ... ... Students' comments have been uniformly favorable .... ... the book is to be commended to the instructor.
Until textbooks based on the EPR approach become generally available, a teacher wishing to use the approach in an introductory course can use the paper for students (1996a) to reinforce class discussion of the five introductory concepts. As noted above, after covering the five concepts, I recommend that the teacher spend the rest of the course covering standard statistical principles and methods in terms of the concepts. Some considerations for implementing the approach follow.
The first day of class is important because it can set a positive attitude toward the course in students' (and teachers') minds. What should be the very first statistical idea we introduce to students? I recommend that the first idea be that students are going to learn how to make accurate predictions. We can promise students they will learn how to make accurate predictions using scientific methods that are recognized throughout science and empirical research as being the very best methods available. For example, we can promise students they will learn how to accurately (but generally not perfectly) predict
- the mark they will get on the final
- their average annual income over the next several years
- their longevity
- whether it will rain tomorrow
- just about anything else of interest (if it can be reliably measured).
(Along with prediction methods, I recommend that the introductory course devote substantial attention to the more complicated methods of exercising accurate control through formal experimentation. However, for simplicity, I recommend that discussion of control and experimentation be omitted at the very beginning--the promise of accurate predictions seems quite enough to engage students. I discuss experimentation further in subsection 6.4.)
If we promise students on the first day of class that they will learn how to make accurate predictions, we arouse their curiosity and we set the stage for development of the five concepts discussed in section 3. The promise also sets the course in a practical direction, which is more likely to impress most students than if we begin with mathematical discussion.
If we promise students on the first day of class that they will learn how to make accurate predictions, we must (for the course to be successful) later deliver on the promise. In particular, the intelligent student will be interested in whether we can demonstrate
Depending on the level of the students, the five concepts I discuss in section 3 can be properly introduced in one to eight class sessions. I recommend in section 3 that after covering the five concepts the teacher spend the rest of the course expanding concept 5 (statistical techniques for studying relationships between variables) by covering various standard topics selected from the field of statistics. (I recommend covering topics that are more frequently used in empirical research first.) The next two subsections discuss two ways of implementing the approach.

Perhaps the easiest way to implement the EPR approach is to follow discussion of the five concepts with material selected from an already existing introductory statistics course. The teacher can use the five concepts to introduce and unify the material. This enables the teacher to use the EPR approach in conjunction with an already-existing course with only a minimum amount of modification to the course.
Another more unified way of implementing the EPR approach is to break the course into five phases: an introductory phase, a practical-experience phase, a generalization phase, a specific-methods phase (optional), and a mathematics phase (also optional).
I recommend that the teacher begin the practical-experience phase with discussion of a commonly occurring simple type of research project--the observational research project that studies the relationship between two continuous variables. Possibly using the material in the paper for students (1996a) as an introduction, the teacher can discuss how to design an observational research project to study the relationship between two continuous variables, how to use statistical techniques to analyze data from such a research project to determine whether a relationship is present between the variables, how to use scatterplots to illustrate such a relationship, and how to use the model equation derived from such a relationship to make predictions. To reinforce the discussion, I recommend that students be given computer assignments to detect and study (practical) relationships between pairs of continuous variables in various sets of data. If time permits, the bivariate case can be extended to the multiple regression case. Next, the teacher can discuss "experiments" and the
associated statistical methods as a powerful tool for studying If time permits, the fully randomized one-way case can be extended to the multi-way case, repeated measurements, blocking, analysis of covariance, and so on. I recommend that the length of the practical-experience phase be adjusted to allow enough time for the teacher to properly cover the material in the next (generalization) phase.
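As a rough illustration of the bivariate case discussed above, the following Python sketch fits a least-squares line to a small set of paired values and then uses the resulting model equation to make a prediction. The data (hours studied and final mark) and the function names are invented for this sketch and do not come from the paper; in a real course students would more likely obtain the same results from a statistical package.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(intercept, slope, x_new):
    """Use the model equation to predict the response for a new predictor value."""
    return intercept + slope * x_new

# Hypothetical data: predictor = hours studied, response = final mark.
hours = [2, 4, 6, 8, 10]
marks = [52, 58, 67, 71, 82]

intercept, slope = fit_line(hours, marks)
predicted_mark = predict(intercept, slope, 7)

print(f"mark = {intercept:.2f} + {slope:.2f} * hours")   # mark = 44.10 + 3.65 * hours
print(f"predicted mark for 7 hours: {predicted_mark:.2f}")  # 69.65
```

The slope is the quantity that carries the relationship: if it were zero, knowing the predictor would not improve the prediction of the response.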
1. *Types of Variables.* This topic introduces students to a standard typology for variables, with four mutually exclusive and exhaustive categories: continuous, discrete-ordinal, discrete-nominal, and binary. (This topic is in preparation for the following topic.)
2. *Overview of Statistical Methods.* This topic provides a high-level overview of statistical methods (such as those listed in subsection 4.3 and appendix C), with discussion of the conditions under which each method is applicable. Most of the nineteen methods listed in subsection 4.3 are best summarized in a table, with rows of the table indexing the four possible types of the response variable and with columns indexing the four possible types of the predictor variable(s) in the research. Each cell in the body of the table contains the names of the statistical methods that may be used when the response variable and predictor variables are of the indicated two types. I recommend that students *not* be required to memorize the table, but rather that the table be provided as an aid to them if they later need to choose the appropriate method to analyze the results of an empirical research project.
3. *Underlying Assumptions of Statistical Methods.* This topic discusses how every statistical method is based on certain assumptions (which are different for different methods), and why therefore (to avoid later possible embarrassment) a researcher should always verify that the underlying assumptions are adequately satisfied before attempting to draw conclusions from the use of a statistical method.
4. *Design of Empirical Research Projects.* This topic discusses key considerations in the design of empirical research projects, with emphasis on (*a*) the importance of eliminating possible alternative explanations of the results (Macnaughton 1986), (*b*) the importance of maximizing the power of the statistical tests (appendix B in the paper for students [1996a]), and (*c*) the importance of maximizing prediction and control accuracy (appendix E in the paper for students [1996a]).
5. *Statistical Thinking.* This topic gives students practice in understanding, criticizing, and writing reports of empirical research.
In discussing topics 2, 3, and 4 above I recommend that the teacher
For each method in subsection 4.3, I recommend that the following topics be covered (when applicable):
- *Design:* how to design a research project to study relationships between variables using the method
- *Power:* how to use a statistical package to compute the power of statistical tests for detecting relationships between variables using the method
- *Data Checking:* how to use a statistical package to examine the data from a research project using the method in order to identify and (when appropriate) correct anomalies in the data prior to analysis (using methods for studying univariate and possibly bivariate distributions of the values of variables)
- *Assumptions:* how to analyze the description and results of a research project in order to determine whether the underlying assumptions of the method are sufficiently satisfied to permit drawing conclusions from the use of the method
- *Detection:* how to use a statistical package to analyze the results of a research project in order to detect relationships between variables using the method
- *Illustration:* how to use a statistical package to illustrate relationships between variables using the method
- *Prediction and Control:* how to use a statistical package to analyze the results of a research project in order to derive model equations for relationships between variables using the method, as an aid to prediction or control
- *Reporting:* how to write reports of empirical research projects that use the method.
(A similar set of topics can be used to discuss the statistical methods listed in appendix C.) Except for statistics or mathematics majors, I recommend that the use of mathematics be avoided in the specific-methods phase. Instead, I recommend that attention be focused on designing research projects and on correctly interpreting the relevant output from a statistical package.
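To give a sense of what the *Power* topic involves, here is a minimal Python sketch that approximates the power of a two-sided, two-sample test for a difference between two group means. It uses the normal approximation with a known common standard deviation, which is a simplifying assumption of this sketch; statistical packages typically base the calculation on the noncentral t distribution instead. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    diff        -- true difference between the two population means
    sd          -- common standard deviation (assumed known)
    n_per_group -- number of entities in each group
    alpha       -- significance level of the test
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standardized size of the true difference relative to its standard error.
    shift = diff / (sd * sqrt(2 / n_per_group))
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# The same true difference is much easier to detect with a larger sample.
power_small = two_sample_power(diff=5, sd=10, n_per_group=20)
power_large = two_sample_power(diff=5, sd=10, n_per_group=100)
print(f"n = 20 per group:  power = {power_small:.2f}")
print(f"n = 100 per group: power = {power_large:.2f}")
```

When the true difference is zero the function returns the significance level itself, which is a useful sanity check on the calculation.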
As noted in section 3, and following Hunter (1977, pp. 16-17) and Willett and Singer (1992, p. 91), I recommend that any implementation of the EPR approach discuss each important concept in terms of numerous practical examples.
I believe that the most important examples in an introductory statistics course are the examples of empirical research projects. For an example of an empirical research project to be judged "practical" I recommend (following Scheaffer 1992, p. 69) that the following question have an affirmative answer: Does an understanding of the relationship between variables under study in the example have an obvious significant social or commercial benefit? That is, does the example provide some clear basis for action?
Examples of empirical research projects that satisfy this practicality criterion are easy to find in most fields of empirical research. For example, research in medicine to study the relationship between AIDS symptoms and a new treatment for AIDS is clearly very practical according to the criterion. That is, AIDS research (if it finds a new relationship between relevant variables) provides a clear basis for action in the treatment of AIDS. Similarly, research to study relationships between variables that help make computers more efficient or less expensive is also (in a commercial sense) very practical since (if it finds a new relevant relationship) it provides a clear basis for action in manufacturing computers.
Examples that fail to satisfy the practicality criterion seriously detract from the field of statistics because they associate the field with problems that appear to be frivolous (or at best inconsequential). For example, a research project that studies the relationship between people's forearm lengths and their foot lengths is a "frivolous" research project, since students can see no obvious practical use of knowledge of this relationship. (Interestingly, if one looks hard enough, there are practical uses of most relationships between variables. For example, the relationship between forearm length and foot length is of interest in orthopedics and physical anthropology. However, a somewhat complicated explanation is needed before students can see the practicality of the relationship in either of those fields. Most students are unimpressed by such complicated and obscure applications.)
If the students in a particular introductory statistics course are all majoring in the same discipline, and if that discipline performs empirical research, we can almost certainly make the greatest impression on these students if we choose significant practical examples of empirical research projects from that discipline. We can also impress students if we choose practical examples of research projects that use response variables that students themselves are directly interested in predicting and controlling, such as variables reflecting student grades, student health, student happiness, student expenses, and student income.
It is surprising how many examples of empirical research projects in
statistics textbooks are not practical. And when one studies such examples and asks "Would an enlightened empirical researcher ever actually
(The frequent use of impractical examples by some statistics textbook writers is one reason for insisting that teachers and textbook writers use
Unfortunately, not all examples of empirical research projects used in an introductory course can be practical--particularly examples in activities and projects where, for example, it is useful for students to experiment with paper helicopters because real helicopters (or other reasonable entities for study) are too expensive and unwieldy (Rogers 1986, Box 1992, Santy and Einwalter 1997). When an example cannot be practical, I recommend that the teacher carefully show students how the example relates to other examples that
I discuss the use of examples further in a paper (1998a, sec. 6).
Once a concept has been introduced and studied through a sufficient number of practical examples, I recommend that the teacher cement the appropriate generalizations about the concept in students' minds. This helps students to use the concept in new situations. For example, once students understand (through sufficient practical examples) the concept of a relationship between variables, the teacher can make the generalization that most empirical research projects can be viewed as studying relationships between variables. After stating a generalization, I recommend that the teacher assign exercises in which students identify details of the generalization in specific instances. For example, after discussing the generalization that most empirical research projects can be viewed as studying relationships between variables, I recommend that the teacher assign exercises in which students answer the following questions about various empirical research projects:
- What is the population of entities under study?
- What is the size of the sample and how was the sample selected from the population?
- What is the response variable that was measured in each entity in the sample?
- What is (are) the predictor variable(s) that was (were) measured in each entity in the sample (or measured in each entity's environment)?
- In plain language or in graphical terms (as opposed to mathematical terms), what is the relationship between the response variable and the predictor variable(s) that was sought, discovered, or studied in the research project?
Such exercises are important because once students have interpreted a sufficient number of diverse instances as easily fitting within a generalization, they recognize the unifying power of the generalization. In particular, once students have interpreted a sufficient number of diverse research projects as studies of relationships between variables, they recognize that most empirical research projects can be easily interpreted from this comprehensive simplifying point of view.
How many explanations, examples, exercises, or activities should a teacher provide or assign to ensure that students understand a particular generalization? This depends, of course, on the generalization and on the nature of the students and is often difficult to determine at the front line of teaching--especially if a teacher is using a new approach. To reduce this difficulty, I recommend that teachers use feedback systems to assess whether students understand each main concept and generalization. Some effective feedback systems for assessing students' understanding are
- minute papers (which are ungraded and may be anonymous) in which students briefly report their understanding, their questions, and the muddiest point to help the teacher evaluate a lecture or lesson (Mosteller 1988)
- graded quizzes and exercises that test students' understanding of the ideas (I give examples in the paper for students [1996a])
- two-way discussions between the teacher and students about the ideas.
Gal and Garfield (1997b, pt. 2) give four interesting essays by statistics educators about assessing students' understanding of statistical ideas.
As suggested in subsection 6.4, except in courses aimed at statistics or mathematics majors, I recommend that discussion of the underlying mathematics of statistics (e.g., probability theory, distribution theory, theory of statistical tests) be omitted from the introductory statistics course. This recommendation is motivated by the needs of the typical user of statistical methods, who is interested in the field of statistics only to the extent that it can help him or her to detect and study relationships between variables (or perform equivalent functions under another name). And like the typical automobile driver who needs transportation, but who cares little about the mechanical details of the engine, the typical user of statistical methods needs help studying relationships between variables, but cares little about the mathematical details of the help. Instead, the user's attention is directed toward the substantive area of empirical research within which he or she is working (e.g., a particular branch of medicine). Thus the less we engage (and confuse) potential users with the complicated mathematical details of statistical methods, and the more we teach them how to properly
The paper for students (1996a) illustrates one approach to showing due deference to the underlying mathematics without getting immersed in complicated details.
Traditionally, the introductory statistics teacher spends a substantial amount of time near the beginning of the course covering the topic of univariate distributions of the values of variables. (The coverage generally includes ways of summarizing and illustrating univariate distributions and may also include the mathematics of univariate distributions.) Since many statistical ideas depend on the concept of a univariate distribution, it is clearly mandatory to cover this topic at some point in students' statistical careers--but where? Except in courses for statistics majors, I recommend that discussion
of univariate distributions be (Nor is the topic of univariate distributions If we omit univariate distributions at the (The concepts of univariate distributions are especially helpful in understanding statistical power, in checking data for anomalies, in examining data to determine whether the underlying assumptions of a statistical method are satisfied enough to justify the use of the method, and in specifying the estimated accuracy of predictions or control based on a model derived from empirical research.) I discuss the treatment of univariate distributions further in a paper (1998a, sec. 9.1 and app. G) and in some Usenet posts (1998c).
Appendix D discusses future computer systems for studying relationships between variables. Such systems will make it substantially easier for teachers to convey statistical concepts to students.
Under the entity-property-relationship approach, we present the following five concepts to students at the beginning of the introductory statistics course:
- entities
- properties of entities (variables)
- an important goal of empirical research: to predict and control the values of variables
- relationships between variables as a key to prediction and control
- statistical techniques for studying relationships between variables as a means to accurate prediction and control.
To facilitate understanding, the concepts are presented to students in terms of numerous practical examples. After students have learned the five concepts, standard statistical principles and methods are developed in terms of the concepts, again with emphasis on practical examples. The EPR approach is broad, and the concepts of the approach are fundamental. The approach gives students a lasting appreciation of the vital role of the field of statistics in empirical research.
We can see further evidence that the concept of 'entity' pervades human thought by considering the role of nouns in human speech. All nouns are simply names for entities (some of which exist in the external world and some of which do not). Since most human sentences contain at least one noun, most human sentences (thoughts) refer directly to entities. An alternative approach to using the concept of 'entity' is to use an entity-less fog of unattached properties, but this approach seems much less viable, and perhaps impossible because it is generally necessary to link individual values of various properties together in an analysis. It is the concept of 'entity' that does the linking. Since we cannot easily abandon the concept of 'entity'
Some empirical researchers and statisticians are unaware that the
concept of a relationship between variables can be used as widely as
this paper claims. (Probably more than one-half of all empirical
research projects are The remainder of this appendix discusses four aspects of empirical
research projects that may at first appear
Suppose a physics or chemistry experiment has discovered that if ingredients A and B are mixed together under conditions C, then ingredient D appears. Does this experiment study a relationship between variables? Yes. The predictor variables are "amount of ingredient A", "amount of ingredient B", and the variables that reflect the conditions C. The response variable is "amount of ingredient D that is produced". Viewing the experiment in terms of a relationship between variables has the benefit of reminding us that in addition to being interested in the fact that ingredient D is produced, we also wish to know how much of ingredient D is produced for given values of A, B, and C. This knowledge gives us better prediction or control capability of D. (The population of entities in the example is all the cases or instances ever in which the ingredients A and B are brought together under conditions C, and the sample is the set of those instances that occur in the experiment.)
Some readers may wonder whether a research project that performs "parameter
estimation" can be viewed as studying a relationship between
variables. One type of parameter-estimation research project is
intended to determine (i.e., estimate) the values of one or more "population
parameters", where a population parameter is simply some property
of a population, such as the average of the values of some property of
the entities in the population. (In more general terms, the procedure
is studying the univariate distribution of the parameter.) Any
standard procedure for estimating the value of a population parameter
can be usefully viewed as the study of a degenerate case of a
relationship between variables in which the response variable is
present, but Note that if we adopt the foregoing point of view of parameter
estimation, we are naturally led to ask in particular cases A second type of parameter-estimation research project can be performed to determine (i.e., estimate) the values of one or more parameters of a model (equation) of a relationship between variables. Because model equations are directly related to the study of relationships between variables, this second type of parameter estimation is simply a particular aspect of the study of relationships between variables.
Some readers may wonder whether a research project that performs "interval estimation" can be viewed as studying relationships between variables. Here, given the links just established between parameter estimation and relationships between variables, it is easy to see similar links between interval estimation and relationships between variables. The links occur because the intervals in interval estimation are simply intervals in which we determine (with a stated level of "confidence") that the associated parameters probably lie.
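The link between estimation and relationships between variables can be made concrete with a short Python sketch. It assumes, as an interpretation for this sketch rather than something stated in the text, that the degenerate case is a model equation with a response variable and no predictor variables, so that the least-squares fit reduces to a single constant. That constant is the sample mean, and an approximate interval estimate follows from its standard error. The data and function names are invented, and the interval uses the normal approximation rather than the t distribution that a statistical package would normally use for so small a sample.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def sse(y, c):
    """Sum of squared errors when every response value is predicted by the constant c."""
    return sum((yi - c) ** 2 for yi in y)

def interval_estimate(y, confidence=0.95):
    """Approximate (normal-theory) confidence interval for the population mean."""
    centre = mean(y)
    standard_error = stdev(y) / sqrt(len(y))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error

# Hypothetical values of a single response variable.
values = [12.0, 15.0, 11.0, 14.0, 13.0]

# With no predictor variables, the best-fitting constant is the sample mean.
estimate = mean(values)  # 13.0
assert sse(values, estimate) < sse(values, estimate - 0.1)
assert sse(values, estimate) < sse(values, estimate + 0.1)

low, high = interval_estimate(values)
print(f"point estimate: {estimate:.2f}")
print(f"approximate 95% interval: ({low:.2f}, {high:.2f})")
```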
Many instances of the commonly used statistical tests can be
usefully viewed as tests about relationships between variables. For
example, consider a research project that uses a two-sample Similarly, in a research project in which the entities are cross-classified in two or more different ways, each classification (i.e., each subscript or margin) represents a different predictor variable. These predictor variables may reflect properties of the entities that are under study, or they may reflect properties of the entities' environment. Of course, in the case of a particular treatment that is The preceding points together with consideration of the standard
statistical tests suggest that we can view many instances of the
standard tests (e.g., the But even if we agree that we can view many instances of statistical
tests as techniques for detecting evidence of relationships between
variables, a more fundamental question remains: Is it empirically To answer that question, note that it is precisely instances of the
concept of a relationship between variables in entities that empirical
researchers are usually interested in detecting when they use
statistical tests in research projects. That is, (using the definition
of a relationship between variables in subsection 7.10 of the paper
for students [1996a]) researchers are usually precisely interested in
determining whether the expected value in entities of some variable
(Another valid way of viewing some statistical tests is to say that
they are techniques for detecting differences between subpopulations.
Thus in the
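The idea that a two-sample comparison can be read as a statement about a relationship between variables can be illustrated with a short Python sketch. Group membership is coded as a 0/1 predictor variable (a coding chosen for this sketch), and the least-squares slope of the response variable on that predictor turns out to equal the difference between the two group means. The data and names are hypothetical.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical response values for two groups of entities.
control = [10.0, 12.0, 11.0, 13.0]
treated = [14.0, 15.0, 13.0, 16.0]

# Predictor variable: 0 for the control group, 1 for the treated group.
group = [0] * len(control) + [1] * len(treated)
response = control + treated

mean_difference = mean(treated) - mean(control)  # 14.5 - 11.5 = 3.0
regression_slope = slope(group, response)        # also 3.0

print(mean_difference, regression_slope)
```

Asking whether the two subpopulation means differ is therefore the same as asking whether the response variable depends on the group variable.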

In subsection 4.3 I list nineteen statistical methods and then I make two claims about these methods. This appendix provides support for the claims and discusses related matters. The first claim is that the list of nineteen methods contains almost all of the currently popular statistical methods. This is supported by the fact that if we survey empirical research projects that use statistical methods, we will find that a large majority of research projects use as their main statistical method(s) one or more of the methods in the list. I call the statistical methods in the list "response-variable" methods. My criterion for calling a method a response-variable method is A statistical method is a All the methods in the list can be easily viewed as satisfying this criterion. (I discuss the study of univariate distributions--i.e., the group of
response-variable methods in which a researcher uses a single response
variable and The criterion above states that response-variable methods use a In considering the nineteen response-variable methods the question
arises whether we should speak of relationships "between"
the variables or relationships "among" the variables. I
recommend using the preposition "between" when referring to
relationships studied by the response-variable methods. This is
because in any individual use of one of these methods there is only a
single response variable (which may on rare occasion be a vector)
under study, and the relationship under study is Which statistical methods are The no-response-variable methods are used in total in probably less than one percent of reported empirical research projects that use statistical methods. Thus although the no-response-variable methods are important in a few research projects, I believe they are not important topics for discussion in an introductory statistics course. On the other hand, the correlation coefficient and the chi-square
statistic for association in a contingency table are (implicitly)
In exploratory data analysis one is "looking around" in data, often without a particular response variable in mind. Thus the question arises whether we should view exploratory data analysis as a response-variable method or as a no-response-variable method. One answer is that if an exploratory data analysis is to be put to any practical use, a response variable and zero or more predictor variables will usually be (implicitly or explicitly) determined. Thus exploratory data analysis, when it is put to practical use, can usually be viewed as a response-variable method.
* * *
In subsection 4.3 I list the following four groups of techniques that statistical methods can perform:
- techniques for *detecting* relationships between variables
- techniques for *illustrating* relationships between variables
- techniques for *predicting* and *controlling* the values of variables on the basis of relationships between variables, and
- miscellaneous techniques for the study of variables and relationships between or among variables.
Then I claim that the
* * *
The four groups of techniques listed above are
Nowadays if a researcher wishes to use statistical methods to study relationships between variables, the researcher must be highly skilled in the use of those methods (even though statistical software can do most of the arithmetic). And when unskilled researchers try to use these methods, they often, through misunderstanding, make serious blunders. To prevent these blunders, and to help improve the quality of empirical research, it seems likely that statisticians will develop expert software that will interactively guide an unskilled researcher through the steps of properly designing a research project, performing it, and interpreting the results. I call this type of software "research guidance software". To help researchers design a research project, research guidance software will help them select the response and predictor variables and help them determine the important details of the design. An important goal will be to design research projects
- that are unequivocal
- that have minimum cost
- whose statistical tests have maximum power (appendix B in the paper for students [1996a])
- whose subsequent predictions or control will have maximum accuracy (appendix E in the paper for students [1996a]).
To help researchers perform a research project, the software will be capable of controlling the instruments used in the research project and capable of obtaining values of variables directly from the instruments. To help researchers interpret the results of a research project, the software will automatically guide researchers through the phases of detecting relationships between variables, displaying relationships, and prediction or control, as discussed in the paper for students (1996a, sec. 8 - 13). Research guidance software will operate in a hypertext computer system, with automatic real-time generation of custom hypertext and graphics that reflect specific details of the research project under study. (The hypertext will be generated from pre-written templates.) To bring the user interface up to the visual resolution of a printed textbook, the software and video system will be capable of simultaneously displaying the equivalent of at least two full pages of highly legible text and graphics from a standard textbook. In addition, the system will be capable of displaying video segments in which a narrator discusses and illustrates a concept. (As a prolific writer of notes in the margins of the books I read, I recommend that the software provide a convenient way for students to "write notes in the margin" of its output [including output of the help system]. I also recommend that the software be able to preserve these notes and details of their linkages when it is upgraded.) Until recently, most researchers and teaching facilities lacked access to computer hardware that could run the type of system described in the preceding paragraphs. Thus the market for research guidance software was too small to justify the development cost of a comprehensive system. However, nowadays the necessary computer hardware is within most researchers' and many teaching facilities' budgets. 
Furthermore, it is now easier to develop a research guidance software system because such a system can use an existing statistical package as its data analysis engine. (At least one leading statistical package vendor has developed an "output delivery system" that can provide all output from its statistical procedures in formats that are easily used as input by other programs [SAS Institute 1999].) This enables research guidance software developers to concentrate on other important tasks, such as creating the research-project-design and high-level-analysis modules, and writing the many necessary text templates for the system. The default path through any output from a research guidance software system will be on the highest conceptual level, which will focus the user on the important points. However, customized full information about the underlying details (including tutorials in the associated statistical concepts) will be only a keystroke or two away. Research guidance software will enable researchers to get much closer to their data because, in interpreting the results of a research project, the software will automatically examine many different graphical views of the data and will present the most interesting views (as defined by research in human perception) to the researcher. (Most researchers would be incapable of generating and examining all the necessary views manually.) Research guidance software will be designed by teams of statisticians and editors who, in the past, would have developed statistics textbooks. The quality of the writing, graphics, and layout will be equal to that of a superior textbook. For a research guidance software system to be complete, its developers must codify a complete general research design and data analysis strategy, a difficult but not impossible task. (The codification will involve extensive consultation with experts in research design and data analysis and extensive testing with inexperienced users.) 
The approach must be general enough and well enough presented to enable a well-motivated neophyte to properly design, perform, and analyze efficient simple empirical research projects.

Rudimentary research guidance software systems have begun to appear. Silvers, Herrmann, Godfrey, Roberts, and Cerys (1994) discuss one such system and give references to important earlier systems. Gale and Pregibon (1984) were the pioneers.

The inevitability of useful research guidance software stems from the fact that today's best research project designers and data analysts work by simply proceeding through a complicated (often subconscious, sometimes vague) decision tree. Thus three steps are necessary:

- decision trees must be elicited from master designers and data analysts
- decision trees must have any vagueness removed (or perhaps parameterized, with consensually chosen default values for the parameters)
- software must be developed that will (in the manner of a wise and friendly consultant, and using different levels of detail depending on the user's knowledge) guide a researcher through the decision tree.
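
The three steps above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the two questions, the suggestions, the `small_sample_size` parameter and its default of 30, and the names `Question`, `Advice`, and `guide`. It is not a codification of any expert's actual decision tree, and its suggestions are not statistical advice. It shows only the shape of the idea: a tree that has been written down (step 1), a vague notion such as "small sample" turned into a parameter with a default value (step 2), and a walker that leads the user through the tree with more or less explanation depending on the user's knowledge (step 3).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

# Step 2: a vague notion ("small sample") becomes a named parameter.
# The default of 30 is a hypothetical placeholder, not a recommendation.
DEFAULT_PARAMS: Dict[str, float] = {"small_sample_size": 30}


@dataclass
class Advice:
    """A leaf of the tree: the suggestion shown to the researcher."""
    text: str


@dataclass
class Question:
    """An internal node: a yes/no test plus the two branches it leads to."""
    prompt: str
    test: Callable[[dict, Dict[str, float]], bool]  # (study, params) -> bool
    if_yes: Union["Question", Advice]
    if_no: Union["Question", Advice]
    detail: str = ""  # extra explanation shown only to less experienced users


# Step 1: the elicited tree. This toy tree is invented for illustration only.
TREE = Question(
    prompt="Is the response variable continuous?",
    test=lambda study, params: study["response_is_continuous"],
    detail="a continuous variable can take any value in a range",
    if_yes=Question(
        prompt="Is the sample small?",
        test=lambda study, params: study["n"] < params["small_sample_size"],
        detail="'small' is a parameter with a consensus default",
        if_yes=Advice("plot every observation before fitting a model."),
        if_no=Advice("begin with a scatterplot and a fitted line."),
    ),
    if_no=Advice("begin with a table of counts."),
)


def guide(node: Union[Question, Advice], study: dict,
          params: Optional[Dict[str, float]] = None,
          novice: bool = True) -> List[str]:
    """Step 3: walk the tree and return the trail of questions and answers."""
    merged = {**DEFAULT_PARAMS, **(params or {})}
    trail: List[str] = []
    while isinstance(node, Question):
        answer = node.test(study, merged)
        line = f"{node.prompt} {'yes' if answer else 'no'}"
        if novice and node.detail:
            line += f" ({node.detail})"
        trail.append(line)
        node = node.if_yes if answer else node.if_no
    trail.append(f"Suggestion: {node.text}")
    return trail


if __name__ == "__main__":
    study = {"response_is_continuous": True, "n": 12}
    for line in guide(TREE, study):
        print(line)
```

Keeping the tree as data, separate from the function that walks it, means that the elicited tree and its parameter defaults could be revised by experts without changing the guidance code.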
Once these steps are completed, the ability of research guidance software to help researchers perform research projects will approach (and likely someday surpass) that of an expert.

Research guidance software amounts to copying both statistical thought and research thought from experts' minds and from textbooks into the computer. Once captured in the computer, the thought can be automatically customized by the computer for the situation at hand, and the researcher or student can actively interact with the thought. Customized interaction guarantees better understanding.

The teaching component will be an important part of any research guidance software system. In the better systems the teaching component will be easy to understand, comprehensive, and compliant with current norms of statistical practice.

Meeker presents a similar vision for statistical technology in subsection 2.2 of an article by Moore, Cobb, Garfield, and Meeker (1995).
Albert, J. (1993), "Teaching Bayesian Statistics Using Sampling Methods and Minitab,"
Albert, J. (1996),
Albert, J. (1997), "Teaching Bayes' Rule: A Data-Oriented Approach" (with discussion),
Antelman, G. (1997),
Berry, D. A. (1996),
Berry, D. A. (1997), "Teaching Elementary Bayesian Statistics with Real Applications in Science" (with discussion),
Berry, D. A., and Lindgren, B. W. (1996),
Bisgaard, S. (1991), "Teaching Statistics to Engineers,"
Boomsma, A., and Molenaar, I. W. (1991), "Resampling with More Care" (with discussion),
Blackwell, D. (1969),
Box, G. E. P. (1992), "Teaching Engineers Experimental Design with a Paper Helicopter,"
Box, G. E. P. (1995), "Scientific Statistics - The Way Ahead" (abstract), in
Bradstreet, T. E. (1996), "Teaching Introductory Statistics Courses So That Nonstatisticians Experience Statistical Reasoning,"
Britz, G., Emerling, D., Hare, L., Hoerl, R., and Shade, J. (1997), "How to Teach Others to Apply Statistical Thinking,"
Bryce, G. R. (1992), "Data Driven Experiences in an Introductory Statistics Course for Engineers Using Student Collected Data," in
Carlson, R. R. (1989), "A Paper Clip Experiment," in
Chatterjee, S., Handcock, M. S., and Simonoff, J. S. (1995),
Chromiak, W., Hoefler, J., Rossman, A., and Tesman, B. (1992), "A Multidisciplinary Conversation on the First Course in Statistics," in
Cobb, G. W. (1987), "Introductory Textbooks: A Framework for Evaluation,"
Cobb, G. W. (1992), "Teaching Statistics," in
Cobb, G. W. (1993), "Reconsidering Statistics Education:
Cobb, G. W. (1997),
Cox, D. R. (1998), "Statistics for the Millenium: Some Remarks on Statistical Education,"
Cryer, J. D., and Cobb, G. W. (1997),
Czitrom, V., and Spagon, P. D. (eds.) (1997),
DeGroot, M. H. (1986),
Dietz, E. J. (1993), "A Cooperative Learning Activity on Methods of Selecting a Sample,"
Doane, D. P., Mathieson, K. D., and Tracy, R. L. (1997),
Eltinge, E. M. (1992), "Diagnostic Testing for Introductory Statistics Courses," in
Falk, R., and Konold, C. (1992), "The Psychology of Learning Probability," in
Gal, I., and Garfield, J. B. (1997a), "Curricular Goals and Assessment Challenges in Statistics Education," in
Gal, I., and Garfield, J. B. (eds.) (1997b),
Gale, W. A., and Pregibon, D. (1984), "REX:
Garfield, J. (1994), "Beyond Testing and Grading:
Garfield, J. (1995), "How Students Learn Statistics,"
Goldman, R. N., McKenzie, J. D., Jr., and Sevin, A. D. (1997), "The BCASA Conference on Assessment in Statistics Courses," in
Gordon, F., and Gordon, S. (eds.) (1992),
Gunter, B. (1993), "Through a Funnel Slowly with Ball Bearing and Insight to Teach Experimental Design,"
Gunter, B. (1996), "The MISD/MMSTC Statistical DOE Project." Available at
Halvorsen, K. T., and Moore, T. L. (1991), "Motivating, Monitoring, and Evaluating Student Projects," in
Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J., and Ostrowski, E. (eds.) (1994),
Hansen, J. L. (1980), "Using Physical Demonstrations When Teaching Data Analysis," in
Hawkins, A., Jolliffe, F., and Glickman, L. (1992),
Hesterberg, T. C. (1998), "Simulation and Bootstrapping for Teaching Statistics," to appear in
Hoaglin, D. C., and Moore, D. S. (eds.) (1992),
Hoerl, R., and Snee, R. (1995), "Redesigning the Introductory Statistics Course (Report No. 130)," Madison, WI: University of Wisconsin Center for Quality and Productivity Improvement.
Hogg, R. V. (1990), "Statisticians Gather to Discuss Statistical Education,"
Hogg, R. V. (1991), "Statistical Education:
Hogg, R. V. (1992), "Towards Lean and Lively Courses in Statistics," in
Hogg, R. V., and Hogg, M. C. (1995), "Continuous Quality Improvement in Higher Education,"
Hunter, W. G. (1977), "Some Ideas About Teaching Design of Experiments, with 2
Iman, R. L. (1994), "The Importance of Undergraduate Statistics,"
Iman, R. L., and Conover, W. J. (1983),
Iversen, G. R. (1992), "Mathematics and Statistics:
Jowett, G. H., and Davies, H. M. (1960), "Practical Experimentation as a Teaching Method in Statistics" (with discussion),
Macnaughton, D. B. (1986),
Macnaughton, D. B. (1996a), "The Entity-Property-Relationship Approach to Statistics:
Macnaughton, D. B. (1996b), "Re: EPR Approach to Intro Stat:
Macnaughton, D. B. (1997), "EPR Approach and Scientific 'Explanation' (response to comments by Robert Frick)." Published in EdStat-L and sci.stat.edu on July 23, 1997.
Macnaughton, D. B. (1998a), "Eight Features of an Ideal Introductory Statistics Course." Available at
Macnaughton, D. B. (1998b), "Review of ActivStats 2.0,"
Macnaughton, D. B. (1998c), [responses to comments about the paper "Eight Features of an Ideal Introductory Statistics Course"].
Macnaughton, D. B. (1998d), "Which Sums of Squares Are Best in Unbalanced Analysis of Variance?" Available at
Magel, R. C. (1996), "Increasing Student Participation in Large Introductory Statistics Classes,"
McKenzie, J. D., Jr. (1992), "The Use of Projects in Applied Statistics Courses," in
Moore, D. S. (1992a), "Introduction:
Moore, D. S. (1992b), "Teaching Statistics as a Respectable Subject," in
Moore, D. S. (1993), "The Place of Video in New Styles of Teaching and Learning Statistics,"
Moore, D. S. (1997a), "New Pedagogy and New Content:
Moore, D. S. (1997b), "Bayes for Beginners? Some Pedagogical Questions," in
Moore, D. S. (1997c), "Bayes for Beginners? Some Reasons to Hesitate" (with discussion),
Moore, D. S., Cobb, G. W., Garfield, J., and Meeker, W. Q. (1995), "Statistics Education Fin de Siècle,"
Moore, T. L., and Roberts, R. A. (1989), "Statistics at Liberal Arts Colleges,"
Mosteller, F. (1988), "Broadening the Scope of Statistics and Statistical Education,"
Newton, H. J., and Harvill, J. L. (1997), "StatConcepts: A Visual Tour of Statistical Ideas," in
Noether, G. E. (1992), "An Introductory Statistics Course: The Nonparametric Way," in
Ottaviani, M. G. (ed.) (1996),
OzData (1999), "OzData: Australasian Data and Story Library."
Parr, W. C., and Smith, M. A. (1998), "Developing Case-Based Business Statistics Courses,"
Pearl, D. K., Notz, W. I., and Stasny, E. A. (1996), "Finding Examples - The EESEE Way Out," in
Peck, R., Haugh, L. D., and Goodman, A. (1998),
Pollack, S., Fireworker, R., and Borenstein, M. (1995), "Some Resampling Algorithms for the Testing of Hypotheses," in
Roberts, H. V. (1992), "Student-Conducted Projects in Introductory Statistics Courses," in
Rogers, C. B. (1986), "Design of Experiments for QC Circles," in
Rossman, A. J. (1996),
Ruberg, S. J. (1990), "The Statistical Method:
Santy, W., and Einwalter, B. (1997), "Comparison of Classroom Toys for Teaching Experimental Design,"
SAS Institute Inc. (1999), [SAS Output Delivery System]. For details search the site at
Schau, C., and Mattern, N. (1997a), "Use of Map Techniques in Teaching Applied Statistics Courses,"
Schau, C., and Mattern, N. (1997b), "Assessing Students' Connected Understanding of Statistical Relationships," in
Scheaffer, R. L. (1992), "Data, Discernment and Decisions: An Empirical Approach to Introductory Statistics," in
Scheaffer, R. L., Gnanadesikan, M., Watkins, A., and Witmer, J. A. (1996),
Silvers, A., Herrmann, N., Godfrey, K., Roberts, B., and Cerys, D. (1994), "A Prototype Statistical Advisory System for Biomedical Researchers,"
Simon, J. L. (1993),
Simon, J. L. (1994), "What Some Puzzling Problems Teach About the Theory of Simulation and the Use of Resampling,"
Simon, J. L., and Bruce, P. (1991),
Smith, A. F. M., and Gelfand, A. E. (1992), "Bayesian Statistics Without Tears: A Sampling-Resampling Perspective,"
Snee, R. D. (1993), "What's Missing in Statistical Education?"
Snell, J. L. (1999), "Chance Database." Available at
Snell, J. L., and Finn, J. (1992), 'A Course Called "Chance",'
StatLib (1999). This archive of datasets and other statistical information is available at
Sylwester, D. L., and Mee, R. W. (1992), "Student Projects:
Tukey, J. W. (1977),
Velleman, P. F. (1998),
Velleman, P. F., and Hoaglin, D. C. (1992), "Data Analysis," in
Velleman, P. F., Hutcheson, M. C., Meyer, M. M., and Walker, J. H. (1996), "DASL, the
Velleman, P. F., and Moore, D. S. (1996), "Multimedia for Teaching Statistics: Promises and Pitfalls,"
Watkins, A., Burrill, G., Landwehr, J. M., and Scheaffer, R. L. (1992), "Remedial Statistics?: The Implications for Colleges of the Changing Secondary School Curriculum," in
Wild, C. J. (1995), "Continuous Improvement of Teaching:
Willemain, T. R. (1994), "Bootstrap on a Shoestring: Resampling using Spreadsheets,"
Willett, J. B., and Singer, J. D. (1992), "Providing a Statistical 'Model': Teaching Applied Statistics using Real-World Data," in
Wonnacott, T. H. (1992), "More Foolproof Teaching Using Resampling," in
Zahn, D. A. (1992), "Student Projects in a Large Lecture Introductory Business Statistics Course," in
Zahn, D. A. (1994), "A Brain-Friendly First Day of Class," in
Donald B. Macnaughton is president of MatStat Research Consulting Inc., 246 Cortleigh Blvd., Toronto, Ontario, Canada M5N 1P7. E-mail: donmac@matstat.com

Portions of this material were presented at the Joint Statistical Meetings in San Francisco, August 11, 1993, at the Joint Statistical Meetings in Orlando, August 15, 1995, and at the Joint Statistical Meetings in Dallas, August 12, 1998.

The author thanks Professor John Flowers of the School of Physical and Health Education, University of Toronto and Professors Donald F. Burrill and Alexander Even of The Ontario Institute for Studies in Education for testing the approach in their statistics courses. Their comments and their students' comments helped substantially to clarify the ideas. The author also acknowledges insightful comments and criticisms from David R. Bellhouse, Carol J. Blumberg, Christopher Chatfield, George W. Cobb, Sir David Cox, W. Edwards Deming, David J. Finney, Robert W. Frick, Joan B. Garfield, Robert V. Hogg, Olaf E. Kraulis, Gudmund R. Iversen, William H. Kruskal, Alexander M. Macnaughton, Christine E. McLaren, David S. Moore, Thomas L. Moore, Jerry Moreno, John A. Nelder, Ingram Olkin, Allan J. Rossman, Richard L. Scheaffer, Milo A. Schield, Stephen Senn, Gary Smith, and Paul F. Velleman.
1. INTRODUCTION
2. COURSE GOALS
  2.1 The Value of Emphasizing Goals
  2.2 A Definition of "Empirical Research"
  2.3 Topic-Based Goals Have a Serious Drawback
  2.4 Recommended Goals
3. FIVE CONCEPTS
  Concept 1: Entities
  Concept 2: Properties of Entities (Variables)
  Concept 3: A Goal of Empirical Research: To Predict and Control the Values of Variables
  Concept 4: Relationships Between Variables as a Key to Prediction and Control
  Concept 5: Statistical Techniques for Studying Relationships Between Variables
  Summing Up
4. EVALUATION OF THE EPR APPROACH
  4.1 The Concepts of the Approach Are Easy to Understand
  4.2 The Approach Provides a Broad and Deep Foundation for Discussion of Statistical Concepts
  4.3 The Approach Unifies the Main Statistical Methods
  4.4 The Approach Links Well With Scientific Explanation
  4.5 The Concepts of the Approach Are Fundamental
  4.6 The Fundamental Concepts Should Be Taught First
  4.7 The Approach Suggests the Role of Statistics
  4.8 The Approach Gives Students a Lasting Appreciation of Statistics
  4.9 The Concepts Are Old But the Approach Is New
  4.10 Compatibility With Other Approaches
  4.11 Responses to Criticisms of the EPR Approach
5. TESTS OF THE EPR APPROACH
6. IMPLEMENTING THE EPR APPROACH
  6.1 Motivating Students on the First Day
  6.2 What Topics Should Follow the Five Concepts?
  6.3 An Easy Implementation
  6.4 Implementation in Phases
  6.5 Practical Examples
  6.6 Instantiation and Generalization
  6.7 Feedback Systems
  6.8 The Use of Mathematics in the Introductory Statistics Course
  6.9 Univariate Distributions
  6.10 Implementation With Computer Support
7. SUMMARY
APPENDIX A: THE PRIORITY OF THE CONCEPT OF 'ENTITY'
APPENDIX B: DO RESEARCH PROJECTS STUDY RELATIONSHIPS?
  B.1 Physics or Chemistry Experiments
  B.2 Parameter Estimation
  B.3 Interval Estimation
  B.4 Statistical Tests
APPENDIX C: SUPPORT FOR CLAIMS IN SUBSECTION 4.3
APPENDIX D: FUTURE SYSTEMS FOR STUDYING RELATIONSHIPS BETWEEN VARIABLES
REFERENCES
Author's Footnote
(version of February 2, 1999)