**The Introductory Statistics Course:
The Entity-Property-Relationship Approach**

**Donald B. Macnaughton**

- A clickable table of contents of this paper appears at the end.
- This paper is available in PDF format at
http://www.matstat.com/teach/eprt0130.pdf .
This paper proposes six concepts for discussion at the beginning of an introductory statistics course for students who are not majoring in statistics or mathematics. The concepts are (1) entities, (2) properties of entities, (3) variables, (4) a major goal of empirical research: to predict and control the values of variables, (5) relationships between variables as a key to prediction and control, and (6) statistical techniques for studying relationships between variables as a means to accurate prediction and control. After students have learned the six concepts they learn standard statistical topics in terms of the concepts. It is recommended that each concept be taught in a bottom-up fashion with emphasis on concrete practical examples. It is suggested that the approach gives students a lasting appreciation of the vital role of the field of statistics in empirical research.

KEY WORDS: Statistics education; Teaching; Role of statistics in empirical research.

Two former presidents of the American Statistical Association have stated that "students frequently view statistics as the worst course taken in college" (Hogg 1991, Iman 1994). A third former president has stated that the field of statistics is in a "crisis" and the subject has become "irrelevant to much of scientific enquiry" (Box 1995). The 2001 president has stated that statistics is "still among the most despised of college courses" (Scheaffer 2001). Many statisticians reluctantly agree with these remarks.

In contrast, many statisticians agree that the field of statistics is a fundamental tool of the scientific method, which plays a key role in modern society. Thus rather than being a worst course and possibly irrelevant, the introductory statistics course ought to be a friendly introduction to the simplicity, beauty, and truth of the scientific method.

Teachers must therefore reshape the introductory course. Many teachers have already contributed to the reshaping, as noted below.
This paper proposes further changes. I focus on the introductory statistics course for students who are not majoring in statistics or mathematics.

Section 2 defines the concept of 'empirical research', which appears throughout this paper. Section 3 recommends two goals for the introductory statistics course. Section 4 proposes six concepts for discussion at the beginning of an introductory course. Section 5 illustrates how the six concepts provide a deep and broad foundation on which we can build the field of statistics. Section 6 discusses testing the proposed approach. Section 7 identifies considerations for teachers wishing to use the approach, and Section 8 gives a summary.
At many points in this paper I refer to "empirical research", which thus deserves a definition:
Empirical research is a crucial step of the scientific method, which is central to many areas of human endeavor, such as in science, education, business, industry, law, and government. Section 5.5 discusses the scientific method.

Emphasizing the goals of an undertaking helps one to define and focus on
I have observed goal-setting exercises in which the goals were given much
attention for a brief period, but then were forgotten or ignored in the
subsequent months and years -- a waste of a valuable resource. Because the
course goals specify what the teacher believes is most important, I recommend
that teachers regularly revisit their goals to ask (
Many introductory statistics courses have what can be called "topic-based" goals. A teacher using such goals does not specify general goals, but instead simply specifies a list of statistical topics to be covered in the course, perhaps in the form of a syllabus. For example, a teacher of a traditional course might aim to cover (in specified amounts of detail) the topics of probability theory, distribution theory, point and interval estimation, and hypothesis testing. Similarly, a teacher of an activity-based course might make a list of statistical topics and then assign various activities to the students in order to cover the topics. Unfortunately, topic-based goals have a significant drawback: By emphasizing
lower-level statistical
I recommend that the goals of an introductory statistics course for non-statistics-majors be
- to give students a lasting appreciation of the vital role of the field of statistics in empirical research and
- to teach students to understand and use some useful statistical methods in empirical research.
I suggest that it is more important to satisfy these goals than to satisfy goals stated in terms of statistical topics.

How can we best satisfy these goals? First, it seems clear that students can appreciate the role of statistics only if they understand it. This leads one to ask, "What

Second, but of equal importance, most students will appreciate the role of statistics only if they see the

Other authors (after Hogg) who discuss goals for the introductory course include Chromiak, Hoefler, Rossman, and Tesman (1992), Cobb (1992, 1993, 2000), Iversen (1992), Watkins, Burrill, Landwehr, and Scheaffer (1992), Hoerl and Snee (1995), Gal and Garfield (1997a, pp. 2-5), Moore (1997a), and Garfield (2000).

This section describes six simple concepts I recommend for discussion at the beginning of an introductory statistics course for non-statistics-majors. These concepts help students to appreciate the role of statistics by highlighting a recurring simple pattern in the use of statistics across almost all empirical research. Most university and college students can learn the concepts in between one and three class sessions.

To make the approach easy for teachers to use, the following six subsections present the six concepts as a condensed version of how they might be presented to students. Teachers using the approach in an introductory course will need to expand the discussion with more examples, as discussed in Sections 7.5 through 7.8.

At a few points in this section I discuss pedagogical and statistical issues that are beyond the interest or understanding of most beginning students. These discussions are for teachers and are identified with initial asterisks. I recommend that this material be omitted in an introductory course for non-statistics-majors.

I begin with what may be the most fundamental concept of human reality. If you study your train of thought, you will probably agree that you think
about "things". For example, during the next few seconds you may
think about, among other things, a friend, an appointment, today's weather, and
an idea. Each of these things is an example of an entity.

Many different types of entities exist. Some common types are
- organisms (e.g., people, trees)
- inanimate physical objects
- physical locations
- actions and events
- ideas and emotions
- societal organizations (e.g., governments, schools, businesses)
- entities in science (e.g., waves in physics, motivations in psychology)
- *entities in mathematics (e.g., elements of sets, sets, numbers, functions, vectors).
Clearly, the things in the list are diverse. Thus it may at first seem absurd to think that they have

*Appendix A discusses whether some of the things in the list are

Entities are fundamental units of human reality because people unconsciously view everything (every thing) in our reality as being an entity. This dramatically simplifies our thinking because it allows us to view everything (at the most basic level) the same way, as discussed in the next subsection.
People usually view entities as existing both in the external world and in our minds. We use the entities in our minds mainly to stand for entities in the external world, much as we use a map to stand for its territory. People learn to use the concept of 'entity' when they are infants. We use the concept unconsciously as a way of organizing the multitude of stimuli that enter our minds from the external world moment by moment while we are awake.
In statistics and empirical research the set of all the entities of a given type is called the "population" of entities of that type. For example, a web site on the Internet is an entity (of an electronic or computer-object sort), and the set of all the web sites on the Internet constitutes the population of web sites.
People usually think of entities unconsciously. However, we sometimes do need to refer to them in general terms. In these situations statisticians and empirical researchers may refer to entities as members of the population, cases, elements, individuals, instances, items, objects, observations, specimens, subjects, things, or (experimental, observational, or survey) units.

*I discuss why I recommend the term "entity" for general discussion in a paper (1998a, app. E.1).

* If we are dealing with a

On the other hand, if we are dealing with the

* To bypass the need for students to learn in a top-down fashion, and to ensure that students understand the concept of 'entity', I recommend that teachers develop it using a "bottom-up" sequence of ideas, beginning with familiar concrete examples of entities and working up at an appropriate pace for the students to the general concept. I further discuss the bottom-up approach to teaching statistical concepts in a Usenet post (forthcoming).

Every entity has associated with it a set of properties. For example, all people have thousands of different properties, two of which are "height" and "blood group".
(We know or experience a living entity by [in large part] knowing or experiencing its behavior. Researchers who study behavior usually view it as a complicated set of properties of the living entities they study.)
For example, if we wish to know the (value of the) height (property) of a person, we can apply a height-measuring instrument (e.g., a tape measure) to the person, and the instrument will return a value that is (in the specified units) an estimate of the person's height. Measuring instruments (which are sometimes called "measures") are often physical devices such as a tape measure, a speedometer in a car, or litmus paper. But they may also be of other types, such as paper-and-pencil tests administered to students or subjective judgments provided by experts. Measuring instruments are important because all conclusions in empirical research are based directly on the estimates of values of properties obtained from measuring instruments, as discussed in the following subsections.
Viewing all the entities of a given type as having (sharing) exactly the same properties is a key unifying principle of human reality.
A property of an entity may also be called an ability, aspect, attribute, capability, capacity, character, characteristic, countenance, dimension, disposition, facet, factor, faculty, feature, finding, indication, indicator, nature, quality, quantity, scalar, trait, or vector.

*Appendix B discusses why I recommend the term "property" for general discussion.

*Appendix C discusses the evolution of entities and properties in human thought. *

As noted, empirical researchers use measuring instruments to determine estimates of the values of properties of entities. When these estimates are studied formally, statisticians and empirical researchers usually refer to them as "variables". A reasonable definition of the statistical concept of 'variable' is A

*Appendix D compares some dictionary definitions of the concept of 'variable'. Appendix E further discusses the distinction between properties and variables.
Clearly, time plays a key role in the idea of the value of a variable: In statistics and empirical research the value of a variable for an entity is generally viewed as an estimate or "snapshot" (possibly with distortion) of the true value of the associated property of the entity at a particular time (or perhaps over a particular time period).
All empirical research projects generate data. The (raw) data from a research project (or from a logical unit of a larger research project) are invariably organized in a table. Each row in the table is associated with one entity of the type under study. Each column is associated with a different property of the entities (or a property of the entities' environment), as reflected in the values of the variable associated with the column. Each cell (intersection of a row and a column) in the table contains the value (at the time of measurement) of the variable associated with the column for the entity associated with the row.

The data table (with appropriate footnotes) is the complete record of what was observed in an empirical research project. Thus the table is central to drawing reasonable conclusions from the project (as explained in the next three subsections). The table also provides a succinct summary of the design of the project.

Thus when considering or planning an empirical research project it is helpful for students to study the data table, or a manageable number of rows if the table is large. To increase understanding I recommend that studied tables have carefully worded column headings, and that they contain realistic made-up data if real data are unavailable. *Realistic data are further discussed in Section 7.8.

* As with entities and properties, students can readily understand the concepts of 'variable' and 'data table' if the ideas are developed in a bottom-up fashion, beginning with concrete examples and working up at an appropriate pace for the students to the general concepts.

I recommend that teachers choose variables for discussion that are interesting, easy to understand, and that empirical researchers are seriously interested in studying.
For example, automotive engineers are seriously interested in studying the variable "fuel usage per kilometer" in automobiles because appropriate study of this variable enables them to minimize fuel usage and thereby make automobiles less expensive to run. Thus a teacher might show students a data table with different types of automobiles representing the rows and relevant automobile variables (including "fuel usage per kilometer") representing the columns. If the components of such a table are carefully discussed, students attain a concrete sense of the entities, properties, and variables associated with the table.
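As a minimal sketch of such a data table, the following Python fragment stores one row per automobile type (entity) and one column per variable. The model labels, the column names, and every value are invented for illustration and are not data from this paper; the helper functions `cell` and `column` are likewise hypothetical names.

```python
# Hypothetical data table: each row is one entity (an automobile type),
# each column is one variable. All values are invented for illustration.
columns = ["model", "weight_kg", "engine_litres", "fuel_ml_per_km"]

rows = [
    ("Car A", 1100, 1.2, 54),
    ("Car B", 1350, 1.6, 65),
    ("Car C", 1600, 2.0, 79),
    ("Car D", 1900, 3.0, 98),
]


def cell(table, header, row_index, column_name):
    """Return the value of one variable for one entity (one table cell)."""
    return table[row_index][header.index(column_name)]


def column(table, header, column_name):
    """Return all values of one variable, one value per entity in the table."""
    j = header.index(column_name)
    return [row[j] for row in table]


# One cell: the fuel usage (variable) of the third automobile (entity).
print(cell(rows, columns, 2, "fuel_ml_per_km"))

# One column: the weights of all automobiles in the table.
print(column(rows, columns, "weight_kg"))
```

Reading a single cell and then a whole column mirrors the bottom-up order suggested above: students first see one value for one entity, then the same variable across all entities.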
A central idea in the definition of empirical research is that researchers "draw conclusions from data". Why do researchers wish to do this -- what are the goals of empirical research?
We seek the ability to predict and control the values of variables because it provides many social and commercial benefits. For example, if a medical researcher can discover how to better predict or control people's risk of heart attacks, this discovery provides the social benefit of saving lives. Similarly, if an organization can discover how to better control variables that reflect important properties of its operations (e.g., customer satisfaction, product performance, product usefulness, product reliability), this discovery helps the organization to optimize its operations and thereby become more successful.

Since the ability to predict and control the values of variables is of broad usefulness, many branches of society (in science, business, technology, education, and government) provide substantial support to empirical research aimed at learning how to predict or control the values of key properties or variables.

*
- A main test of whether any explanation or understanding is valid is whether it leads to accurate prediction or control.
- An important application of any (correct) explanation is that it can be used for accurate prediction or control. After conceptually subtracting this application from explanation, one looks in vain for other applications, save supporting development of further explanations (which can themselves be used for prediction or control).
- When given a choice, most people will readily choose accurate prediction or control of some important phenomenon (without explanation or understanding) over accurate explanation or understanding of the same phenomenon (without prediction or control).
I support these points in two Usenet posts (1996a, 1997b). See also Section 5.5 below. *
We predict and control the values of variables in the entities in the population of entities under study. We seek the ability to predict and control the values of variables in entities in

For example, in medical research the population of entities of interest is often all the people in the world. The goal of the research is to find ways to predict or control the values of important medical variables in any person in the population (ideally including all people living, dead, and unborn). Similarly, in organizational research the population of entities under study might be all the weeks in the life of a particular organization. Here the goal of the research might be to find ways to predict and control the values of important organizational variables in any week (especially later weeks) in the life of the organization.

Thus if we include the concept of 'population', we can say

A fundamental goal of empirical research is to discover how to

We can predict and control the values of variables by studying

In a "relationship between variables" one variable (called the
For example, medical researchers have discovered that a relationship exists between the amount of saturated fat ingested by a person (predictor variable) and the risk that a person will have a heart attack (response variable). The relationship is that more saturated fat is associated with a higher risk of a heart attack. Knowing this relationship helps doctors and patients to predict and control heart attacks. (Empirical research about the relationship between saturated fat and heart attacks is summarized by Kromhout 1999, Liebson and Amsterdam 1999, and de Lorgeril and Salen 2000.)

In addition to using the concept of 'dependence' to characterize relationships between variables, we can characterize them as follows: A relationship exists between two variables if we find that when the values of the predictor variable(s) "go up and down" in the entities under study (or in the entities' environment), the values of the response variable also go up and down (or down and up) somewhat "in step" with the values of the predictor variable(s).

For many non-statistics-majors the above two informal characterizations of relationships are sufficient if they are properly illustrated with practical examples. For more advanced students I propose a formal definition of the concept of 'relationship between variables' in a paper for students (1997a, sec. 7.10) and I discuss an important alternative definition in a Usenet post (2002).
For example, the medical researchers who discovered the relationship between saturated fat consumption and heart attacks did so by studying data tables for samples of people. These tables have one or more variables for each person that reflect the person's heart attacks and also one or more variables that reflect the person's fat consumption. The researchers used statistical procedures to look for a relationship between fat consumption and heart attacks in the tables, and such a relationship has been reliably found. These findings (and other supporting information) lead doctors to believe that the relationship exists in all the people in the world.
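One simple way to look for an "in step" relationship in such a table is a correlation coefficient. The sketch below is an illustration only: the paper does not prescribe this particular statistic, the function name `pearson_r` is hypothetical, and the six rows of values are invented rather than taken from any medical study. A high correlation indicates a roughly linear association in the sample; by itself it does not rule out alternative explanations.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample: one position per person (entity). Values are invented.
fat_grams_per_day = [10, 20, 30, 40, 50, 60]   # predictor variable
risk_score = [2.1, 2.9, 3.2, 4.8, 5.1, 6.0]    # response variable

r = pearson_r(fat_grams_per_day, risk_score)
print(round(r, 3))
```

A value of `r` near +1 or -1 means the two variables move closely in step (or in opposite step) across the entities in the sample; a value near 0 means little linear relationship is visible.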
- the net amount of force applied to a physical object and the rate of acceleration of a physical object (physics)
- the weight of a car and the fuel usage of a car (automotive engineering)
- the concentration of alcohol in the bloodstream of a person driving a car and the probability that a person driving a car will be involved in an accident (physiology)
- the number of hours of homework completed by a student during a course and the grade obtained by a student in a course (educational psychology)
- the education of a person and the income of a person (sociology)
- the style of management of a company and the amount of profit of a company (business)
- having a black cat cross one's path and having bad luck (folk beliefs).
Each of these examples identifies a possible relationship between variables. Each of these relationships (and relationships between any other pairs or larger sets of compatible variables) can be studied in an empirical research project. If the research project finds conclusive evidence of the relationship, we can use the knowledge of the relationship to predict and possibly control the values of the response variable in new entities from the population on the basis of their values of the predictor variable(s).
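To make the last step concrete, the sketch below fits a straight line to a small sample and uses it to predict the response variable for a new entity. It is a hedged illustration using ordinary least squares, one of many possible techniques, applied to the car weight and fuel usage example from the list above. The function names `fit_line` and `predict` and all data values are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor variable has no variation")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted value of the response variable for a new entity."""
    return intercept + slope * x_new


# Hypothetical sample of five cars (entities). Values are invented.
weight_kg = [1000, 1200, 1400, 1600, 1800]   # predictor variable
fuel_ml_per_km = [51, 57, 67, 73, 82]        # response variable

intercept, slope = fit_line(weight_kg, fuel_ml_per_km)

# Predict fuel usage for a new car from the population weighing 1500 kg.
predicted = predict(intercept, slope, 1500)
print(predicted)
```

The prediction is only as trustworthy as the sample and the assumed straight-line form. Questions about how the sample was selected and how accurate the prediction is still need to be asked.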
1. What is the *population* of entities under study?
2. How many entities are in the *sample*, and is the sample selected from the population in a way that allows us to correctly generalize from the sample to the population? (For economy, compromises to "ideal" samples are often reasonable, but should be kept firmly in mind when drawing conclusions.)
3. What is the *response variable* that is measured in each entity in the sample? (*Some research projects have multiple response variables, but usually each response variable can be reasonably viewed as defining a separate research project, with the research projects sharing the same entities and predictor variable[s]. See also Appendix I.2.)
4. What is (are) the *predictor variable(s)* that is (are) measured in each entity in the sample (or measured in each entity's environment)? Which predictor variables (if any) are manipulated by the researcher and which are merely observed?
5. In plain language or in graphical terms (as opposed to mathematical terms), what is the *relationship* between the response variable and the predictor variable(s) that is sought, discovered, or studied in the research project?
6. Regardless of whether the research project has been performed, suppose it has been performed and a relationship between the response variable and the predictor variable(s) has been found. Can you think of a *reasonable alternative explanation* for the (apparent) relationship between the variables? (Such an explanation renders the research finding equivocal, and thus of less value. To avoid deceiving themselves, empirical researchers strive to perform research in ways that eliminate all reasonable alternative explanations.)
7. How can we best use the relationship to *predict* or *control* the values of the response variable in new entities from the population?
8. How *accurate* will the prediction or control be?
9. What are the *practical implications* of the findings?
By considering sufficient practical examples, students recognize that most empirical (including most scientific) research projects can be reasonably understood by considering them in terms of the nine questions. Thus students recognize that most empirical research projects can be reasonably viewed as studies of relationships between variables in entities in samples, with the aim being to develop the ability to accurately predict or control the values of the property associated with the response variable in new situations for any entity in the population. *Appendix H discusses some possible counterexamples to the points in the preceding paragraph. Section 5.5 discusses the relationship between relationships between variables and some general concepts of science. Mosteller (1990) and Lipsey (1990) discuss the idea of a reasonable alternative explanation. I briefly discuss some history of the concept of 'relationship between variables' in a Usenet post (2001).
Similarly, several general terms are available to name the response variable and the predictor variable(s) in a relationship between variables. For example, a response variable may be called a "predicted" variable or a "dependent" variable, and a predictor variable may be called an "explanatory" variable or an "independent" variable. *Appendix G discusses why I recommend the terms "response" and "predictor" for general discussion. *
Once students properly understand and appreciate the usefulness of relationships between variables as a means to prediction and control, we can then bring the field of statistics out onto the stage. We can introduce the

Statistics is a set of optimal general techniques to help empirical researchers study variables and relationships between variables in entities in samples, mainly as a means to accurately predict and control the values of variables (properties) in entities in populations.

After developing this idea, we can spend the rest of the course and subsequent courses discussing standard statistical principles and methods in terms of it. This approach enables us to unify most discussion in statistics under the concepts of entities, properties, variables, and relationship between variables. Sections 5.3 through 5.9 further discuss this unification.

The preceding subsections propose six concepts for discussion at the beginning of an introductory statistics course for non-statistics-majors. The concepts are
- entities
- properties of entities
- variables (which are formal representations of properties of entities)
- an important goal of empirical research: to predict and control the values of variables
- relationships between variables as a key to prediction and control and
- statistical techniques for studying relationships between variables in empirical research as a means to accurate prediction and control.
After introducing the six concepts, the teacher spends the rest of the course covering statistical techniques for studying relationships between variables. The course is thus

Depending on the level of the students, my experience suggests that the six concepts can be properly introduced in between one and eight class sessions. (As noted above, most university and college students can learn the high-level concepts in between one and three class sessions.) Study of the details of the sixth concept can last a lifetime.

Teachers can ensure that students understand the concepts by developing them through bottom-up sequences of ideas, beginning with familiar concrete examples of each concept and working up at an appropriate pace for the students to the general concept.

I call the approach to the introductory statistics course described above the "entity-property-relationship" (EPR) approach. Section 5 discusses evaluating the EPR approach, Section 6 discusses testing it, and Section 7 discusses implementing it.
This section presents material to help readers evaluate the entity-property-relationship approach to the introductory statistics course.
The EPR approach differs from other approaches to the introductory course in the following ways:
- The EPR approach unifies many ideas of statistics and empirical research under the goal of accurate prediction and control of the values of variables. Many other approaches do not directly emphasize this unifying goal.
- The EPR approach focuses on using information about relationships between variables (as gleaned from empirical research) as a key to accurate prediction and control. Many other approaches do not emphasize the concept of 'relationship between variables' as a unifying statistical concept. Sections 5.2 through 5.9 further discuss the unifying aspects of the EPR approach.
- The EPR approach introduces the concepts of 'entity', 'property of an entity', and 'variable' at the beginning of the course before discussing relationships between variables. Most other approaches omit (or at least minimize) discussion of these important preliminary concepts. Section 5.10 gives further discussion.
- Although the EPR approach places heavy reliance on data analysis, the *concept* of 'data analysis' is not emphasized. Many other approaches emphasize the concept of 'data analysis'. Section 5.12 gives further discussion.
- The EPR approach uses examples of empirical research projects that are practical. Some other approaches pay less attention to ensuring that examples are practical. Section 7.5 gives further discussion.
- The EPR approach emphasizes the role of statistics in empirical research, as discussed in Section 4.6. Some other approaches pay less attention to the role of statistics and place more emphasis on the mathematical aspects of statistics. Section 7.9 gives further discussion.
- The EPR approach emphasizes the use of hypothesis testing as a means to detecting relationships between variables. Some other approaches suggest that hypothesis testing should be de-emphasized. Section 7.10 gives further discussion.
- The EPR approach omits discussion of univariate distributions at the beginning of the course. Many other approaches spend a substantial amount of time discussing univariate distributions at the beginning. Section 7.11 gives further discussion.
Despite the above differences, the EPR approach is consistent with and thus compatible with most other approaches to the introductory statistics course -- the differences above are merely differences in ordering and emphasis of statistical topics. Section 5.14 illustrates the relationship between the EPR approach and several other popular approaches.
Sections 4.1 and 4.2 imply that the concepts of 'entity' and 'property' pervade students' unconscious thought. Therefore, if we carefully bring these concepts into students' consciousness (through sufficient practical examples), students find the concepts easy to understand. Similarly, Sections 4.3 through 4.5 suggest that if we carefully develop the concepts of 'variable' and 'relationship between variables' for students with practical examples, these concepts are also easy for students to understand. The ease of understanding leads me to conjecture that the concepts of entities, properties, variables, and relationships can be taught at all levels of teaching statistics from late elementary school up, with only the teaching time and depth of coverage of the concepts varying at different levels.
Section 4.3 introduces the fundamental statistical concept of 'variable' in terms of the concepts of 'entity' and 'property'. Section 4.5 introduces the fundamental statistical concept of 'relationship between properties' (relationship between variables), which is clearly also built atop the concepts of 'entity' and 'property'.

The concepts of 'entity', 'property', and 'relationship' can be used as a foundation for other statistical concepts. Here is a sequence of definitions that develop some basic statistical concepts from the three concepts:
- *number:* a fundamental type of entity in everyday life and mathematics that is learned in childhood through instances of numbers in counting and through fractions; numbers are generally used to represent simple counts of entities and to represent the values of properties of entities
- *set:* a fundamental type of entity in human reality that consists of zero or more entities of a specified type (i.e., with specified properties)
- *population:* the complete set of entities of a specified type (see also Appendix A.4)
- *sample:* a set of entities selected from a population
- *event:* a fundamental type of entity in human reality that generally involves a particular physical location and a particular continuous period of the property "time" and that has other defining properties that make it of interest
- *probability:* a particular property of an event that reflects how often (in time or in some other dimension) the event occurs or is thought likely to occur
- *variation:* changing in the values of a property across entities in a population or sample (changing either between entities or within entities over time)
- *distribution:* the (often unknown) set of the values of a property (or properties) in the entities in a population or sample
- *distribution function:* a mathematical entity that mimics in probabilistic terms the distribution of the values of a property (properties) in the entities in a population or sample
- *model (equation):* a mathematical statement of a relationship between properties of entities
- *parameter:* a general name for a property of a population, distribution, or model
- *estimation:* the process of assigning a value to a property or parameter, often including an estimate of the accuracy of the value
- *hypothesis:* a statement whose truth is unknown about (*a*) the existence of an entity or type of entity, (*b*) the existence of a relationship between properties of entities, or (*c*) the value of a parameter
- *statistical test:* a technique for providing an objective measure of the weight of a body of evidence in support of a hypothesis
- *statistic:* any of various well-defined properties of a sample whose values are obtained by performing mathematical operations on the values of one or more properties of the entities in the sample; often used to estimate the values of parameters or in performing statistical tests.
The definitions cover many of the main statistical concepts. Each definition is built atop the concepts of 'entity', 'property', or 'relationship', or is built atop concepts that are themselves built atop the three concepts. Furthermore, the concepts of entities, properties, and relationships appear to be among the most fundamental concepts of human reality. Thus the EPR approach provides a deep and broad foundation for statistical concepts.
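The chain of definitions above can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the entity type (`Student`), its two properties, and all of the values are invented for the example and do not come from the paper. It shows a population of entities, a sample drawn from it, a parameter (a property of the population), and a statistic (a property of the sample used to estimate the parameter).

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Student:
    """A hypothetical entity; each field is a property of the entity."""
    name: str
    hours_studied: float  # property whose values vary across students
    exam_mark: float      # property whose values vary across students


# Population: the complete set of entities of the specified type
# (all values are invented for illustration).
population = [
    Student("A", 2.0, 55.0),
    Student("B", 4.0, 62.0),
    Student("C", 6.0, 70.0),
    Student("D", 8.0, 78.0),
    Student("E", 10.0, 85.0),
    Student("F", 12.0, 90.0),
]

# Parameter: a property of the population (here, the mean exam mark).
population_mean_mark = mean(s.exam_mark for s in population)

# Sample: a set of entities selected from the population.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 3)

# Statistic: a property of the sample, used to estimate the parameter.
sample_mean_mark = mean(s.exam_mark for s in sample)

print(f"parameter (population mean): {population_mean_mark:.1f}")
print(f"statistic (sample mean):     {sample_mean_mark:.1f}")
```

In practice the population is rarely fully observed, so the parameter is unknown and only the statistic can be computed. The sketch computes both only to make the distinction visible.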
Statistical methods can perform the following four groups of techniques to help empirical researchers study relationships between variables:

- techniques for *detecting* relationships between variables
- techniques for *illustrating* relationships between variables
- techniques for *predicting* and *controlling* the values of variables on the basis of relationships between variables, and
- miscellaneous techniques for the study of variables and relationships between or among variables.
These techniques are of substantial help in answering important questions 5, 7, and 8 in Section 4.5. I discuss these techniques further in the paper for students (1997a, secs 8-13). (Logically, the four groups of techniques seem best listed in the above order. However, pedagogically, in an introductory statistics course it makes sense to discuss simple techniques for illustrating relationships before discussing techniques for detecting relationships.)

The four groups of techniques raise the question: Which of the currently available statistical methods can actually perform these techniques? The following twenty-one statistical methods can perform one or more of the four groups of techniques:

- general linear model (*t*-test, analysis of variance, linear regression, multiple comparison methods, hierarchical methods, variance components analysis, multivariate analysis of variance, multivariate linear regression)
- generalized linear model
- response surface methods
- exploratory data analysis
- time series analysis
- survey analysis
- survival analysis
- categorical analysis
- graphical methods
- meta-analysis
- Bayesian methods
- nonlinear regression
- neural networks and statistical learning theory
- discriminant analysis
- nonparametric methods
- probit analysis
- logistic regression
- correlation analysis
- structural and path analysis
- data mining methods
- univariate analysis.
Upon consideration, many statisticians will agree that the above list of twenty-one statistical methods contains almost all of the currently popular methods, including what most statisticians would view as the "main" methods. Many statisticians will also agree that the four groups of statistical techniques fully explain (at a high level) each method in the list.

Because the list of twenty-one statistical methods contains almost all of the currently popular methods (including the main methods), and because each method in the list is fully explained (at a high level) by the four groups of statistical techniques that are emphasized in the EPR approach, the approach unifies statistical methods. That is, the EPR approach allows us to teach each new statistical method in terms of the same set of simple concepts: entities, properties, variables, and relationships between variables. Emphasizing the simple commonalities that exist among the methods makes the field of statistics substantially easier for students to understand.
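To illustrate how familiar methods from the list serve the groups of techniques, the following Python sketch uses correlation analysis to detect a relationship and simple linear regression to predict the value of a response variable. It is a minimal sketch, not a recommended analysis: the two variables (`hours`, `marks`) and their values are invented, the helper functions are written out by hand so the example has no dependencies, and a real analysis would also assess the weight of the evidence (for example, with a statistical test).

```python
from math import sqrt

# Invented values of two variables for six entities (e.g., six students).
hours = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]      # predictor variable
marks = [55.0, 62.0, 70.0, 78.0, 85.0, 90.0]  # response variable


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation: one simple way to detect a linear relationship."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept: a model equation for the relationship."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Detecting: is there evidence of a relationship between the two variables?
r = correlation(hours, marks)

# Predicting: use the fitted relationship to predict the response for a new
# entity whose predictor value is 7 hours.
slope, intercept = fit_line(hours, marks)
predicted = intercept + slope * 7.0

print(f"r = {r:.3f}; slope = {slope:.2f}; predicted mark at 7 hours = {predicted:.1f}")
```

Both steps are phrased in terms of the same four concepts: entities (students), properties (hours studied, exam mark), variables, and a relationship between the variables.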
The EPR approach has strong links with three important general concepts of science, as follows:
It is reasonable to view the scientific method as consisting of four steps:

1. A researcher frames (i.e., invents) a new hypothesis about some area of experience. (The researcher frames the hypothesis on the basis of knowledge of earlier research and current theories, often together with intuition, logic, or mathematical modeling. If the hypothesis is multi-faceted or fundamental, it may be called a "theory".)
2. A researcher (perhaps the same researcher) then deduces or infers an empirically testable implication of the hypothesis.
3. A researcher (perhaps the same) then performs an empirical research project to test whether the implication is actually present in the area of experience.
4. If reliable evidence of the implication is found (and in the absence of a reasonable alternative explanation), the community with prime interest in the area of experience will (by informal consensus) accept (or will be more inclined to accept) the hypothesis framed in step 1 as being correct.
The scientific method is central to science because almost all modern scientific research (and most other empirical research) proceeds according to it.

Interestingly, examination of instances of the use of the scientific method suggests that the implication in step 2 can usually be usefully viewed as a statement of a relationship between variables in some population of entities. This can be seen by applying the nine questions discussed in Section 4.5 to specific research projects that exemplify the method -- the questions almost always reasonably apply. Thus the key concept of the EPR approach of 'relationship between variables' plays a central (though often implicit) role in the scientific method.

Appendix H discusses some possible counterexamples to the point in the preceding paragraph. I further discuss the scientific method in a Usenet post (2001, app. A).
- statements of the existence of entities
- statements of the existence of properties of entities
- statements of the values of properties
- statements of the distributions of the values of properties
- statements of relationships between entities
- statements of relationships between properties (relationships between variables)
- statements that link together the statements of the six other types.
Because the seven types of statements are all quite general, and because experience suggests that many (all?) other scientific explanations can be given in terms of (at most) the seven types of statements, it appears that most (all?) scientific explanations consist of merely (at most) the seven types of statements. We can view scientific "understanding" as taking place in an individual person. A person has understanding of some state of affairs or phenomenon if they have learned to think and speak in terms of the "correct" explanation of it. The seven types of statements of a scientific explanation are all important, but the sixth type (about relationships between variables) is perhaps the most important. This is because statements of relationships directly enable accurate prediction and control. Thus the key concept of the EPR approach of 'relationship between variables' plays a central role in scientific explanation and understanding.
As discussed in Section 4.5 and Appendix H, most empirical research projects can be usefully viewed as studying relationships between variables. Thus by focusing on the concept of 'relationship between variables' the EPR approach unifies most empirical research.
A commercial product is an entity, as are instances of a product, as is a commodity, as is a financial instrument (e.g., a stock or a bond), as is a loan repayment or dividend, and as is an interaction with a customer. These and all other entities that are used in commerce are efficiently handled by the EPR approach. To achieve a general understanding of the logical constructs used in commerce it is helpful to study how commercial organizations store information. Almost all progressive commercial organizations use a computer "database" as their main repository for information. This is because databases have easy-to-use, versatile, reliable, and secure features that allow one to easily assemble information to generate reports, invoices, charts, and other graphical, statistical, and textual information as a broad and fundamental aid to operating an organization. A database consists of a set of one or more "tables". Each table holds information about entities of a particular type. For example, a manufacturing company might have one database table that holds information about its products, another that holds information about its customers, another that holds information about its invoices, and so on. The database (or databases) of a larger progressive organization may contain hundreds (or even thousands) of tables holding information about all the types of entities in which the organization has a serious interest. A database table is conceptually identical to a statistical data table, as described in Section 4.3 -- a rectangular array that contains one or more "rows" associated with the entities the table is tracking and one or more "columns" associated with properties of the entities. Each cell in the table contains the value of the property associated with the column for the entity associated with the row. 
The database of a progressive commercial organization will hold a substantial proportion of the organization's information because even "documents" (which are entities) can be stored in a database table to facilitate ready access. (A cell in a modern database table can hold an entire document.) As noted, database tables are the main repositories for information in commerce, and the rows in a database table are associated with entities and the columns are associated with properties of the entities. Thus the basic concepts of the EPR approach of 'entity' and 'property' play fundamental (implicit) roles throughout commerce. (Data mining is reasonably viewed as the study of relationships between variables reflected in the columns of database tables.)
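The correspondence between a database table and a statistical data table can be sketched with Python's standard `sqlite3` module. This is an illustration only: the `products` table, its columns, and its values are hypothetical and are not taken from any real organization. Each row stands for one entity and each column holds the values of one property.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# One table per type of entity; the columns are properties of the entities.
conn.execute(
    "CREATE TABLE products ("
    " product_id INTEGER PRIMARY KEY,"  # identifies each entity (one row each)
    " name TEXT,"                       # property
    " unit_price REAL,"                 # property
    " units_in_stock INTEGER)"          # property
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, "widget", 2.50, 120),
        (2, "gadget", 7.25, 40),
        (3, "gizmo", 4.00, 75),
    ],
)

# Each row is an entity; each cell holds the value of one property
# for that entity.
rows = conn.execute(
    "SELECT name, unit_price, units_in_stock FROM products ORDER BY product_id"
).fetchall()
for name, unit_price, units_in_stock in rows:
    print(f"{name:8s} price={unit_price:5.2f} stock={units_in_stock}")

# Reading one column across all rows gives the values of one variable.
prices = [row[1] for row in rows]
conn.close()
```

Studying a relationship between variables then amounts to studying a relationship between two or more such columns.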
Section 4.1 notes that nouns are used in language to denote entities and Section 4.2 notes that adjectives and adverbs are often used in language to denote the values of properties of entities. Since language is intimately tied to human thought, it is of interest to consider how the concepts of the EPR approach relate to other parts of speech. A verb usually expresses one of the following ideas:

- an act or action by an entity or entities
- an occurrence (i.e., an event)
- a state or mode of being of an entity or entities (possibly in terms of the value or values of one or more properties)
- a relationship between entities (including a relationship between properties and a relationship between variables).
When one views entities broadly, the acts, actions, occurrences, events, states, modes, values, properties, variables, and relationships in the list are themselves all entities. (Appendix A further discusses this idea.)

Furthermore, most sentences that contain verbs also contain one or more nouns. (All verbs in coherent sentences have a subject, perhaps implicit, which is represented by a noun. Transitive verbs have an object, perhaps implicit, which is also represented by a noun.) The verbs describe various "things" about the entities denoted by the nouns including things about the properties of the entities. Thus when verbs are used in language, entities are invariably present and of central interest. (Verbs are also often linked to the concept of 'time'. We can view time [both duration and point in time] as a property of the entity that contains all other entities -- the entity we call "experience" or "reality". Alternatively, we can view time as a property of events.)

The remaining parts of speech function as or support nouns, verbs, adjectives, and adverbs as follows:

- Articles ("a", "the") are a form of adjective -- they modify a common noun to indicate whether it is being used to denote an entity in a general sense or in a particular sense.
- Prepositions delineate phrases that function as adjectives or adverbs. The choice of a preposition denotes some general aspect (e.g., "in", "of", "beside") of the relationship between the word or phrase (i.e., noun, verb, adjective, or adverb) being modified by the prepositional phrase and the entity named by the noun that is the object of the phrase.
- Conjunctions (e.g., "and", "but", "if", "because") join together words, phrases, or clauses to make more complicated phrases, clauses, sentences, or pairs of sentences. The choice of conjunction indicates the relationship between the ideas (which are a type of entity) it joins.
- Interjections express simple complete thoughts -- often expressions of emotion. Both emotions and expressions of emotion are reasonably viewed as being entities, as discussed in Appendix A.
Thus the concepts of the EPR approach link well with the various parts of speech, and hence the approach links well with language at a fundamental level.
Subsections 5.2 through 5.8 suggest that the concepts of entities, properties, variables, and relationships between variables

- are easy to understand
- are a foundation for many of the fundamental statistical concepts
- unify statistical methods
- link well with general concepts of science
- unify empirical research
- link well with general concepts of commerce and
- link well with language at a fundamental level.
These points together with consideration of the fundamental statistical concepts discussed in Section 5.3 suggest that the concepts of the EPR approach are more fundamental than many (all?) of the other concepts that are traditionally discussed in statistics courses.
As suggested in Section 5.2, the concepts of 'entity' and 'property' and (to a lesser degree) 'variable' and 'relationship' are easy to understand. Furthermore, these simple concepts are (by virtue of their logical priority) substantially easier to understand than the various traditional statistical concepts that depend on them. In view of the ease of understanding, and in view of the logical priority, it is reasonable to carefully cover the concepts of 'entity', 'property', 'variable', and 'relationship' first, before introducing even the most rudimentary of the other traditional statistical topics. As suggested in Sections 5.3 through 5.9, this unifies and simplifies discussion of the traditional topics.
To address this issue, let us consider the concept of 'variable'. This concept is arguably the most ubiquitous concept in statistics. Clearly, students must understand the concept of 'variable' before they can understand the concept of 'relationship between variables'. How do students usually learn this concept?

Students usually first learn the concept of 'variable' in their first algebra class in grade 6 or later. The introductory algebra teacher usually does

During the rest of the introductory algebra course, and during subsequent mathematics courses, students will encounter many other

Consider some differences between the use of the concept of 'variable' in mathematics and the use in statistics:

- A variable in a mathematical problem generally has only a *single* value. (This value is often abstract -- known only in the sense of an algebraic symbol, such as *x* -- but the value is generally viewed as a single value.) On the other hand, a variable in a statistical problem usually has *multiple* values -- typically one value for each entity in the sample (and population) under study.
- In mathematics the values of important variables are often *unknown*. Often the goal of a mathematical analysis is to solve for or prove a theorem about the unknown value(s). On the other hand, in the typical application of statistics to a real-world problem, the values of the relevant variables are (for the sample, after the data collection) generally all *known*. Generally the goal of a statistical analysis is to enable drawing conclusions (predictions) about the unknown values of the response variable for new entities in the population on the basis of the known values of the predictor variable(s) for these entities and on the basis of the known values of the response and predictor variables for the entities in the sample.
Perhaps due to the above differences between the mathematical and statistical concepts of 'variable', and perhaps due to the mathematical (algebraic) genesis of the concept of 'variable' in students' minds, many non-statistics-majors have difficulty understanding the fundamental statistical concept of 'variable'. This can be seen by asking students to define the concept -- many students have difficulty giving a reasonable definition. Some students may say that a statistical variable is a "measurement of something", which (although vague) is certainly correct. But they are often unable to say, without prompting, what the "something" is -- both in the specific sense of voluntarily identifying the relevant entities in a given situation and in the general sense of linking the concept of 'variable' to the more fundamental concept of 'property of an entity'.

The preceding five paragraphs suggest that many students entering the introductory statistics course lack a clear understanding of the statistical concept of 'variable'. But examination of currently popular textbooks for the introductory course suggests that most approaches assume entering students already understand the concept.

(Some books do introduce entities, properties, and variables, but spend only a page or two on these topics at the beginning and then never return to focused discussion of them. This approach forgoes the substantial unifying power of the concepts. For example, using the same concepts but different terminology, Moore briefly discusses "individuals", "characteristics", and "variables" at the beginning of two of his introductory texts [1997b, 2000]. I discuss Moore's use of the concepts in a Usenet post [1997c].)

Sections 4.1 and 4.2 above suggest that all students unconsciously learn the concepts of 'entity' and 'property' (the

Therefore, it is useful to spend time at the beginning of an introductory statistics course discussing the fundamental concepts of 'entity' and 'property'.
Section 3.3 recommends that the first goal of an introductory statistics course be to give students a lasting appreciation of the vital role of the field of statistics in empirical research. Does the EPR approach satisfy this goal? Consider:

- The EPR approach is aimed specifically at satisfying a main goal of empirical research -- the goal of accurate prediction and control.
- Most students are directly interested in prediction and control (of variables of interest to them).
- The idea of prediction and control on the basis of relationships between variables is easy to understand.
- The EPR approach directly shows how the field of statistics can play a broad role across almost all empirical research.
- The EPR approach is developed in a logical sequence from intuitive fundamental concepts (with numerous practical examples).
These points suggest that the EPR approach gives students a lasting appreciation of the field of statistics and its vital role in empirical research.
Many approaches to introductory statistics emphasize the concepts of 'data' and 'data analysis'. One can see this by noting the frequent occurrence of the word "data" in the preface and early chapters of many textbooks and other discussions. In contrast, the EPR approach does not emphasize the concepts of 'data' or 'data analysis' and instead emphasizes the concept of 'relationship between variables as a means to accurate prediction and control'.

The EPR approach links well with the concepts of 'data' and 'data analysis'. This is because the exact operation of what is called "data analysis" is an essential step of the EPR approach. Data analysis is the step in which we actually study the relevant data to look for information about relationships between the variables.

Tukey initiated emphasis on the concept of 'data analysis' in statistics education with his seminal book *Exploratory Data Analysis* (1977).

Because emphasis on mathematical statistics is now greatly diminished in introductory statistics courses for non-statistics-majors, and because these courses now generally focus on analyzing data, it is useful to ask whether

Instead of emphasizing data and data analysis, the EPR approach sharpens the focus by emphasizing the

A teacher can show students the link between relationships between variables on the one hand and data analysis on the other by noting that relationships between variables are generally studied in terms of relationships between (the values of the variables in) the columns of data in a data table.
The concepts of entities, properties, and relationships are not new. Indeed, all statisticians and empirical researchers use these concepts implicitly throughout their thinking and discussion. However, as discussed in Section 5.10, the fundamental concepts of 'entity', 'property', 'variable', and 'relationship' are almost never carefully discussed in a unified approach in introductory statistics courses. I believe that the unfortunate omission of unified discussion of these concepts is the main reason why the field of statistics is so widely misunderstood. (Some leaders in statistics education have already independently adopted an important aspect of the EPR approach in that they emphasize relationships between variables in their introductory courses. For example, using an idea developed by Gudmund Iversen, George Cobb teaches two introductory courses, both of which start with relationships -- one devoted to experimental design and applied analysis of variance and the other devoted to applied regression [G. Cobb, personal communication, August 21, 1996]. Similarly, Robin Lock teaches an introductory course devoted to time series analysis -- i.e., methods for studying relationships between variables when an important predictor variable is "time" [Cobb 1993, sec. 3.1].)
Many helpful new approaches to teaching the introductory statistics course have recently been proposed. As suggested by Moore (1997a), these approaches fall neatly into two distinct groups: conceptual approaches and pedagogical approaches. The conceptual approaches include the following:

- emphasis on data analysis and de-emphasis of theoretical concepts (especially de-emphasis of probability theory, distribution theory, and the theory of statistical tests) (Tukey 1977; Cobb 1992; Moore 1992a, 1992b)
- emphasis on exploratory data analysis and de-emphasis of confirmatory data analysis (Tukey 1977; Velleman and Hoaglin 1992)
- emphasis on statistical reasoning and de-emphasis of statistical methods (computations) (Ruberg 1990; Bradstreet 1996)
- emphasis on the design and analysis of experiments (Bisgaard 1991)
- emphasis on the Bayesian approach to statistics (usually restricted to more mathematically literate students) (Blackwell 1969; DeGroot 1986; Albert 1996, 1997; Berry 1996, 1997; Berry and Lindgren 1996; Antelman 1997; Moore 1997c, 1997d)
- emphasis on (*a*) process (which is a particular type of entity) and (*b*) the minimization of the variation in selected properties of a process, mainly through relationships between variables, sometimes with less emphasis on formal statistical methods (Snee 1993; Hoerl and Snee 1995; Britz, Emerling, Hare, Hoerl, and Shade 1997; Hoerl and Snee 2002)
- emphasis on nonparametric statistical methods (Iman and Conover 1983; Noether 1992)
- emphasis on resampling (Boomsma and Molenaar 1991; Simon and Bruce 1991; Smith and Gelfand 1992; Wonnacott 1992; Albert 1993; Simon 1993, 1994; Willemain 1994; Pollack, Fireworker, and Borenstein 1995; Hesterberg 1998)
- emphasis on aspects of a practicing statistician's work that are not directly related to drawing conclusions from data, such as team design of research projects, ensuring data quality, and managing data (Higgins 1999)
- emphasis on probability (Falk and Konold 1992).
In contrast to the conceptual approaches, the pedagogical approaches include the following:

- more interaction between the teacher and the students instead of straight lectures (Mosteller 1988; Zahn 1994)
- application of the principles of Total Quality Management or Continuous Quality Improvement to the design and management of the course (Hogg and Hogg 1995; Wild 1995; Hogg 1999)
- use of multimedia, film, or video, to teach the concepts (Moore 1993; Ottaviani 1996; Velleman and Moore 1996; Cobb 1997; Cryer and Cobb 1997; Doane, Mathieson, and Tracy 1997; Newton and Harvill 1997; Macnaughton 1998b; Velleman 1998)
- use of demonstrations, activities, or projects to teach the concepts (Jowett and Davies 1960; Scott 1976; Hunter 1977; Hansen 1980; Carlson 1989; Bisgaard 1991; Halvorsen and Moore 1991; Bryce 1992; McKenzie 1992; Roberts 1992; Sylwester and Mee 1992; Zahn 1992; Gunter 1993, 1996; Fillebrown 1994; Garfield 1994, 1995; Mackisack 1994; Sevin 1995; Magel 1996; Rossman 1996; Scheaffer, Gnanadesikan, Watkins, and Witmer 1996; Holcomb and Ruffer 2000)
- use of students working in groups instead of working individually (Dietz 1993; Garfield 1993, 1995)
- replacement of lectures by assigned readings, which are motivated by individual and group "Readiness Assessment Tests" (Michaelsen 1999; Simon, Harkness, Buchanan, Chow, Heckard, Lane, and Zimmaro 2000; Teaching Effectiveness Program 2000)
- emphasis on examples that are clearly practical as opposed to examples that have no obvious practical value (see Section 7.5 below)
- use of real or realistic data in examples (see Section 7.8 below)
- use of cases (i.e., detailed examples or problems based on practical situations) to teach the concepts (Chatterjee, Handcock, and Simonoff 1995; Czitrom and Spagon 1997; Parr and Smith 1998; Peck, Haugh, and Goodman 1998; Nolan and Speed 1999)
- use of news stories to teach the concepts (Snell and Finn 1992; Snell 1999)
- use of improved methods for assessing students (Eltinge 1992; Cobb 1993; Garfield 1994; Chance 1997; Gal and Garfield 1997b, 1999; Goldman, McKenzie, and Sevin 1997; Garfield 2000)
- use of concept maps to teach the concepts (Schau and Mattern 1997a, 1997b)
- use of community service projects to teach the concepts (Anderson and Sungur 1999)
- use of sports statistics to teach the concepts
- use of a computerized library of examples that students can analyze to aid in teaching the concepts (Andrews and Herzberg 1985; Hand, Daly, Lunn, McConway, and Ostrowski 1994; Velleman, Hutcheson, Meyer, and Walker 1996; Pearl, Notz, and Stasny 1996; OzData 1999; StatLib 1999; Rumsey 2001)
- emphasis on improving students' writing ability (Radke-Sharpe 1991; Samsa and Oddone 1994; Stromberg and Ramanathan 1996; Holcomb and Ruffer 2000)
- use of computers or calculators to generate data displays, to illustrate and simulate statistical ideas, and to perform statistical analyses (Dixon and Massey 1983; Macnaughton 1998a, sec. 8).
I discuss some criteria for evaluating pedagogical approaches in a paper (1998a, sec. 7). Most introductory statistics teachers now use some combination of the above conceptual and pedagogical approaches. The main disagreement among teachers is only about the relative emphasis that each approach deserves. (It is possible to classify the use of multimedia, film, video, computers, and calculators as "technological" approaches to the introductory course, rather than as "pedagogical" approaches. However, it seems more reasonable to view technology as a means to better pedagogy rather than as an end in itself.) A simple relationship exists between the EPR conceptual approach to the introductory statistics course and the other approaches -- the EPR approach can be effectively used in conjunction with any (or any group) of them. Moore (1997a) reviews several of the new approaches to statistics education. Cox (1998) comments on some general aspects of statistics education. Gordon and Gordon (1992), Hoaglin and Moore (1992), and T. Moore (2000) give papers by leading statistics educators about teaching statistics. Hawkins, Jolliffe, and Glickman (1992) give a general discussion of teaching statistical concepts.
The EPR approach has been criticized as being too "abstract" for students to understand. I discuss this important criticism in a Usenet post (forthcoming). I discuss some other insightful criticisms of the approach in a series of Usenet posts (1996-2001).

It is interesting that statisticians, who are the keepers of the keys to empirical (scientific) research, perform almost no serious empirical research in statistics education. Instead, much of what is reported as "testing" of approaches in statistics education is anecdotal. That is, the author or proponents of a new approach use the approach one or more times in courses and then report that the approach was successful.

Unfortunately, no matter how "successful" a course might appear to be, anecdotal reports do not reflect valid empirical research about the approach used in the course. This is because reasonable alternative explanations invariably exist that could explain why the course was as successful as it was. Some possible reasons why a course might be successful are

- The teacher may be a very good teacher, and the success of the course may merely reflect the teacher's skill and may *not* reflect anything notable about the approach used in the course.
- The lessons, assignments, tests, and examinations in the course may have been easy (relative to other courses) and the success of the course may merely reflect the easy course, and may *not* reflect anything notable about the approach used in the course.
- The teacher may have invested a substantial amount of time and effort in the approach, which may dispose him or her to see the approach from an overly positive point of view.
These alternative explanations (and other situation-specific alternative explanations) imply that anecdotal testing of approaches to teaching statistics is invariably equivocal. We can eliminate the equivocation of anecdotal testing by testing approaches with randomized experiments. Such experiments (when properly performed) provide clear comparative evidence of the effectiveness of different approaches to teaching statistics. Some readers may feel that experimentation in statistics education is not possible because too many confounding variables are present. For example, "instructor teaching ability" must be properly accounted for before unequivocal conclusions can be drawn. However, confounding variables can usually be accounted for in experimental research, albeit at the expense of increased cost and complexity. Furthermore, accounting for confounding variables in experimental research in statistics education would appear to be no more complicated than accounting for them in multicenter clinical trials, where accounting for confounding variables is standard practice. Some readers may feel it would be difficult to ensure protocol adherence by the multiple statistics teachers that are needed in a proper experimental trial of different approaches to statistics education. Clearly, this is a challenging problem, although perhaps no more difficult than ensuring protocol adherence in multicenter clinical trials, where various monitoring systems are used to ensure adherence. Some readers may feel that experimentation in statistics education may be ineffective because no reasonable response variable can be found that is sensitive enough to discriminate between different treatments. However, this is an empirical question that awaits serious attempts to address it. 
I further discuss methods and problems of experimentally testing approaches to the introductory statistics course, including a recommendation that one use "attitude toward statistics" as a response variable, in a paper (1998a, app. A and B).
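The core logic of such a randomized experiment can be sketched in a few lines of Python. The sketch below is hypothetical throughout: the ten class sections, the two unnamed approaches, and the "attitude toward statistics" scores are all invented (the scores are generated at random, with no real treatment effect), and a permutation test is used only as one simple way to compare the two groups. It does not model the confounding variables or protocol-adherence issues discussed above, which a real trial would have to address in its design.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical class sections (the entities) to be assigned to two approaches.
sections = [f"section_{i}" for i in range(1, 11)]

# Random assignment: shuffle the sections, then split them into two groups.
shuffled = sections[:]
rng.shuffle(shuffled)
group_a, group_b = shuffled[:5], shuffled[5:]

# Invented end-of-course attitude scores, one per section (response variable).
scores = {s: rng.gauss(60.0, 8.0) for s in sections}


def mean_difference(a, b):
    """Difference between the mean scores of two groups of sections."""
    return mean(scores[s] for s in a) - mean(scores[s] for s in b)


observed = mean_difference(group_a, group_b)

# Permutation test: re-randomize the group labels many times and count how
# often a difference at least as large as the observed one arises by chance.
n_permutations = 5000
count = 0
for _ in range(n_permutations):
    perm = sections[:]
    rng.shuffle(perm)
    if abs(mean_difference(perm[:5], perm[5:])) >= abs(observed):
        count += 1
p_value = count / n_permutations

print(f"observed difference = {observed:.2f}, permutation p-value = {p_value:.3f}")
```

Random assignment is what distinguishes this design from an anecdotal report: because chance alone decides which sections receive which approach, differences in teacher skill or course difficulty are not systematically tied to the approach.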
Despite my preceding comments, I regret that I cannot report proper experimental testing of the EPR approach -- such testing is beyond my resources. Although I cannot report proper testing of the EPR approach, I can report some enthusiastic remarks from three teachers who used a draft textbook for the approach (Macnaughton 1986) in their courses. They commented that ... students found the book enjoyable and easy to understand. Using a unique approach, Macnaughton has provided a comprehensive first-rate introduction to the material. I would highly recommend the book for use in introductory statistics courses ....
... students obtained a good understanding of the basic principles of statistical analysis. ... [the approach] substantially simplifies the material without sacrificing important concepts. - The absence of overt mathematics enables the underlying principles of scientific research ... to be more directly apprehended by persons who have ... weak grounding in mathematics. ... ... Students' comments have been uniformly favorable .... ... the book is to be commended to the instructor.
These remarks are encouraging, but are far from being definitive about the effectiveness of the EPR approach. I hope that publication of this paper will facilitate proper testing of the approach.
Until textbooks based on the EPR approach are available, a teacher wishing to use the approach in an introductory course can use the paper for students (1997a) to reinforce class discussion of the six introductory concepts. The following twelve subsections discuss implementation considerations.
The first day of class is important because if the lesson is properly designed, it will establish a positive attitude about the course in students' minds. What should be the first statistical idea we introduce to students on the first day of class? I recommend that the first idea be the promise that the course will teach students how to make accurate predictions. For example, we can promise students they will learn how to accurately (but generally not perfectly) predict
- the mark they will get on the final
- their average annual income over the next several years
- their longevity
- whether it will rain tomorrow
- just about anything else of interest (if it can be reliably measured).
(Along with prediction methods, I recommend that the introductory course devote substantial attention to the methods of exercising accurate control through formal experimentation. However, for simplicity, I recommend that discussion of control and experimentation be omitted at the very beginning -- the promise of accurate predictions seems quite enough to engage students. Section 7.3 further discusses experimentation.) If we promise students on the first day of class that they will learn how to make accurate predictions, we arouse their curiosity and set the stage for development of the six concepts discussed in Section 4. The promise also sets a practical tone for the course, which is more likely to impress most students than if we begin with mathematical discussion. If we promise students on the first day of class that they will learn how to
make accurate predictions, we must later fulfill our promise. In particular,
the thoughtful student will be interested in whether we can demonstrate
Section 4.6 recommends that after introducing the six concepts of the EPR approach the teacher spend the rest of the course expanding the sixth concept by covering standard statistical topics. The present and the next subsections discuss ways of covering the standard topics. As a general principle for choosing topics, I recommend that teachers cover topics that are used more frequently in empirical research first. One easy way to implement the EPR approach is to follow discussion of the six concepts with material selected from an already existing introductory statistics course. The teacher can use the six concepts to introduce and unify the material. This enables the teacher to use the EPR approach in an already existing course with only a minimum amount of modification to the course. A more unified way of implementing the EPR approach is to break the course into five phases: an introductory phase, a practical-experience phase, a generalization phase, a specific-methods phase (optional), and a mathematics phase (also optional).
I recommend that the teacher begin the practical-experience phase with discussion of a commonly occurring simple type of research project -- the observational research project that studies the relationship between two continuous variables (2000 Hayden response, red tab 6). Possibly using the material in the paper for students (1997a) as an introduction, the teacher can discuss how to design an observational research project to study the relationship between two continuous variables, how to use a scatterplot to illustrate such a relationship, how to use statistical techniques to analyze data from such a research project to determine if a relationship is present between the variables, and how to use the model equation derived from such a relationship to make predictions. To reinforce the discussion, I recommend that students be given computer assignments to detect and study (practical) relationships between pairs of continuous variables in various sets of data. If time permits, the bivariate case can be extended to the multiple regression case. Next, the teacher can discuss "experiments" and the associated
statistical methods as a powerful tool for studying If time permits, the fully randomized one-way case can be extended to the multi-way case, repeated measurements, blocking, analysis of covariance, and so on. The length of the practical-experience phase should be adjusted to allow enough time at the end of the course to properly cover the material in the important next phase.
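The bivariate analysis recommended for the start of the practical-experience phase (fit a model equation to two continuous variables, then use it for prediction) might look as follows in a student computer assignment. This is only a minimal sketch: the data are made up, and the variable names (`hours`, `marks`) and the helper functions are hypothetical illustrations, not material from the paper for students.

```python
from math import sqrt
from statistics import mean

# Hypothetical made-up data: hours studied (predictor) and final mark (response).
hours = [2, 4, 5, 7, 8, 10, 11, 13]
marks = [52, 58, 60, 66, 71, 74, 79, 84]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def correlation(x, y):
    """Return the Pearson correlation coefficient between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


intercept, slope = fit_line(hours, marks)
r = correlation(hours, marks)


def predict(h):
    """Predict the final mark for a student who studies h hours (model equation)."""
    return intercept + slope * h


print(f"model equation: mark = {intercept:.2f} + {slope:.2f} * hours (r = {r:.3f})")
print(f"predicted mark for 9 hours of study: {predict(9):.1f}")
```

The sketch deliberately keeps the arithmetic visible in two short functions. In a course that avoids mathematics, the same output would normally come from statistical software, with attention directed to interpreting the model equation and to the fact that the predictions are accurate but not perfect.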
*Types of Variables.* This topic introduces students to a standard typology for variables, with four mutually exclusive and exhaustive categories: continuous, discrete-ordinal, discrete-nominal, and binary. This topic also introduces the idea of the univariate distribution of the values of a variable, with coverage of ways of illustrating the distribution of the values of a variable and measures of the center and spread of a distribution. (This topic is sometimes taught at the beginning of an introductory statistics course -- see Section 7.11.)

*Overview of Statistical Methods.* This topic provides a *brief* high-level overview of statistical methods (such as those listed in Section 5.4 and Appendix I.2), with *brief* discussion of the conditions under which each method is applicable. Most of the twenty-one methods listed in Section 5.4 are best summarized in a table, with rows of the table indexing the four possible types of the response variable and with columns indexing the four possible types of the predictor variable(s) in the research. Each cell in the body of the table contains the names of the statistical methods that are typically used to study relationships between variables when the response variable and predictor variables are of the indicated types. (As emphasized by Velleman and Wilkinson [1993], who discuss another typology for variables, valid exceptions to the categorization can occur. Nevertheless, the table of methods is useful because it categorizes the *standard* approaches to the study of relationships between variables and thereby provides a unifying overview of many standard statistical methods.)
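The structure of such a table of methods can be sketched as a lookup keyed by the type of the response variable and the type of the predictor variable. The handful of cell entries below are illustrative assumptions drawn from common practice; they are not the contents of the table or the list in Section 5.4, and the names `METHODS` and `methods_for` are hypothetical.

```python
# The four variable types of the typology discussed above.
VARIABLE_TYPES = ("continuous", "discrete-ordinal", "discrete-nominal", "binary")

# Illustrative cells only (assumed, not taken from Section 5.4):
# (response type, predictor type) -> names of commonly used methods.
METHODS = {
    ("continuous", "continuous"): ["linear regression"],
    ("continuous", "discrete-nominal"): ["analysis of variance"],
    ("continuous", "binary"): ["two-sample t-test"],
    ("binary", "continuous"): ["logistic regression"],
    ("discrete-nominal", "discrete-nominal"): ["chi-square test of independence"],
}


def methods_for(response_type, predictor_type):
    """Return the method names listed in the cell for the given variable types."""
    for variable_type in (response_type, predictor_type):
        if variable_type not in VARIABLE_TYPES:
            raise ValueError(f"unknown variable type: {variable_type!r}")
    # Cells not filled in by this sketch return an empty list.
    return METHODS.get((response_type, predictor_type), [])


print(methods_for("continuous", "continuous"))
print(methods_for("binary", "continuous"))
```

A lookup of this kind serves the same purpose as the printed table: students consult it as a key to choosing a method and do not need to memorize it.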
I recommend that students *not* be required to memorize the table, but rather that it be provided as a key to understanding and choosing statistical methods.

*Underlying Assumptions of Statistical Methods.* This topic discusses how every statistical method is based on certain assumptions (which are different for different methods), and why therefore (to avoid later possible embarrassment) a researcher should always verify that the underlying assumptions are adequately satisfied before attempting to draw conclusions from the use of a statistical method. I recommend that teachers characterize the common assumptions using dot plots or graphs that illustrate situations in which the assumptions are satisfied and situations in which they are not. But, except when students are mathematically mature, I recommend that teachers *not* discuss the mathematical details of the assumptions -- the important point is the *existence* of the assumptions, not their complicated mathematical details, which we can leave to the computer and later courses.

*Statistical Thinking.* This topic gives students practice in understanding, criticizing, and writing reports of empirical research, generally involving detailed consideration of the nine questions introduced in Section 4.5, and generally observing the principles of the scientific method, as summarized in Section 5.5.
A reasonable ending-point for an introductory statistics course is at the end of the generalization phase.
For each method in Section 5.4, I recommend that the following topics be covered (when applicable):
- *Design:* How to design a research project to study relationships between variables using the method.
- *Power:* How to use statistical software to compute the power of statistical tests for detecting relationships between variables using the method.
- *Data Checking:* How to use statistical software to examine the data from a research project using the method in order to identify and (when appropriate) correct anomalies in the data prior to analysis (using methods for studying univariate and possibly bivariate distributions of the values of variables).
- *Assumptions:* How to analyze the description and results of a research project in order to determine whether the underlying assumptions of the method are sufficiently satisfied to permit drawing conclusions from the output of the method.
- *Detection:* How to use statistical software to analyze the results of a research project in order to detect relationships between variables using the method.
- *Illustration:* How to use statistical software to illustrate relationships between variables using the method.
- *Prediction and Control:* How to use statistical software to analyze the results of a research project in order to derive model equations for relationships between variables using the method, as an aid to prediction or control.
- *Accuracy:* How to determine the accuracy of the prediction or control of a model equation.
- *Reporting:* How to interpret and write reports of empirical research projects that use the method.
Except for statistics or mathematics majors, I recommend that the use of mathematics be avoided in the specific-methods phase. Instead, I recommend that attention be focused on designing research projects and on correctly interpreting the relevant output from statistical software.
In choosing topics for the introductory statistics course it is helpful to distinguish between a basis for action and a decision procedure. A On the other hand, a
- a decision in some substantive area (e.g., in secondary school practices, in medical care, or in business) whether to *take* the action suggested by a basis for action, which I call an "action" decision
- a decision whether a relationship exists between given variables, which is discussed further in Section 7.10
- a decision about the likely value of a population parameter (possibly a parameter in a model)
- a decision about the next step to take in conducting an empirical research project.
The second through fourth types of decision play important roles in statistics, but I focus here on the first. Procedures for making action decisions are studied in a branch of statistics called "decision theory", which was founded by Wald (1950). Such procedures must take account of many diverse inputs. Some typical inputs are one or more bases for action (i.e., relationships between variables), various social or commercial values (or goals, perhaps stated as objective or utility functions), alternative explanations, error sizes, costs, and side-effects. A procedure for making an action decision takes appropriate account of all these inputs and provides as output an "optimal" recommendation whether (and possibly how) to perform various actions. In view of the multiple diverse inputs, useful procedures for making action decisions are much more complex than relationships between variables. Perhaps due in part to this complexity, most action decisions are still made (in all areas) on the basis of informal and intuitive criteria rather than by formal decision procedures. Swets, Dawes, and Monahan (2000) and Edwards and Fasolo (2001) discuss some current work in decision procedures. (A formal procedure for making action decisions can be efficiently characterized as a set of relationships between variables in which the response variables are indicators of whether [or how] actions should be taken and the predictor variables reflect the various inputs to the decisions. Decision procedures for assisting with the second through fourth types of decisions above can be similarly characterized.) The question arises whether the introductory statistics course should
discuss procedures for making action decisions. Because such procedures are
complicated and infrequently used, I recommend that the introductory course
omit this topic. And although the formal procedures for making optimal action
decisions are an interesting and important area of study, it seems more in
keeping with the standard use of statistics in empirical research to focus the
introductory course on the study of relationships between variables. A
relationship can suggest a Barnett discusses the distinction between statistical inference (which can often be reasonably viewed as inference about relationships between variables) and decision procedures (1982). Bordley (2001) discusses teaching decision theory in applied statistics courses. As noted at several points above, and following Jowett and Davies (1960), Scott (1976), Hunter (1977, pp. 16-17), Cobb (1987, sec. 4.2), and Willett and Singer (1992, p. 91), I recommend that any implementation of the EPR approach discuss each main concept in terms of numerous practical examples. Practical examples can appear in lectures, textbooks, multimedia courseware, class discussions, exercises, activities, and projects.
Does an understanding of the relationship between variables in the example have an obvious potential social, scientific, or commercial benefit? That is, does knowledge of the relationship suggest some clear basis for action? If we choose examples that suggest a clear basis for action, and if we ensure that students see the practical benefits provided in the examples, we help students to appreciate the practical value of statistics.
Examples that fail to satisfy the practicality criterion seriously detract from the field of statistics because they associate the field with problems that appear to be frivolous (or at best inconsequential). For example, a research project that studies the relationship between people's forearm lengths and their foot lengths is a "frivolous" research project because students can see no obvious practical use of knowledge of this relationship. Study of frivolous research projects trivializes the field of statistics. (Interestingly, if one looks hard enough, one can find practical uses of most relationships between variables. For example, the relationship between forearm length and foot length is occasionally used in orthopedics, paleontology, and physical anthropology. If a particular group of students is likely to be impressed by an obscure example, this is a reasonable example for them. But if a complicated explanation is needed before students can see the practicality of an example, most students are unimpressed.) If the students in a particular introductory statistics course are all specializing in the same discipline, and if that discipline performs empirical research, we can almost certainly make the greatest impression on these students by discussing examples from among the milestone empirical research projects in the discipline. We can also impress students if we discuss practical examples of research projects that use response variables that students themselves are directly interested in predicting and controlling, such as variables reflecting student grades, student health, student skills, student happiness, student expenses, and student income.
On the other hand, some introductory textbook writers provide an abundance of excellent practical examples (e.g., Moore [2000], Freedman, Pisani, and Purves [1998]). (The frequent use of impractical examples in some statistics textbooks is
one reason why some writers insist that teachers and textbook writers use
I further discuss the use of practical examples in the introductory statistics course in a paper (1998a, sec. 6) and in a Usenet post (forthcoming).
Once students have studied a concept through a sufficient number of practical examples, I recommend that the teacher cement the appropriate generalizations about the concept in students' minds. This helps students to use the concept in new situations. For example, once students understand the concept of 'relationship between variables', the teacher can make the generalization that most empirical research projects can be usefully viewed as studying relationships between variables. After stating a generalization, I recommend that the teacher assign exercises in which students identify details of the generalization in specific new instances. In particular, after discussing the idea that most empirical research projects can be usefully viewed as studying relationships between variables, I recommend that the teacher assign exercises in which students answer the nine questions given in Section 4.5 for various empirical research projects, including research projects of the students' own choosing. Answering these questions shows students that the questions almost always usefully apply. Appendix H supports the point that the questions almost always usefully apply. Appendix I.2 discusses some infrequently occurring types of empirical research projects that lack a response variable. How many explanations, examples, exercises, or activities should a teacher provide or assign to ensure that students understand a particular generalization? This depends, of course, on the generalization and on the nature of the students and is often difficult to determine at the front line of teaching -- especially if a teacher is using a new approach. To reduce this difficulty, I recommend that teachers use feedback systems to assess whether students understand each main concept and generalization. 
Some effective feedback systems for assessing students' understanding are
- minute papers (which are ungraded and may be anonymous) in which students briefly report their understanding, their questions, and the muddiest point to help the teacher evaluate a lecture or lesson (Mosteller 1988)
- graded quizzes, exercises, and essay questions that focus on testing students' understanding of the ideas
- two-way discussions between the teacher and students about the ideas.
Garfield (2000) discusses approaches to assessing students as an aid to improving their learning and understanding. Gal and Garfield (1997b, pt. 2) give four interesting essays by statistics educators about assessing students' understanding of statistical ideas.
Some statistics educators recommend that the introductory statistics course rely heavily on
- Good realistic data are much easier to obtain than good real data.
- If realistic data are properly described and properly supported by ancillary material, students either cannot tell or do not appreciate the difference between the data they are studying and real data.
- Insistence on real data may lead teachers to use impractical examples because (despite the available Internet data libraries) appropriate practical real data are often hard to find.
In view of these points, I recommend the following criteria for data in examples (including exercises) in an introductory statistics course:
- Use real data in examples if good real data are available. But if a better example can be made by adapting real data or by completely making up the data, it is quite reasonable to use realistic made-up data.
- Regardless of whether an example uses realistic or real data, the teacher should strive to make the example practical, as discussed in Section 7.5.
After students have finished planning a research project, I recommend (following Hunter 1977) that the teacher provide them with appropriate realistic made-up data that the students might have obtained if they had actually performed their project. The students can then analyze these data and report the results. Providing students with realistic made-up data enables them to proceed in sequence through research planning and research analysis using an example that is of direct interest to them, even though (for reasons of cost, accessibility, or time) performing such a research project in real life would generally be impossible for students. Also, providing students with made-up data ensures that students obtain interesting results because in generating the data the teacher can tailor the results to contain interesting features. I recommend that students be required to write a "codebook" describing the expected data table for their planned research. This aids student understanding and aids the teacher in generating the data. I recommend that teachers provide students with data with relationships
between variables that are only moderately significant, that is, with
In addition to the above approach, I recommend that students be assigned a small number of activities or projects at the beginning in which they actually conduct empirical research. However, in later work, I recommend that this time-consuming physical activity (which is generally non-statistical) be omitted through the use of realistic made-up data, thereby providing students with more statistical experience per unit of time spent on the course.
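The generation of realistic made-up data for a student's planned research project can be sketched as follows. Everything specific here is an assumption for illustration: the hypothetical project (weekly study hours and final mark), the codebook entries, the function name `make_up_data`, and the parameter values. The teacher controls the strength of the relationship through the `slope` and `noise_sd` parameters.

```python
import random

# Hypothetical codebook a student might write for the expected data table.
CODEBOOK = {
    "student_id": "identifier of the student (the entity); integer",
    "hours": "weekly study hours (predictor variable); continuous, 0 to 40",
    "mark": "final mark in percent (response variable); continuous, 0 to 100",
}


def make_up_data(n, intercept, slope, noise_sd, seed):
    """Generate n rows in which mark = intercept + slope * hours + random noise."""
    rng = random.Random(seed)  # a fixed seed makes the data set reproducible
    rows = []
    for student_id in range(1, n + 1):
        hours = round(rng.uniform(0, 40), 1)
        mark = intercept + slope * hours + rng.gauss(0, noise_sd)
        mark = max(0.0, min(100.0, round(mark, 1)))  # keep marks in a valid range
        rows.append({"student_id": student_id, "hours": hours, "mark": mark})
    return rows


# Assumed parameter values: a real but noisy relationship.
data = make_up_data(n=40, intercept=50, slope=0.5, noise_sd=8, seed=1)
for row in data[:3]:
    print(row)
```

Increasing `noise_sd` or decreasing `slope` weakens the relationship that students will find in the data, so a teacher can tailor each data set to contain the features that the planned project is meant to exhibit.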
As suggested in Section 7.3, except in courses aimed at statistics or
mathematics majors, I recommend that discussion of the underlying mathematics
of statistics (e.g., probability theory, distribution theory, theory of
statistical models, theory of statistical tests) be omitted from the
introductory statistics course. This recommendation is motivated by the needs
of the typical user of statistical methods. This person is interested in the
field of statistics only to the extent that it can help them to detect and
study relationships between variables (or possibly perform equivalent functions
under another name). And like the typical automobile driver who needs
transportation, but who cares little about the mechanical details of the
engine, the typical user of statistical methods needs help studying
relationships between variables, but cares little about the mathematical
details of the help. Instead, the user's attention is directed toward a
substantive area of empirical research (e.g., toward a particular branch of
medicine). Thus the less we engage (and confuse) potential users with the
complicated mathematical details of statistical methods, and the more we teach
them how to properly The paper for students (1997a) illustrates one approach to showing due deference to the underlying mathematics without becoming immersed in complicated details. Moore (1997a, sec. 4) and the American Statistical Association (2000) also recommend de-emphasizing mathematics in statistics education. Section 5.4 introduces four groups of techniques that statistical methods
can perform to help empirical researchers study relationships between
variables. The first group of techniques are techniques for Hypothesis testing is theoretically broader than testing for evidence of the existence of relationships between variables. However, examination of the use of statistics in empirical research suggests that most practical instances of hypothesis testing can be usefully viewed as testing for evidence of the existence of a relationship between variables (or testing for evidence of the existence of an extension to an already known relationship between variables). That is, the instances can be viewed as testing the hypothesis that a relationship exists between a response variable and one or more predictor variables. Appendix H.4 expands this point. Cobb (1992, 2000) notes that some statistics teachers feel that (general) hypothesis testing should not be taught or should be de-emphasized in the introductory course. This view reflects the controversy that presently exists about the use of hypothesis testing (sometimes called "inference" or "significance testing") in empirical research (Wilkinson and Task Force on Statistical Inference 1999). A central idea of hypothesis testing about relationships between variables
is that we can (tentatively) conclude that a relationship exists if the
relevant Testing hypotheses about the existence of relationships between variables is
important because (to avoid embarrassing and costly errors) we must first
verify that we have proper evidence that a relationship exists between the
relevant variables before we attempt to use information about a putative
relationship for prediction or control. Testing hypotheses about the existence
of relationships between variables by computing relevant (Tukey has suggested that a relationship likely exists between any given
response variable and In some empirical research projects -- especially in the physical sciences -- the evidence of the existence of a relationship between variables is so strong that the use of hypothesis testing, although not incorrect, is superfluous. However, data are generally noisy. If we are analyzing noisy data, hypothesis testing provides an objective procedure to help us decide whether we have reasonable evidence of the existence of a relationship. Also, hypothesis testing can reliably detect subtle phenomena in data that may otherwise go unnoticed. The rationale behind hypothesis testing is not easy for beginning students to understand. However, hypothesis testing will likely remain a pillar of empirical research because no easier general way to objectively check for evidence of the existence of a relationship between variables seems possible. (Another approach to objectively checking for evidence of the existence of a relationship is to use confidence intervals for parameters. However, this approach is harder to understand than the hypothesis-testing approach because the confidence-interval approach requires that students understand statistical model equations and the distributions of the parameters of these models. The hypothesis-testing approach allows us to keep these technical matters behind the scenes, focusing instead on the clear-cut simple criterion discussed in the fourth paragraph of this subsection.) To facilitate student understanding of hypothesis testing, I recommend that
teachers emphasize the idea that the In teaching students about hypothesis testing it is important to distinguish
between statistical significance and practical significance, which are mutually
independent. A relationship between variables is I illustrate an approach to discussing hypothesis testing in the introductory course in the paper for students (1997a, sec. 9) and I discuss these ideas further in a paper (1998a, sec. 5). Traditionally, the introductory statistics teacher spends a substantial amount of time near the beginning of the course covering univariate distributions of the values of variables. (The coverage generally includes ways of summarizing and illustrating univariate distributions and may also include the mathematics of univariate distributions.) Because many statistical concepts depend on the idea of a univariate distribution, it is clearly mandatory to cover this topic at some point in students' statistical careers -- but where? Except possibly in courses for statistics majors, I recommend that
discussion of univariate distributions be Nor is the topic of univariate distributions If we omit univariate distributions at the (The concepts of univariate distributions are especially helpful in the
mathematical derivation of Appendix H.2 discusses how univariate distributions are rigorously a degenerate case of relationships between variables. I further discuss univariate distributions in a paper (1998a, sec. 9.1 and app. G) and in some Usenet posts (1998c).
Appendix J discusses future software systems for studying relationships between variables. Such systems will make it substantially easier for teachers to convey statistical concepts to students.

Under the entity-property-relationship approach, students learn the following six concepts at the beginning of an introductory statistics course:
- entities
- properties of entities
- variables
- an important goal of empirical research: to predict and control the values of variables
- relationships between variables as a key to prediction and control
- statistical techniques for studying relationships between variables as a means to accurate prediction and control.
To facilitate understanding, students learn the concepts in terms of numerous practical examples. After students have learned the six concepts, they learn standard statistical principles and methods in terms of the concepts, again with emphasis on practical examples. The EPR approach is broad, and the concepts of the approach are fundamental. The approach gives students a lasting appreciation of the vital role of the field of statistics in empirical research.
Section 4.1 notes that people use nouns in language to denote entities. This implies that certain unusual things that we might not initially think of as entities are, under the EPR approach, entities. For example, the words "event", "behavior", "emotion", "set", "statistical distribution", "property", "color", "trial", and "flying saucer" are all nouns. Therefore, these words (when used in specific situations) all denote entities. Is it reasonable to view the "things" denoted by these nouns as entities? To address this question, note that all instances of these "things" have properties. For example, all events have the properties of "location", "duration", and "identities of participants". Similarly, all behaviors have the properties "social appropriateness" and "duration". Similarly, all emotions in people have the properties "type", "intensity", and "duration". Similarly, all sets have the properties "type of elements" and "number of elements". Similarly, all statistical distributions have the properties "type" (i.e., continuous or discrete) and "expected value". Similarly, all properties have the properties "type" (i.e., continuous or discrete), "distribution" (e.g., normal or binomial), and "expected value". Similarly, all colors have the properties of "intensity of the yellow light component of the overall color" and "saturation". Since these things all have properties, and since the only things we know with properties are things or entities, these things are all reasonably viewed as being entities. An important type of entity studied in some empirical research projects are "trials" (or "cases", "runs", or "instances"). For example, in a simple physics experiment we may operate a piece of apparatus for several "trials" under some condition A and we may also operate it for several trials under some other condition B. 
These trials typically become the main entities in the data analysis because each row in the statistical data table of the results is associated with a different trial. Trials have the properties of "duration", "time of occurrence", and "outcome", and thus are reasonably viewed as being entities. Let us define a "flying saucer" as a type of vehicle used by extraterrestrials. All flying saucers have the properties of "size" and "does it exist?". In the case of the "does it exist?" property of flying saucers, many people believe that the value of this property is probably always "no". But the fact that the value of the property may be "no" does not remove entity-hood from flying saucers -- something need not exist in the external world for it to be a valid entity. This approach frees us to think about things that may or may not exist, which is a useful freedom. Similarly, for any other noun we can find properties of the thing or things denoted by the noun. (For even if a thing denoted by a noun has no other properties, it still has the property of having no other properties.) Since all things denoted by nouns have properties, it makes sense to view whatever a noun names or identifies (in a given context) as being an entity of some type. (Section 4.2 suggests that a behavior is a property of a living entity, but the first two paragraphs of this appendix suggest that a behavior is an entity. This dual view of behavior applies to other properties as well: As noted above, since all properties [and all variables] are denoted by nouns, all properties [and variables] can be viewed as entities. For example, the word "height", which names a property of physical objects, is a noun. Thus the word "height" names an entity -- an entity that is a property. Thus the root concept is the concept of 'entity' -- everything in human reality is an entity. Thus even properties [of entities] are entities [with properties]. 
However, I do not recommend discussing this confusing philosophical issue in an introductory statistics course.)
I say that all "things" denoted by nouns can be viewed as
entities. But one can reasonably ask whether some of these things that I am
A problem with appealing to an authority to define what are entities and properties is that we have no ready access to a reliable authority who will indicate whether a given candidate for entity-hood or property-hood is a valid entity or property. (We could have people, perhaps experts, vote on these matters, but this would be time-consuming and perhaps without much benefit.) Because we lack access to an appropriate authority, we generally fall back on simply bestowing entity-hood on any "thing" that is of interest. That is, most people unconsciously view a thing as being an entity as soon as "it" attracts our attention. Similarly, once we have a thing of interest, we may recognize properties of it. For example, most people unconsciously view a flock of birds as an entity with properties. If the birds suddenly scatter, the flock is no more. But while the birds are together the concept of 'flock' is a useful concept to enable us to view the birds as a group and to act as a container for the various properties of the group, such as "number of birds in the group", "cohesiveness of the group", and "age of the group". Whether the flock and its properties are "real" would appear to be irrelevant. (It is relieving that many entities in the external world are not as ephemeral as a flock of birds, with many entities existing continuously throughout one's lifetime.) An alternative approach to using the concept of 'entity' is to use an entity-less fog of unattached properties. However, this approach seems much less viable, and perhaps impossible because it is generally necessary to link together individual values of different variables in an analysis (as reflected, say, in the linkages between the values within a row in a data table). It is the concept of 'entity' that does the linking. 
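The linking role of the entity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entity labels `e1` to `e4`, the variable names `height_cm` and `weight_kg`, and all of the values are made up for the sketch and come from no study. Each row of the small data table belongs to one entity, and it is the shared row (the entity) that ties a height value to a weight value. Once the entity is dropped, only two unattached sets of values remain and the pairing cannot be recovered.

```python
# Hypothetical data table: one row per entity (all values are made up).
rows = [
    {"entity": "e1", "height_cm": 150, "weight_kg": 58},
    {"entity": "e2", "height_cm": 160, "weight_kg": 50},
    {"entity": "e3", "height_cm": 170, "weight_kg": 80},
    {"entity": "e4", "height_cm": 180, "weight_kg": 66},
]


def linked_pairs(table, var_a, var_b):
    """Return (a, b) value pairs that are linked because they share a row (an entity)."""
    return [(row[var_a], row[var_b]) for row in table]


def unattached_values(table, var):
    """Return the values of one variable as a sorted list, with no entity attached."""
    return sorted(row[var] for row in table)


# With entities: each height is linked to the weight of the same entity.
pairs = linked_pairs(rows, "height_cm", "weight_kg")

# Without entities: two separate collections of values with no linkage.
heights = unattached_values(rows, "height_cm")
weights = unattached_values(rows, "weight_kg")
```

Pairing `heights` and `weights` position by position does not reproduce `pairs`, which is the sense in which a relationship between the two variables can be studied only when the entity does the linking.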
Since we cannot easily abandon the concept of 'entity', and since several other important concepts can be built atop the concept, a reasonable approach to describing human reality is to begin with the concept of 'entity' as a primitive. Another approach is to begin with the concept of 'property' as a primitive and then to define an entity as a cohesive cluster of properties together with their values. However, the property-first approach has the logical problem that it presupposes the concept of 'entity' -- clusters (i.e., sets), properties, and values are all entities.
This paper views populations and samples as containing entities. In
contrast, some discussions of mathematical statistics view populations and
samples as containing only Viewing a population as containing only variables or values reflects the mathematical approach of abstracting only the variables and their values from the situation under study, ignoring the entities and their properties. This approach makes the material substantially more general -- it is designed to be correct (i.e., logically consistent) regardless of the identity of its referents. Therefore, viewing a population as containing only variables or values is clearly a useful approach in mathematical statistics. However, viewing a population as containing only variables or values diverges from standard human thinking in which we organize the external world into populations of entities that have properties that have values, rather than organizing it directly into populations of variables or values without (at least tacit) acknowledgement of the entities and properties to which the variables and values are "attached". Therefore, although the abstract mathematical approach is essential for generality in discussions of mathematical statistics, it is reasonable to introduce the ideas to beginners in terms that reflect standard human thinking. Thus in the introductory statistics course for non-statistics-majors it is reasonable to view populations and samples as containing entities. Thus under this approach, statistical data contain estimates (at the time of measurement) of the values of properties of the entities that are in a sample that was drawn from the population.
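The entity-first view of populations and samples described above might be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `Entity` class, the `measure_height` function, the made-up heights, and the choice to measure to the nearest centimetre are all assumptions introduced for the example. The population and the sample contain entities, and the data contain estimates of the values of a property of the sampled entities.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A hypothetical entity (here, a person) with one property, height."""
    name: str
    true_height_cm: float


def measure_height(entity):
    """Return an estimate of the entity's height, measured to the nearest centimetre."""
    return round(entity.true_height_cm)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Population: a collection of entities, not a collection of bare numbers.
population = [Entity(f"person_{i}", 150.0 + 0.37 * i) for i in range(50)]

# Sample: entities drawn from the population.
sample = rng.sample(population, 5)

# Data: one row per sampled entity, holding the measured value of the property.
data = [(entity.name, measure_height(entity)) for entity in sample]
```

The abstract mathematical view would keep only the numbers in `data`. The sketch keeps the entity name in each row so that the origin of each value stays visible.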
Section 4.2 introduces the concept of 'property' and notes that several alternative terms are available to name this concept. Which term is preferred? A reasonable approach to answering this question is to choose the term that (in the relevant sense) students are most familiar with. Breland and Jenkins report the frequency of occurrence of common English terms in the literature typically encountered by high school and university students (1997). Table B.1, which is extracted from their report, shows the estimated frequency of occurrence of terms that we might use to name the concept of 'property'. Table B.1
SOURCE: Unfortunately, the table is not definitive because several of the terms in the table have more than one sense and these different senses were not distinguished in generating the word frequency counts (which were done by computer). For example, the term "nature" can denote the concept of 'property' discussed in this paper, but it can also denote the "natural" world and the events that take place in it. Similarly, the term "property" can denote the concept of 'property' discussed in this paper, but it can also denote "something owned or possessed". Similarly, some of the other terms (e.g., "character", "quality", and "attribute") have more than one sense. (Also, the table is less definitive because Breland and Jenkins did not estimate the size of the sampling error of their estimates. However, since their sample contained 14 million words [from texts that students frequently study], intuition suggests that 95% confidence intervals for the frequency estimates in the table will be less (perhaps substantially less) than 20 frequency units wide. See also the discussion by Carroll, Davies, and Richman of error estimates in a statistical model for word frequencies [1971, pp. xxxiii-xxxiv].) Although we cannot use the table to determine the frequency of occurrence of the terms in just the sense under discussion, we can use the table together with a notion of the popularity of the terms in their different senses to develop a feeling for the appropriateness of the various terms. The table indicates that the term "nature" is by far the most frequently occurring of the terms. However, when this term occurs the writer or speaker is usually referring to some form of "mother nature" rather than referring to a nature (i.e., property) of some entity. To avoid ambiguity with the popular alternate sense, this suggests that the term "nature" is ruled out from being the chosen term to name the concept of 'property'. 
Similarly, the term "character" seems at least as popular in its other senses as it is in the sense of 'property'. Thus use of the term "character" would likely cause ambiguity for beginners, which suggests that this term is also ruled out from being the chosen term to name the concept of 'property'. The term "quality" is a strong contender. However, it has the
disadvantage that it may connote judgment or evaluation and the sense of
"superiority" or "excellence". Also, when I read Section
4.2 with the string "propert" replaced everywhere with the string
"qualit", the section seems less effective. (It seems more natural to
speak of the height The table suggests that the term "attribute" occurs (in its multiple senses) less than one-tenth of the times that the term "property" occurs (in its multiple senses). Thus if we wish to use a familiar term, this suggests that the term "attribute" is ruled out. Thus for me the term "property" works best to name the concept of 'property'.
The genesis of the unconscious use of the concepts of 'entity' and 'property' in human thought seems straightforward: The ability to discriminate and conceptually manipulate entities and properties has an evolutionary advantage, and thus the concepts and ability evolved through natural selection. Thus presumably the ability to discriminate the values of properties began as an ability in a simple organism to discriminate and act on differences in a single simple property of its external world (perhaps temperature or light intensity). That ability gave the organism a selective advantage. That ability has evolved into our current human ability to react to and improvise in the external world in many complicated ways as a means to improving our survival and comfort. (We act at various levels of maturity ranging from empathetic philanthropy down to unconscious greed.) The reactions and improvisations are based on our goals or needs coupled with mental models of the external world in terms of entities, properties of entities, relationships between entities, and relationships between properties. The preceding paragraph suggests that properties preceded entities in the evolution of human thought. That is, natural selection led organisms first to (unconsciously) develop the concept of 'property'. Following (or partly overlapping) the development of the concept of 'property', higher animals developed the ability to (unconsciously) recognize that several associated properties could be "attached" to similar things or entities, which enabled the concept of 'entity' to function as a key simplifying concept for understanding the external world. Entities also provide the necessary conceptual framework for study of relationships between properties (which occur across entities) -- relationships that directly assist us in prediction and control. Because presumably properties preceded entities in the evolution of human
thought, and because presumably the infant recognizes properties in its
external world at an earlier age than when it recognizes entities, it is
reasonable to view properties as being
Section 4.3 proposes a definition of the term "variable". This appendix first discusses whether we can define two important concepts that are used in that definition and then discusses and compares some other possible definitions of the term "variable".
The definition of the term "variable" given in Section 4.3 uses
the terms "entity" (actually "entities") and
"property". Thus one might reasonably ask for definitions of these
more fundamental terms. However, not every term in a body of knowledge can be
verbally defined without introducing undesirable circularity. Thus the terms
"entity" and "property" are
Several different definitions are available for the term "variable" and confusion among these (closely related) definitions can easily arise. We can say that a variable is
1. a formal representation of a property of entities.
2. a symbol representing a property of entities.
3. a placeholder for a value of a property, in effect a pronoun (Rubin 1996).
4. a set of values of a property in a population or sample.
5. a concept in statistical software that represents a column in a data table, which contains the measured values of some property for members of a set of entities.
6. a symbol that stands for some entity (where the concept of 'entity' includes properties, variables, and values).
Consider some dictionary and encyclopedia definitions of the concept of 'variable'. A variable is
7. a feature that is subject to systematic and haphazard variation (Cox 1999).
8. a : a quantity that may assume any one of a set of values b : a symbol representing a variable (Merriam-Webster 1994).
9. something that may or does vary; a variable feature or factor (Random House 1993).
10. a. a quantity or function that may assume any given value or set of values. b. a symbol that represents this (Random House 1993).
11. a quantity or force which, throughout a mathematical calculation or investigation, is assumed to vary or be capable of varying in value (Oxford University Press 1971).
12. generally any quantity which varies. More precisely, a variable in the mathematical sense, i.e. a quantity which may take any one of a specified set of values. ... It is useful, but far from being the general practice, to distinguish between a variable as so defined and a random variable or variate (Marriott 1990).
13. any finding (an attribute or characteristic) that can change, that can *vary*, or that can be expressed as more than one value or in *various* values or categories (Vogt 1993).
14. some characteristic that differs from subject to subject or from time to time (Everitt 1998).
15. a symbol which is used to represent some undetermined element from a given set, usually the domain of a function (Parker 1994).
Note how definitions 7 through 14 define the term "variable" in terms of the words "feature", "quantity", "factor", "finding", "attribute", and "characteristic". As suggested in Appendix B, these words are all (in the relevant one of their senses) near synonyms for the word "property". Definitions 8, 10, 11, and 12 define the term "variable" in terms of the word "quantity" which suggests that the values of a variable must be numeric. However, this approach is too limiting because variables in statistics and empirical research sometimes have non-numeric values. Definitions 10 and 15 use the concept of 'function'. My sense is that this mathematical concept (a mapping between two sets) is not needed to define "variable". Definition 11 uses the concept of 'force'. Although it might be viewed more broadly, this concept is generally viewed as merely a particular type of variable from physics, which does not seem to deserve special mention.
As noted, most of the above definitions appeal to the concept of 'property' under one name or another, as discussed in Appendix B. Some of the definitions also refer to the concept of 'representation' or 'symbol'. Thus, to be consistent with common usage, I believe the preferred definition of the term "variable" must refer to the concepts of 'property' and 'representation'. None of the dictionary or encyclopedia definitions use the word "property". I suspect that this word was avoided in the definitions because it leads to the question "property of what?", and lexicographers felt that entities or "things" are not always present when variables are present. But although it is true that entities are often not present when variables are used in mathematics, it would appear to be difficult or impossible to find a variable that is used in empirical research that cannot be reasonably viewed as reflecting some property of some type of entity. Since variables in empirical research are invariably associated with entities, and in view of the fundamental role of statistics in empirical research, and in view of the fundamental role of the concept of 'entity' in human thought, I believe the concept of 'entity' should be present in the statistical definition of the concept of 'variable'. These ideas lead to definitions 1 and 2. To choose between definitions 1 and 2 we must decide whether to view a variable as a "formal representation" or as a "symbol". I prefer the phrase "formal representation" because variables have several notions associated with them (such as 'distribution', and 'relationship between variables'), and these notions seem consistent with the concept of 'formal representation'. On the other hand, the concept of 'symbol' connotes for me the idea of an empty container, thereby diminishing the sense of the associated notions. This leads me to recommend definition 1. 
Definition 1 omits four concepts that are closely associated with variables:
- the concept of the 'value' of a variable
- the concept of 'variation' in the values of a variable
- the concept of 'measurement' of the values of a variable
- the concept of the time of the measurement of the value of the variable.
For simplicity, instead of including these concepts in the definition of "variable", I recommend that they be introduced immediately after the definition as concepts that are associated with the definition. This approach is illustrated in Section 4.3. This approach is consistent with the approach taken with the concept of 'property' in Section 4.2 in which the concept is first introduced and then the other four concepts are introduced. I further discuss the concept of 'variable' in a Usenet post (1996b) and in the paper for students (1997a).
The discussion in Section 4.3 raises the following important question: Are the concepts of 'property of an entity' and 'variable' merely interchangeable synonymous concepts? I believe the two concepts are best viewed as being not synonymous for the following reasons:
- It is useful to speak of the difference between the value of a variable and the true value of the associated property of an entity. This difference is called the "error" in the value of the variable (Pearson 1902, Cochran 1968). If we merge the concepts of 'property' and 'variable', it makes less sense to speak of this difference.
- Children learn the concept of 'property' and the attendant concept of 'value of a property' before they enter kindergarten. (They unconsciously learn to unconsciously use the concepts.) Their knowledge of the concepts is reflected in their understanding and use of adjectives and adverbs, which (as discussed in Section 4.2) usually serve in language to express the values of properties. In contrast, children do not generally learn the concept of 'variable' until they first study algebra which (as discussed in Section 5.10) is usually in grade 6 or later. This supports the view that the two concepts are usefully viewed as separate concepts, with (given the developmental sequence) properties being more fundamental than variables.
- Variables can be used in algebraic expressions, where they are represented by symbols, such as *x* and *y*. In contrast, properties are not generally viewed as being usable in such expressions.
- The concept of 'variable' implicitly contains the concept of the *value* of the variable. (When a variable is used in an algebraic expression it usually stands for its value, as opposed to standing for the variable in some other sense.) In contrast, the concept of 'property' seems less inclined to (directly) contain the concept of its value.
- The concept of 'variable' in statistics comes from mathematics. The concept of 'variable' in mathematics originally came from the concept of 'property'. However, the concept of 'property' is generally not explicit in mathematical discussions of variables. This is because (as discussed in Appendix A.4) the abstract approach of ignoring the entities and properties enables the mathematical material to be substantially more general. Thus the concept of 'variable' has a strong *abstract* sense while the concept of 'property' is more concrete.
In light of these points, I recommend viewing the concept of 'property of an entity' as being fundamental and viewing the concept of 'variable' as being defined in terms of it -- a (statistical) variable is a formal representation of a property of entities. I further discuss the distinction between properties and variables in a paper (1998a, app. E).
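The first reason in the list above (the "error" in the value of a variable) can be made concrete with a small sketch. It is offered only as an illustration: the `Measurement` class, its field names, the sign convention (recorded value minus true value), and the example numbers are assumptions introduced here. Keeping the value of the property and the value of the variable as two separate fields is what makes the difference between them expressible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One measured value of a property of one entity (all names are hypothetical)."""
    entity: str
    true_value: float      # value of the property; usually unknown in practice
    variable_value: float  # value recorded for the variable that represents the property

    @property
    def error(self) -> float:
        # Assumed sign convention: recorded value minus true value.
        return self.variable_value - self.true_value


# Made-up example: a rod whose true length is 100.0 cm is recorded as 100.3 cm.
m = Measurement(entity="rod_1", true_value=100.0, variable_value=100.3)
```

If the two fields were merged into one, as they would be if 'property' and 'variable' were treated as synonyms, `error` could not be defined at all.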
Section 4.5 introduces the phrase "relationship between variables" and notes that several terms are available that we can use instead of the term "relationship" in this phrase. For example, any of the following terms can replace the term "relationship" in the phrase: association, attachment, attunement, bond, consanguinity, commensurability, compatibility, complementarity, concord, concordance, concurrence, conformance, conformation, conformism, conformity, congruity, connection, connectiveness, consilience, correlation, correspondence, coupling, covariation, dependence, entanglement, equation, equivalence, function, harmony, homogeny, homology, interchange, intercommunication, interconnection, intercourse, interdependence, interlacing, interlinking, intermeshing, interpenetration, interplay, interrelation, interrelationship, intertwining, interweaving, interworking, kinship, liaison, link, linkage, linking, marriage, mimicking, mimicry, mutual dependence, mutualism, mutuality, nexus, parallel, parallelism, parity, partnership, propinquity, proportionality, rapprochement, reciprocality, reciprocalness, reciprocation, reciprocity, relatedness, relation, relativeness, relativism, relativity, simulacrum, symbiosis, symmetry, sympathy, tie, unity, yoke.
Which of these terms is preferred in general discussion? As before, a reasonable approach to determining the preferred term is to begin by considering the frequency of use of the terms in the literature typically encountered by high school and university students. Appendix B above introduces the Breland and Jenkins word frequency data (1997). Table F.1, which is extracted from those data, shows the frequency of use of some more commonly used synonyms for the term "relationship".
Table F.1
SOURCE: Some of the synonyms in Table F.1 have multiple or special senses (which the Breland and Jenkins data do not distinguish). For example, the noun "marriage" is almost always used in a special (matrimonial) sense different from the sense of 'relationship' used in this paper. Similarly, the noun "association" can denote a "relationship" in the sense used in this paper, but it can also denote an organization of people who have a common interest. Similarly, the nouns "relation" and "function" have multiple senses. On the other hand, study of the dictionary definition suggests that the term "relationship" has only one general sense -- denoting some link, connection, or involvement that exists between two or more entities. (In the present discussion the link is between two entities that are properties or variables.) The frequency-of-use statistics together with the multiple and special meanings of some of the terms suggest that "relationship" is the preferred term. Some readers may object to using the term "relationship" because they feel this term should be reserved for relationships that are "perfect" -- i.e., relationships between variables in which the error term in the model equation is zero. But such perfect relationships are never found in practice. Perfect relationships are not found due to measurement error, which is invariably present because (in general) no measuring instrument is perfect. Since we cannot completely eliminate measurement error, it is technically impossible to definitively conclude that the only error present in a model equation is true measurement error. Thus it is technically impossible to infer that a relationship between variables is "perfect". Thus instances of "perfect" relationships between variables cannot be shown to exist in practice. Thus it is a waste of the term "relationship" to reserve it for such non-existent-in-practice instances. (Some readers may believe that some perfect relationships between variables
are studied in the physical sciences. Certainly some physical relationships
Some readers may recall the important phrase "association does not prove causation" and lean toward the term "association" because that term appears in this phrase. However, the phrase seems to work at least as well if it is stated as "relationship does not prove causation". I contrast the term "relationship" with the term "relation" in a Usenet post (1999, app. A). The preceding points lead me to recommend using the noun "relationship" in the phrase "~ between variables". Appendix I.2 discusses whether it should be "relationship between
variables" or "relationship
This appendix evaluates some of the commonly used terms to name the response variable and the predictor variable(s) in the study of a relationship between variables. Let us first consider terms that are used to name the response variable. This paper uses the term "response" to name the response variable. This term is effective because it conveys the important idea that something is being responded to. Thus the student wonders "response to what?", which links directly to the important concepts of 'predictor variable' and 'relationship between variables'. The response variable in a relationship between variables is also sometimes
called the "predicted" variable. The term "predicted" has
the advantage over the term "response" that it is less likely to
connote direct causation -- a danger with the term "response". It is
important that students understand that many relationships between variables,
although clearly mediated by causal connections, are not The term "predicted variable" also has the advantage that it implies the concept of 'prediction' which, as discussed in Section 4.4, is an important concept in empirical research. However, the concept of 'prediction' is also implied by the term "predictor". Thus if we use the term "predictor" to name the predictor variable(s), we can imply the concept of 'prediction' even if we choose not to use the term "predicted" to name the response variable. The term "predicted variable" has the disadvantage that it is less
effective than the term "response variable" at suggesting that the
variable is The term "predicted variable" also has the disadvantage that it
does not work well when the relationship The above discussion suggests the possibility of using two terms -- "response variable" for experiments and "predicted variable" for observational research projects. However, most readers will agree that many symmetries exist between the study of causal and non-causal relationships between variables. (The key difference is that at least one of the predictor variables is "manipulated" by the experimenter in an experiment while all of the predictor variables are merely "observed" in an observational research project.) The existence of the many symmetries between experiments and observational research projects suggests that we can maximize student understanding by using only a single term to name the concept of 'response variable', which seems to be quite reasonably viewed as only a single concept. The above points lead me to prefer a single term to name the response variable and to prefer the term "response variable" over the term "predicted variable". Some discussions of empirical research refer to the response variable as the
"dependent" variable, which reflects the terminology of mathematical
functions. The dependent variable of a mathematical function is (except in the
trivial case of a constant function) On the other hand, in statistical discussions of empirical research the term
"dependent" implicitly assumes what we are often directly interested
in determining, which is
The point of the preceding paragraph also applies to the term "response", but to a much lesser extent. That is, it seems confusing and counterintuitive to think that a "dependent" variable may or may not actually "depend" on a predictor variable. But it is reasonable and consistent with the everyday sense of the concept of "respond" to think that a "response" variable may or may not actually "respond" to a predictor variable -- no response is an allowable "response".
The response variable in a relationship between variables is also sometimes called the "output" variable, especially in the study of physical processes. The term "output" suggests the notion of a relationship because a reference to an "output" variable suggests that an "input" variable ought also to be present. However, the term "response" seems to more directly convey the important notion that the response variable is "responding" (or is hoped to respond) to the predictor variable(s), thereby more strongly implying the sense of 'relationship'. Also, the term "response" seems to work effectively in many (all?) situations in which the term "output" is used.
The response variable in a relationship between variables is also sometimes called the "criterion" variable in recognition of the fact that it is important -- it is the variable we wish to learn how to better predict or control. However, the term "criterion", apart from implying the importance of the variable, is somewhat empty. In particular, it lacks the important idea of responding to the predictor variable(s) that is inherent in the term "response".
The above points lead me to recommend that teachers use the term "response" to name the response variable in a relationship between variables.
Let us now consider some of the terms used to name the predictor variable(s) in a relationship. This paper uses the term "predictor" to name the predictor variable(s).
This term has the advantage that it directly suggests the important concept of 'prediction' which, as discussed in Section 4.4, is a key goal of empirical research. Furthermore, the "or" suffix on the term "predictor" enables the term to differentiate itself from what is being predicted, even for beginners. Thus the term leads beginners to wonder about the identity of the "predictee" variable, thereby leading to the ideas of 'response variable' and 'relationship'. A predictor variable is also sometimes called an "explanatory" variable. However, the term "explanatory" emphasizes the concept of 'explanation' which (as noted in Section 4.4) is generally subordinate to the concept of 'prediction'. A predictor variable is also sometimes called an "input" variable. As with the term "output", the term "input" does suggest the concept of 'relationship between variables'. However, the term "input" does not suggest the important concept of 'prediction' as well as the term "predictor". A predictor variable is also sometimes called an "independent"
variable. As with the term "dependent", the term
"independent" reflects the terminology of mathematical functions. The
term "independent" has the disadvantage that it suggests an idea that
we almost always hope is false. That is, we almost always hope that a
predictor variable is The above points lead me to recommend that teachers use the term "predictor" to name the predictor variable(s) in a relationship between variables. The paper for students lists other terms that are sometimes used to name the response and predictor variables in the study of a relationship between variables (1997a, sec. 7.6). Similar arguments to the above can be given why these terms are also less effective than the terms "response" and "predictor".
Section 4.5 introduces the concept of 'relationship between variables' and
states that most empirical research projects can be easily and usefully viewed
as studying relationships between variables (as a means to predict and control
the values of variables). Some readers may be unaware that the concept of
'relationship between variables' can be used as widely as this paper claims.
(Probably more than one-half of all empirical research projects are The remainder of this appendix discusses six aspects of empirical research
projects that may at first appear
Suppose a physics or chemistry experiment has discovered that if ingredients A and B are mixed together under conditions C, then ingredient D appears. Does this experiment study a relationship between variables? Yes. The predictor variables are "amount of ingredient A", "amount of ingredient B", and the variables that reflect the conditions C. The response variable is "amount of ingredient D that is produced". Viewing the experiment in terms of a relationship between variables has the benefit of reminding us that in addition to being interested in the fact that ingredient D is produced, we also wish to know how much of ingredient D is produced for given values of A, B, and C. This knowledge gives us better prediction or control capability of D. (The population of entities in the example is all the instances [trials, runs, cases] ever in which the ingredients A and B are brought together under conditions C, and the sample is the set of those instances that occur in the experiment.) Some readers may wonder whether a research project that performs
"parameter estimation" can be viewed as studying a relationship
between variables. One type of parameter-estimation research project is
intended to determine (i.e., estimate) the values of one or more
"population parameters", where a population parameter is simply some
property of a population, such as the average of the values of some property of
the entities in the population. (In more general terms, the procedure is
studying the univariate distribution of the parameter.) Any standard procedure
for estimating the value of a population parameter can be usefully viewed as
the study of a degenerate case of a relationship between variables in which the
response variable is present, but

Note that if we adopt the foregoing point of view of parameter estimation, we are naturally led to ask in particular cases

A second type of parameter-estimation research project can be performed to determine (i.e., estimate) the values of one or more parameters of a model (equation) of a relationship between variables. Because model equations are directly related to the study of relationships between variables, this second type of parameter estimation is simply a particular aspect of the study of relationships between variables.

Some readers may wonder whether a research project that performs "interval estimation" can be viewed as studying a relationship between variables. Here, given the links established in Appendix H.2 between parameter estimation and relationships between variables, it is easy to see similar links between interval estimation and relationships between variables. The links occur because the intervals in interval estimation are simply intervals in which we determine (with a stated level of "confidence") that the associated parameters probably lie.

Section 7.10 states that most practical instances of statistical tests of hypotheses can be usefully viewed as testing for evidence of the existence of a relationship between variables (or testing for evidence of the existence of an extension to an already known relationship between variables). The following discussion expands this point.

Consider a research project that uses an independent-samples

Yes. The response variable is the measured value of the "response"
to the treatment for each patient who participated in the research project. The
predictor variable is the variable "gender", which reflects an
important property of the patients. Thus we can view this research project (and
the associated

Similarly, in a research project in which the entities are cross-classified in two or more different ways, each classification (i.e., each subscript or margin) represents a different predictor variable. These predictor variables may reflect properties of the entities that are under study, or they may reflect properties of the entities' environment. In the case of a particular treatment that is

The preceding points together with consideration of the standard statistical
tests suggest that we can view many instances of the standard tests of
hypotheses (e.g., the

But even if we agree that we can

To answer that question, note that it is precisely instances of the concept
of a relationship between variables in entities that empirical researchers are
usually interested in detecting when they use statistical tests in research
projects. That is, (using the definition of a relationship between variables in
the paper for students [1997a, sec. 7.10]) researchers are usually precisely
interested in determining whether the expected value in entities of some
variable

(Another valid way of viewing some statistical tests is to say that they are
techniques for detecting differences between subpopulations, or differences
between groups, or significant differences between sample means [e.g., Abelson
1995, p. 27; Jones and Tukey 2000]. Thus in the

The Human Genome Project has recently identified or "mapped" almost all the relevant portions of the sequence of three billion "base pairs" in the human genome. This sequence is generally viewed as the genetic blueprint for the human species.

From an easily-adopted high-level point of view, the Human Genome Project
does

Some of the types of entities relevant to the Human Genome Project are (in rough decreasing order of conceptual or physical containment) species, humans, chromosomes, DNA, genomes (i.e., complete DNA sequences), genes, proteins, and base pairs. Base pairs come in four different types (labeled A, T, C, and G), and the linear sequence of these four types in the human genome is believed to encode (in one important sense) all the information needed to produce a human being.

A reasonable way to view the human genome is to view the sequence of base
pairs in the genome as a

A next step after determining the sequence of base pairs is to identify more genes, which are linear subsequences of base pairs in the human genome that are associated with inherited traits. Other steps are to identify the proteins that are activated by the genes and to identify drugs or other approaches that can retard or increase the production of these proteins. Control of production of these proteins will help doctors to achieve a main goal of this research, which is to control (i.e., improve) the values of variables that reflect measures of human health.

As noted, it is reasonable to view the Human Genome Project as studying a

Appendix I.2 discusses some other empirical research methods that do not
study relationships between variables, but instead study relationships

Section 5.4 lists twenty-one statistical methods and then makes two claims about these methods. This appendix provides support for the claims and discusses some related issues.
One claim in Section 5.4 is that the list of twenty-one methods contains almost all of the currently popular statistical methods. This is supported by the fact that if one surveys empirical research projects that use statistical methods, one finds that a large majority of these research projects use as their main statistical method(s) one or more of the methods in the list.
I call the statistical methods in the list in Section 5.4 "response-variable" methods. My criterion for calling a method a response-variable method is

A statistical method is a

All the methods in the list can be easily viewed as satisfying this criterion. (Section 7.11 and Appendix H.2 discuss the study of univariate distributions
-- i.e., the group of response-variable methods in which a researcher uses a
single response variable and

The criterion above states that response-variable methods focus on a
In the following discussion of multivariate methods I exclude the important
and frequently used method of "repeated measurements" or
"repeated measures". Although research projects that use repeated
measurements may be reasonably

Multivariate methods (excluding repeated measurements) are used only rarely
in real empirical research. However, if these methods

I further discuss research projects with multiple response variables in a Usenet post (2002).

In considering the twenty-one response-variable methods the question arises
whether we should speak of relationships "between" the variables or
relationships "among" the variables. I recommend using the
preposition "between" when referring to relationships studied by the
response-variable methods. This is because in any individual use of one of
these methods only a single response variable (which may on rare occasion be a
vector) is under study, and the relationship under study is

I call the statistical methods that do not focus on a single response variable "no-response-variable" methods. Decision theory (discussed in Section 7.4) is a no-response-variable method because it focuses on making decisions rather than focusing on relationships between variables.

Cluster analysis, factor analysis, principal components analysis (including
dual scaling and correspondence analysis), and a few other less frequently used
methods are also no-response-variable methods because they do not focus (either
explicitly or implicitly) on a specific single response variable. These methods
still study variables (properties of entities) and in a loose sense they also
study relationships

I estimate that the no-response-variable methods are used, in total, in fewer than three percent of reported empirical research projects that use statistical methods. Thus although the no-response-variable methods are important in a few research projects, I believe they are not important topics for discussion in an introductory statistics course.

On the other hand, the correlation coefficient and the study of contingency
tables are (implicitly)

In exploratory data analysis one is "looking around" in data, often without a particular response variable in mind. Thus the question arises whether we should view exploratory data analysis as a response-variable method or as a no-response-variable method. One answer is that if an exploratory data analysis is to be put to any practical use, a response variable and zero or more predictor variables will usually be (implicitly or explicitly) determined. Thus when exploratory data analysis is put to a practical use, it can usually be viewed as a response-variable method.
Section 5.4 lists the following four groups of techniques that statistical methods can perform:
- techniques for *detecting* relationships between variables
- techniques for *illustrating* relationships between variables
- techniques for *predicting* and *controlling* the values of variables on the basis of relationships between variables, and
- miscellaneous techniques for the study of variables and relationships between or among variables.
Section 5.4 then lists twenty-one response-variable methods and claims that
the

I cannot directly prove my claim that the twenty-one methods do no more than
what is described by the four groups of techniques. I cannot prove the claim
because it is a statement that something that is logically possible (i.e.,
another important statistical technique) does not exist. In general, such
statements cannot be empirically or theoretically supported. However, if my
claim is incorrect, one can easily
The four groups of techniques listed in the preceding subsection are
Since the four groups of techniques encompass both the response-variable and no-response-variable statistical methods, they encompass a large proportion of the field of statistics.
Nowadays a researcher wishing to use statistical methods to study relationships between variables must be skilled in the use of those methods (even though statistical software can do most of the arithmetic). And when unskilled researchers try to use the methods they often, through misunderstanding, make serious blunders. To prevent these blunders, and to help improve the quality of empirical research, it seems likely that statisticians will develop expert software that will interactively guide an unskilled researcher through the steps of properly designing an empirical research project, performing it, and interpreting the results. I call this software "research guidance software".

To help researchers design a research project, research guidance software will help them to select the response and predictor variables and help them to determine the important details of the design. Some important goals are to design research projects
- that are unequivocal
- that have minimum cost
- whose statistical tests have maximum power (Appendix B in the paper for students [1997a])
- whose subsequent predictions or control will have maximum accuracy (Appendix E in the paper for students [1997a]).
To help researchers perform a research project, the software will be capable of controlling any instruments used in the research project and capable of obtaining values of variables directly from the instruments.

To help researchers interpret the results of a research project, the software will automatically guide them through the phases of checking data for anomalies, detecting relationships between variables, displaying relationships, and prediction or control on the basis of relationships.

The output of a research guidance software system will likely appear in a web browser, since web browsing software has advanced tools for efficiently displaying and linking many forms of information. The output will include custom text and graphics that reflect specific details of the research project under study. (The text will be generated from pre-written templates.)

To bring the user interface up to the visual resolution of a printed textbook, the software and video system will be capable of simultaneously displaying the equivalent of at least two full pages of highly legible text and graphics from a standard textbook. (Computer monitors are beginning to approach this capability.) In addition, the system will be capable of displaying standard television-quality video segments in which a narrator discusses and illustrates a concept.

(As a prolific writer of notes in the margins of the books I buy and the computer output I generate, I recommend that the software provide a convenient way for users to "write notes in the margin" of its on-screen output [including output of the help system]. I also recommend that the software be able to automatically preserve these notes and details of their linkages when it is upgraded.)

In the past, most researchers and teaching facilities lacked access to computer hardware that could run the type of system described in the preceding paragraphs. Thus the market for research guidance software was too small to justify the development cost of a comprehensive system. However, nowadays the necessary computer hardware is within most researchers' and many teaching facilities' budgets. Furthermore, it is now easier to develop a research guidance software system because such a system can use an existing statistical software product as its data analysis engine. (At least one leading statistical software vendor has developed an "output delivery system" that can provide all output from its statistical procedures in formats that are easily used as input by other programs [SAS Institute 2000].) This enables research guidance software developers to concentrate on other important tasks, such as creating the research-project-design and high-level-analysis modules, and writing the many necessary text templates for the system.

The default path through any output from a research guidance software system will be on the highest conceptual level, which will focus the user on the important points. However, customized full information about the underlying details (including tutorials in the associated statistical concepts) will be only a keystroke or two away.

Research guidance software will enable researchers to get much closer to their data because in interpreting the results of a research project the software will automatically examine many different graphical views of the data and will present the most interesting views (as defined by research in human perception) to the researcher. (Most researchers would be incapable of generating and examining all the necessary views manually.)

Research guidance software will be designed by teams of statisticians, programmers, and editors. (In the past, the statisticians and editors would have developed statistics textbooks.) The quality of the writing, graphics, and layout will be equal to that of a superior textbook.
For a research guidance software system to be complete, its developers must codify a complete general research design and data analysis strategy, a difficult but not impossible task. (The codification will involve extensive consultation with experts in research design and data analysis and extensive testing with inexperienced users.) The approach must be general enough and well-enough presented to enable a well-motivated neophyte to properly design, perform, and analyze efficient simple empirical research projects.

Rudimentary research guidance software systems have begun to appear. Silvers, Herrmann, Godfrey, Roberts, and Cerys (1994) discuss one such system and give references to important earlier systems. Gale and Pregibon (1984) were the pioneers.

The inevitability of useful research guidance software arises from the fact that today's best research project designers and data analysts work by simply proceeding through a complicated (often subconscious, sometimes vague) decision tree. Thus three steps are necessary:
- Decision trees must be elicited from master designers and data analysts.
- Decision trees must have any vagueness removed (or perhaps parameterized under user control, with consensually chosen default values for the parameters).
- Software must be developed that will (in the manner of a wise and friendly consultant, and using different levels of detail depending on the user's knowledge) guide a researcher through the decision tree.
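
The three steps above can be pictured with a small sketch. The Python fragment below is illustrative only: the questions, the threshold `max_levels_as_groups` with its default of 5, and the recommended methods are hypothetical choices made for this example (assuming a single continuous response variable and at most one predictor variable). It is not a decision tree elicited from expert designers or analysts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    """End of a path through the tree: the advice given to the researcher."""
    advice: str


@dataclass
class Node:
    """One question in the tree, plus a rule that picks the branch to follow."""
    question: str
    decide: Callable[[dict, dict], str]  # (facts, params) -> branch label
    branches: Dict[str, Union["Node", Leaf]]


# Step 2: a vague judgment ("few enough distinct values to treat as groups")
# is replaced by a parameter under user control. The default is hypothetical.
DEFAULT_PARAMS = {"max_levels_as_groups": 5}


def groups_or_continuous(facts: dict, params: dict) -> str:
    if not facts["predictor_is_numeric"]:
        return "groups"
    few = facts["predictor_distinct_values"] <= params["max_levels_as_groups"]
    return "groups" if few else "continuous"


# Step 1: a (toy) tree of the kind that would be elicited from an expert.
TREE = Node(
    "Is there a predictor variable?",
    lambda facts, params: "yes" if facts["has_predictor"] else "no",
    {
        "no": Leaf("study the univariate distribution of the response variable"),
        "yes": Node(
            "Should the predictor be treated as groups or as continuous?",
            groups_or_continuous,
            {
                "continuous": Leaf("simple linear regression"),
                "groups": Node(
                    "How many groups are there?",
                    lambda facts, params: (
                        "two" if facts["predictor_distinct_values"] == 2 else "more"
                    ),
                    {
                        "two": Leaf("independent-samples t-test"),
                        "more": Leaf("one-way analysis of variance"),
                    },
                ),
            },
        ),
    },
)


def guide(facts: dict, params: Union[dict, None] = None):
    """Step 3: walk the tree, recording each question and answer for the user."""
    merged = {**DEFAULT_PARAMS, **(params or {})}
    node, trail = TREE, []
    while isinstance(node, Node):
        answer = node.decide(facts, merged)
        trail.append((node.question, answer))
        node = node.branches[answer]
    return node.advice, trail


if __name__ == "__main__":
    facts = {
        "has_predictor": True,
        "predictor_is_numeric": True,
        "predictor_distinct_values": 4,
    }
    advice, trail = guide(facts)
    for question, answer in trail:
        print(f"{question} -> {answer}")
    print("Suggested method:", advice)
```

Keeping the recorded trail of questions and answers separate from the final advice is one way to let the software show more or less detail depending on the user's knowledge.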
Once these steps are completed, the ability of research guidance software to help researchers perform research projects will approach (and likely someday surpass) that of an expert.

Research guidance software amounts to copying both statistical thought and research thought from experts' minds and from textbooks into the computer. Once (properly) captured in the computer, the thought can be automatically customized by the computer for the situation at hand, and the researcher or student can actively interact with the thought. Carefully designed customized interaction guarantees better understanding.

The teaching component will be an important part of any research guidance software system. In the better systems the teaching component will be easy to understand, comprehensive, and compliant with norms of statistical practice.

Meeker presents a similar vision for statistical technology in an article by Moore, Cobb, Garfield, and Meeker (1995, sec. 2.2).

Abelson, R. P. (1995),

Albert, J. (1993), "Teaching Bayesian Statistics Using Sampling Methods and Minitab,"

Albert, J. (1996),

Albert, J. (1997), "Teaching Bayes' Rule: A Data-Oriented Approach" (with discussion),

American Statistical Association (2000), "Curriculum guidelines for undergraduate programs in statistical science." Available at http://www.amstat.org/education/Curriculum_Guidelines.html

Anderson, J. E. and Sungur, E. A. (1999), "Community Service Statistics Projects,"

Andrews, D. F., and Herzberg, A. M. (1985),

Antelman, G. (1997),

Ballman, K. (2000), "Real Data in Classroom Examples," in

Barnett, V. (1982),

Berry, D. A. (1996),

Berry, D. A. (1997), "Teaching Elementary Bayesian Statistics with Real Applications in Science" (with discussion),

Berry, D. A., and Lindgren, B. W. (1996),

BIPM (Bureau International des Poids et Mesures, International Bureau of Weights and Measures) (2001), "Welcome,"

Bisgaard, S. (1991), "Teaching Statistics to Engineers,"

Boomsma, A., and Molenaar, I. W. (1991), "Resampling with More Care" (with discussion),

Blackwell, D. (1969),

Box, G. E. P. (1995), "Scientific Statistics - The Way Ahead" (abstract), in

Box, G. E. P. and Draper, N. R. (1987),

Bordley, R. F. (2001), "Teaching Decision Theory in Applied Statistics Courses,"

Bradstreet, T. E. (1996), "Teaching Introductory Statistics Courses So That Nonstatisticians Experience Statistical Reasoning,"

Breland, H. M. and Jenkins, L. M. (1997),

Britz, G., Emerling, D., Hare, L., Hoerl, R., and Shade, J. (1997), "How to Teach Others to Apply Statistical Thinking,"

Bryce, G. R. (1992), "Data Driven Experiences in an Introductory Statistics Course for Engineers Using Student Collected Data," in
Carlson, R. R. (1989), "A Paper Clip Experiment," in

Carroll, J. B., Davies, P., and Richman, B. (1971),

Chance, B. (1997), "Experiences with Authentic Assessment Techniques in an Introductory Statistics Course,"

Chatterjee, S., Handcock, M. S., and Simonoff, J. S. (1995),

Chromiak, W., Hoefler, J., Rossman, A., and Tesman, B. (1992), "A Multidisciplinary Conversation on the First Course in Statistics," in

Cobb, G. W. (1987), "Introductory Textbooks: A Framework for Evaluation,"

Cobb, G. W. (1992), "Teaching Statistics," in

Cobb, G. W. (1993), "Reconsidering Statistics Education:

Cobb, G. W. (1997),

Cobb, G. W. (2000), "Teaching Statistics: More Data, Less Lecturing," in

Cochran, W. G. (1968), "Errors of Measurement in Statistics,"

Cox, D. R. (1998), "Statistics for the Millenium: Some Remarks on Statistical Education,"

Cox, D. R. (1999), "Variable, types of," in

Cryer, J. D., and Cobb, G. W. (1997),

Czitrom, V., and Spagon, P. D. (eds.) (1997),

DeGroot, M. H. (1986),

de Lorgeril, M. and Salen, P. (2000), "Diet as Preventive Medicine in Cardiology,"

Dietz, E. J. (1993), "A Cooperative Learning Activity on Methods of Selecting a Sample,"

Dingle, H. (1952),

Dixon, W. J. ed. (1964),

Dixon, W. J. and Massey, F. J., Jr. (1983),

Doane, D. P., Mathieson, K. D., and Tracy, R. L. (1997),

Drake, S. (1970),

Dunbar, R. (1995),

Edwards, W. and Fasolo, B. (2001), "Decision Technology,"

Eltinge, E. M. (1992), "Diagnostic Testing for Introductory Statistics Courses," in

Everitt, B. S. (1998),

Falk, R., and Konold, C. (1992), "The Psychology of Learning Probability," in

Fillebrown, S. (1994), "Using Projects in an Elementary Statistics Course for Non-Science Majors,"

Fowler, W. S. (1962),

Freund, J. E. and Walpole, R. E. (1987),

Gal, I., and Garfield, J. B. (1997a), "Curricular Goals and Assessment Challenges in Statistics Education," in

Gal, I., and Garfield, J. B. (eds.) (1997b),

Gale, W. A., and Pregibon, D. (1984), "REX:

Garfield, J. (1993), "Teaching Statistics Using Small-Group Cooperative Learning,"

Garfield, J. (1994), "Beyond Testing and Grading:

Garfield, J. (1995), "How Students Learn Statistics,"
Garfield, J. (2000), "Beyond Testing and Grading: New Ways to Use Assessment to Improve Student Learning," in

Garfield, J. B., and Gal, I. (1999), "Assessment and Statistics Education: Current Challenges and Directions,"

Goldman, R. N., McKenzie, J. D., Jr., and Sevin, A. D. (1997), "The BCASA Conference on Assessment in Statistics Courses," in

Gordon, F., and Gordon, S. (eds.) (1992),

Gunter, B. (1993), "Through a Funnel Slowly with Ball Bearing and Insight to Teach Experimental Design,"

Gunter, B. (1996), "The MISD/MMSTC Statistical DOE Project." Available at

Halvorsen, K. T., and Moore, T. L. (1991), "Motivating, Monitoring, and Evaluating Student Projects," in

Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J., and Ostrowski, E. (eds.) (1994),

Hansen, J. L. (1980), "Using Physical Demonstrations When Teaching Data Analysis," in

Hawkins, A., Jolliffe, F., and Glickman, L. (1992),

Hayden, R. W. (2000), "Advice to Mathematics Teachers on Evaluating Introductory Statistics Textbooks," in

Hesterberg, T. C. (1998), "Simulation and Bootstrapping for Teaching Statistics,"

Higgins, J. J. (1999), "Nonmathematical Statistics: A New Direction for the Undergraduate Discipline (with discussion),"

Hoaglin, D. C., and Moore, D. S. (eds.) (1992),

Hoerl, R., and Snee, R. (1995), "Redesigning the Introductory Statistics Course (Report No. 130)," Madison, WI: University of Wisconsin Center for Quality and Productivity Improvement.

Hoerl, R., and Snee, R. (2002),

Hogg, R. V. (1990), "Statisticians Gather to Discuss Statistical Education,"

Hogg, R. V. (1991), "Statistical Education:

Hogg, R. V. (1992), "Towards Lean and Lively Courses in Statistics," in

Hogg, R. V. (1999), "Let's Use CQI in Our Statistics Programs (with discussion),"

Hogg, R. V., and Hogg, M. C. (1995), "Continuous Quality Improvement in Higher Education,"

Holcomb, J. P., Jr. and Ruffer, R. L. (2000), "Using a Term-Long Project Sequence in Introductory Statistics,"

Hume, D. (1748), Sceptical Doubts Concerning the Human Understanding. In
Hunter, W. G. (1977), "Some Ideas About Teaching Design of Experiments, with 2

Iman, R. L. (1994), "The Importance of Undergraduate Statistics,"

Iman, R. L., and Conover, W. J. (1983),

Iversen, G. R. (1992), "Mathematics and Statistics:

Jones, L. V. and Tukey, J. W. (2000), "A sensible formulation of the significance test,"

Jowett, G. H., and Davies, H. M. (1960), "Practical Experimentation as a Teaching Method in Statistics" (with discussion),

Kotz, S. and N. L. Johnson, eds. (1982-1988),

Kromhout, D. (1999), "Fatty Acids, Antioxidants, and Coronary Heart Disease from an Epidemiological Perspective,"

Kruskal, W. H. and J. M. Tanur, eds. (1978),

Lehmann, E. L. (1993), "The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?"

Liebson, P. R. and Amsterdam, E. A. (1999), "Prevention of Coronary Heart Disease. Part I. Primary Prevention,"

Lipsey, M. W. (1990), "Theory as Method: Small Theories of Treatments," in

Mackisack, M. (1994), "What Is the Use of Experiments Conducted By Statistics Students?"

Macnaughton, D. B. (1986),

Macnaughton, D. B. (1996a), "Re: EPR Approach to Intro Stat: Relationships Between Variables (response to comments by George Zeliger)." Published in EdStat and sci.stat.edu on July 30, 1996. Available at

Macnaughton, D. B. (1996b), "Re: EPR Approach to Intro Stat:

Macnaughton, D. B. (1996-2001), "[responses to comments about the EPR approach]" published on various dates in EdStat and sci.stat.edu. Available at

Macnaughton, D. B. (1997a), "The Entity-Property-Relationship Approach to Statistics:

Macnaughton, D. B. (1997b), "EPR Approach and Scientific 'Explanation' (response to comments by Robert Frick)." Published in EdStat and sci.stat.edu on July 23, 1997.

Macnaughton, D. B. (1997c), "EPR Approach to Intro Stat: Entities, Properties, and Variables." Published in EdStat and sci.stat.edu on February 26, 1997.

Macnaughton, D. B. (1997d), "Re: How Should We Motivate Students in Intro Stat? (response to comments by John R. Vokey)." Published in EdStat and sci.stat.edu on April 6, 1997. Available at

Macnaughton, D. B. (1998a), "Eight Features of an Ideal Introductory Statistics Course." Available at

Macnaughton, D. B. (1998b), "Review of ActivStats 2.0,"

Macnaughton, D. B. (1998c), [responses to comments about the paper "Eight Features of an Ideal Introductory Statistics Course"].

Macnaughton, D. B. (1998d), "Which Sums of Squares Are Best in Unbalanced Analysis of Variance?" Available at

Macnaughton, D. B. (1998e), "Re: Eight Features of an Ideal Intro Stat Course (response to comments by Gary Smith)," Published in EdStat and sci.stat.edu on November 23, 1998. Available at

Macnaughton, D. B. (1999), "Re: Eight Features of an Ideal Intro Stat Course (response to comments by Herman Rubin)," Published in EdStat and sci.stat.edu on May 16, 1999. Available at

Macnaughton, D. B. (2000), "Re: Eight Features of an Ideal Intro Stat Course (response to comments by Bob Hayden)," Published in EdStat and sci.stat.edu on July 23, 2000. Available at

Macnaughton, D. B. (2001), "Re: Eight Features of an Ideal Intro Stat Course (response to comments by Ronan M. Conroy)," Published in EdStat and sci.stat.edu on February 6, 2001. Available at

Macnaughton, D. B. (2002), "Definition of 'Relationship between variables,'" Published in sci.stat.* and EdStat on January 28, 2002. Available at

Magel, R. C. (1996), "Increasing Student Participation in Large Introductory Statistics Classes,"

Marriott, F. H. C. (1990),

McKenzie, J. D., Jr. (1992), "The Use of Projects in Applied Statistics Courses," in

Michaelsen, L. K. (1999), "Myths and Methods in Successful Small Group Work,"

Moore, D. S. (1992a), "Introduction:

Moore, D. S. (1992b), "Teaching Statistics as a Respectable Subject," in

Moore, D. S. (1993), "The Place of Video in New Styles of Teaching and Learning Statistics,"

Moore, D. S. (1997a), "New Pedagogy and New Content:

Moore, D. S. (1997b),

Moore, D. S. (1997c), "Bayes for Beginners? Some Pedagogical Questions," in

Moore, D. S. (1997d), "Bayes for Beginners? Some Reasons to Hesitate" (with discussion),

Moore, D. S. (2000),

Moore, D. S. (2001), "Undergraduate Programs and the Future of Academic Statistics,"

Moore, D. S., Cobb, G. W., Garfield, J., and Meeker, W. Q. (1995), "Statistics Education Fin de Siècle,"

Moore, T. L. (ed.) (2000),

Moore, T. L., and Roberts, R. A. (1989), "Statistics at Liberal Arts Colleges,"

Mosteller, F. (1988), "Broadening the Scope of Statistics and Statistical Education,"

Mosteller, F. (1990), "Improving Research Methodology: An Overview," in

Newton, H. J., and Harvill, J. L. (1997), "StatConcepts: A Visual Tour of Statistical Ideas," in

Nie, N., D. H. Bent, and C. H. Hull (1970),

Noether, G. E. (1992), "An Introductory Statistics Course: The Nonparametric Way," in

Nolan, D. and Speed, T. P. (1999), "Teaching Statistics Theory Through Applications,"

Ottaviani, M. G. (ed.) (1996),

OzData (1999), "OzData: Australasian Data and Story Library."

Parr, W. C., and Smith, M. A. (1998), "Developing Case-Based Business Statistics Courses,"

Pearl, D. K., Notz, W. I., and Stasny, E. A. (1996), "Finding Examples - The EESEE Way Out," in

Pearson, K. (1902), "On the mathematical theory of errors of judgment, with special reference to the personal equation,"

Peck, R., Haugh, L. D., and Goodman, A. (1998),

Pollack, S., Fireworker, R., and Borenstein, M. (1995), "Some Resampling Algorithms for the Testing of Hypotheses," in

Radke-Sharpe, N. (1991), "Writing As a Component of Statistics Education,"

Roberts, H. V. (1992), "Student-Conducted Projects in Introductory Statistics Courses," in

Rossman, A. J. (1996),

Ruberg, S. J. (1990), "The Statistical Method:

Rubin, H. (1996), "Re: Defining 'Variable'". Published on June 30, 1996 in Usenet newsgroup sci.stat.edu.

Rumsey, D. J. (ed.), (2001), "STAR Library". This archive of resources for introductory statistics teachers is available at

Samsa, G. and Oddone, E. Z. (1994), "Integrating Scientific Writing Into a Statistics Curriculum: A Course in Statistically Based Scientific Writing,"

SAS Institute Inc. (2000), [SAS Output Delivery System]. For details search the site at

Schau, C., and Mattern, N. (1997a), "Use of Map Techniques in Teaching Applied Statistics Courses,"

Schau, C., and Mattern, N. (1997b), "Assessing Students' Connected Understanding of Statistical Relationships," in

Scheaffer, R. L. (1992), "Data, Discernment and Decisions: An Empirical Approach to Introductory Statistics," in

Scheaffer, R. L. (2001), "Statistics Education: Perusing the Past, Embracing the Present, and Charting the Future,"

Scheaffer, R. L., Gnanadesikan, M., Watkins, A., and Witmer, J. A. (1996),
Scott, J. F. (1976), "Practical Projects in the Teaching of Statistics
at Universities," Sevin, A. D. (1995), "Some Tips for Helping Students in Introductory
Statistics Classes Carry Out Successful Data Analysis Projects," in
Silvers, A., Herrmann, N., Godfrey, K., Roberts, B., and Cerys, D. (1994),
"A Prototype Statistical Advisory System for Biomedical Researchers,"
Simon, J. L. (1993), Simon, J. L. (1994), "What Some Puzzling Problems Teach About the
Theory of Simulation and the Use of Resampling," Simon, J. L., and Bruce, P. (1991), Simon, L., Harkness, W., Buchanan, P., Chow, M., Heckard, R., Lane, J., and Zimmaro, D. (2000), "Restructuring the Elementary Statistics Course: The Penn State Model," xx session presented at the Joint Statistical Meetings in Indianapolis on August 14, 2000. Skyrms, B. (1986), Smith, A. F. M., and Gelfand, A. E. (1992), "Bayesian Statistics
Without Tears: A Sampling-Resampling Perspective," Snee, R. D. (1993), "What's Missing in Statistical Education?"
Snell, J. L. (1999), "Chance Database." Available at Snell, J. L., and Finn, J. (1992), 'A Course Called "Chance",'
StatLib (1999). This archive of datasets and other statistical information
is available at Stromberg, A. J. and Ramanathan, S. (1996), "Easy Implementation of
Writing in Introductory Statistics Courses," Swets, J. A., Dawes, R. M., and Monahan, J. (2000), "Better Decisions
through Science," Sylwester, D. L., and Mee, R. W. (1992), "Student Projects: Teaching Effectiveness Program, University of Oregon (2000), "Effective
Assessment." Available at Tukey, J. W. (1977), Tukey, J. W. 1989. "SPES in the Years Ahead," in Velleman, P. F. (1998), Velleman, P. F. and J. M. Lefkowitz. 1985. Velleman, P. F., and Hoaglin, D. C. (1992), "Data Analysis," in
Velleman, P. F., Hutcheson, M. C., Meyer, M. M., and Walker, J. H. (1996), "DASL, the
Velleman, P. F., and Moore, D. S. (1996), "Multimedia for Teaching Statistics: Promises and Pitfalls,"
Velleman, P. F., and Wilkinson, L. (1993), "Nominal, Ordinal, Interval, and Ratio Typologies are Misleading,"
Vogt, W. P. (1993),
Wallis, W. A. (1980), "The Statistical Research Group, 1942-1945 (with discussion),"
Wardrop, R. L. (2000), "Small Student Projects in an Introductory Statistics Course," in
Watkins, A., Burrill, G., Landwehr, J. M., and Scheaffer, R. L. (1992), "Remedial Statistics?: The Implications for Colleges of the Changing Secondary School Curriculum," in
Wild, C. J. (1995), "Continuous Improvement of Teaching:
Wilkinson, L., and Task Force on Statistical Inference (1999), "Statistical Methods in Psychology Journals: Guidelines and Explanations,"
Willemain, T. R. (1994), "Bootstrap on a Shoestring: Resampling using Spreadsheets,"
Willett, J. B., and Singer, J. D. (1992), "Providing a Statistical 'Model': Teaching Applied Statistics using Real-World Data," in
Williams, L. P. (1989), "André-Marie Ampère,"
Wonnacott, T. H. (1992), "More Foolproof Teaching Using Resampling," in
Zahn, D. A. (1992), "Student Projects in a Large Lecture Introductory Business Statistics Course," in
Zahn, D. A. (1994), "A Brain-Friendly First Day of Class," in
CONTENTS

1. INTRODUCTION
2. A DEFINITION OF "EMPIRICAL RESEARCH"
3. COURSE GOALS
3.1 The Value of Emphasizing Goals
3.2 Topic-Based Goals Have a Significant Drawback
3.3 Recommended Goals (A Lasting Appreciation of the Role of Statistics)
4. SIX CONCEPTS
4.1 Entities
4.2 Properties of Entities
4.3 Variables
4.4 A Goal of Empirical Research: To Predict and Control the Values of Variables
4.5 Relationships Between Variables as a Key to Prediction and Control
4.6 Statistical Techniques for Studying Relationships Between Variables as a Means to Accurate Prediction and Control
4.7 General Comments
5. EVALUATING THE EPR APPROACH
5.1 Main Differences Between the EPR Approach and Other Approaches
5.2 The Concepts of the Approach Are Easy to Understand
5.3 The Approach Provides a Deep and Broad Foundation for Statistical Concepts
5.4 The Approach Unifies Statistical Methods
5.5 The Approach Links Well with General Concepts of Science
5.6 The Approach Unifies Empirical Research
5.7 The Approach Links Well with General Concepts of Commerce
5.8 The Approach Links Well with Language
5.9 The Concepts of the Approach Are Fundamental
5.10 Easy-to-Understand Fundamental Concepts Should Be Taught First
5.11 The Approach Gives Students a Lasting Appreciation of Statistics
5.12 The Approach Links Well with the Concept of 'Data Analysis'
5.13 The Concepts Are Old But the Approach Is New
5.14 The Approach Links Well with Other Approaches to the Introductory Course
5.15 Responses to Criticisms of the EPR Approach
6. TESTING THE EPR APPROACH
6.1 Methods of Testing
6.2 Testing of the EPR Approach
7. IMPLEMENTING THE EPR APPROACH
7.1 Motivating Students on the First Day of Class
7.2 What Topics Should Follow the Six Concepts?
7.3 A Syllabus
7.4 "Basis for Action" Versus "Decision Procedure"
7.5 Practical Examples
7.6 Generalization and Instantiation
7.7 Feedback Systems
7.8 Realistic Data Versus Real Data
7.9 The Discussion of Mathematics in the Introductory Statistics Course
7.10 Hypothesis Testing
7.11 Univariate Distributions
7.12 Implementation With Software Support
8. SUMMARY
APPENDIX A: THE PRIORITY OF THE CONCEPT OF 'ENTITY'
APPENDIX B: THE TERM "PROPERTY"
APPENDIX C: THE EVOLUTION OF ENTITIES AND PROPERTIES IN HUMAN THOUGHT
APPENDIX D: DEFINING THE TERM "VARIABLE"
APPENDIX E: THE DISTINCTION BETWEEN PROPERTIES AND VARIABLES
APPENDIX F: THE TERM "RELATIONSHIP"
APPENDIX G: THE TERMS "RESPONSE VARIABLE" AND "PREDICTOR VARIABLE"
APPENDIX H: DO RESEARCH PROJECTS STUDY RELATIONSHIPS BETWEEN VARIABLES?
APPENDIX I: DOES THE EPR APPROACH UNIFY STATISTICAL METHODS?
APPENDIX J: FUTURE SYSTEMS FOR STUDYING RELATIONSHIPS BETWEEN VARIABLES
REFERENCES