Response to Comments by Mike Palij



Subject: Re: Experimental Research In Education: The Most Exciting 
         Talk at the 2005 Joint Statistical Meetings 

     To: EdStat E-Mail List
         ApStat E-Mail List
         Teaching Statistics E-Mail List
         sci.stat.edu Usenet Newsgroup

   From: Donald B. Macnaughton < donmac@matstat.com >

   Date: Sunday February 11, 2007
	
-----------------------------------------------------------------

                   Some Basic Ideas of Science 
 
This post replies to comments by Mike Palij.  But first, to sup-
port the discussion, it is helpful to propose definitions of some 
basic ideas of science, as follows: 
 
    EMPIRICAL RESEARCH is any activity in which measurements 
    (observations) are gathered from some area of experience 
    and then reasonable conclusions are drawn from the meas-
    urements.   
     
    Measurements in empirical research are usefully viewed as 
    reflecting the values of VARIABLES, which reflect the 
    measured values of properties of entities.  (Entities may 
    be people, other living things, physical objects, or any 
    other type of object or thing.) 
     
    An EMPIRICAL RESEARCH PROJECT (or logical sub-unit of an 
    empirical research project) is usually usefully viewed as 
    studying the relationship between a single response vari-
    able and one or more predictor variables in the entities 
    in the population of interest.  This study is performed 
    by analyzing the measured values of the variables ob-
    tained from a sample of entities from the population. 
     
    An EXPERIMENT is an empirical research project in which 
    at least one of the predictor variables is “manipulated” 
    (i.e., caused to take certain values in the entities in 
    the sample) by the researcher. 
     
    An OBSERVATIONAL RESEARCH PROJECT is an empirical re-
    search project in which none of the predictor variables 
    is manipulated by the researcher, and the values of the 
    predictor variables are simply observed. 
     
I discuss the ideas in more detail in a paper (2002). 
 
The description of an empirical research project is qualified 
with the word “usually”.  This implies that some empirical re-
search projects (or sub-units) don’t satisfy the description -- 
that is, they can’t be reasonably viewed as studying the rela-
tionship between a single response variable and one or more pre-
dictor variables.  My experience suggests that research projects 
that can’t be reasonably viewed as satisfying the description ap-
pear in less than three percent of published empirical research 
reports (in science, technology, and business).  I discuss some 
of these research projects in the paper (2002, Appendix I.2).  
For economy of words, this small group of research projects is 
mostly ignored in the discussion below.   
  
 
               Novelty of Experiments in Education 
 
Quoting my August 25, 2005 post, Mike Palij wrote (on August 26 
in the EdStat e-mail list) 
 
   < snip > 
> Experimental studies in education are not a new idea.  
 
I agree.  I suspect that the idea of performing proper experi-
ments in education can be traced back to the 1920s or 1930s 
when Fisher, Neyman, Pearson, and other statisticians first ex-
plained and debated the idea of a proper experiment.  However, 
proper experiments in education haven’t often been performed. 
 
What is a “proper” experiment?  Here is a reasonable definition: 
 
    An experiment (or randomized trial) is a PROPER EXPERI-
    MENT if it has been performed according to widely ac-
    cepted principles of scientific practice, experimental 
    design, and data analysis, as described by Bailar and 
    Mosteller (1992), Box, Hunter, and Hunter (2005), Fleiss 
    (1986), Kirk (1995), Winer, Brown, and Michels (1991), 
    and many others. 
 
I suspect that proper experiments haven’t often been performed in 
education for two reasons: (a) proper experiments in education 
are difficult, and (b) perhaps simply due to tradition, research-
ers in education have lacked a strong connection to the ideas of 
experimental research.   
  
 
         Problematic Nature of Experiments in Education 
 
   < snip > 
> There are a number of reasons why experiments are problematic 
> in educational settings  
 
I fully agree.  Some key problems in performing a good education 
experiment to compare teaching approaches are: 
 
1. What is the RESPONSE VARIABLE (or variables) that we will use 
   to compare the teaching approaches?  Will it be marks, or at-
   titudes, or aptitudes, or some other measure of the students 
   (or the classes of students) under study? 
 
2. What are the PREDICTOR VARIABLES that we will measure in the 
   research?  One predictor variable will reflect the different 
   teaching approaches under study.  What other predictor vari-
   ables (e.g., students’ age or gender) should we measure to as-
   sist our understanding?  Is it reasonable to use students’ 
   race as a predictor variable? 
 
3. How can we design the experiment in a way that will ELIMINATE 
   teacher effects and other REASONABLE ALTERNATIVE EXPLANATIONS 
   of any significant differences we find in the values of the 
   response variable between the teaching approaches?   

4. How can we design the experiment so that it is as LIKELY as 
   possible that we will FIND STRONG EVIDENCE of the relationship 
   we are looking for between the response variable and the pre-
   dictor variable(s) (assuming that the relationship actually 
   exists)?   
 
5. How can we RECRUIT TEACHERS to participate in the experiment? 
 
6. How can we OBTAIN FUNDING to pay for the experiment? 
 
7. After we have performed the experiment, how can we ANALYZE the 
   RESULTS and DRAW SCIENTIFICALLY VALID CONCLUSIONS? 
 
8. While properly addressing the preceding seven problems, how 
   can we MINIMIZE the COSTS of the experiment? 
 
Appendix A expands the eight problems and discusses some solu-
tions. 
 
 
              Necessity of Experiments in Education 
 
   < snip >  
> As for the necessity of experimental research to provide the 
> basis for either (a) science or (b) valid conclusions, the as- 
> tronomers say “Hi!  We’ve been engaging in science for hundreds 
> of years without having performed a single experiment!”. 
 
Mike suggests that if astronomers don’t perform experiments, then 
education researchers also don’t need to perform experiments.  I 
think that this is a thought-provoking point, but an invalid ar-
gument.  The argument is invalid because astronomers CAN’T per-
form experiments because they can’t manipulate distant astronomi-
cal events.  If they could manipulate these events (at a reason-
able cost), it seems doubtless that they would.  That is, they 
would perform proper experiments just like scientists in disci-
plines that regularly perform experiments, such as in most 
branches of physics, chemistry, engineering, medicine, biology, 
and psychology. 
 
The fact that astronomers can’t perform astronomical experiments 
makes astronomy an “observational” discipline.  Other observa-
tional disciplines that generally can’t perform experiments (due 
to the remoteness or untouchability of the phenomena they study) 
include anthropology, archaeology, economics, epidemiology, geol-
ogy, paleontology, and some areas of sociology.  Such observa-
tional disciplines base their inferences on careful observational 
empirical research, often studying relationships between vari-
ables.  (In historical disciplines, observational research is 
sometimes [due to the paucity of data] reduced to careful consid-
eration of physical or anecdotal information about entities, 
properties of entities, or variables, without focusing on the 
concept of ‘relationship between variables’.) 
 
Proper observational research often enables us to reliably 
PREDICT the values of the response variable (in new situations), 
which is an important benefit.  However, in education research we 
would generally like to learn how to reliably CONTROL the values 
of the response variable (in new situations).  That is, we would 
like to learn how to structure an education program so that it 
provides students with the BEST education.  Unfortunately, obser-
vational research is almost always equivocal about control -- 
subject to multiple competing reasonable explanations.   
 
For example, suppose we are presented with the results of an ob-
servational research project in education that suggests that a 
certain teaching approach A is better (in the sense of exhibiting 
significantly better average values in students of the chosen re-
sponse variable) than another teaching approach B.  In this case 
(due to the nature of observational research) it is almost always 
possible to find a reasonable alternative explanation of the re-
search finding, and this explanation implies that approach A may 
NOT be better than approach B.  But if we find such an explana-
tion, this implies that the research is equivocal.  This means 
that the research is of substantially less value because it can’t 
reliably help us to decide which teaching approach is better.   
 
(A reasonable alternative explanation in observational education 
research is often in terms of “confounding” of the teaching ap-
proaches under study in the research with some other aspect 
[i.e., variable] of the research situation.  Then it is generally 
possible that this other variable can fully account for the dif-
ference between the average values of the response variable under 
the different teaching approaches.  For example, an observational 
research project might confound two teaching approaches with two 
different schools -- school 1 uses teaching approach A, and 
school 2 uses teaching approach B.  In this case if we find a 
significant difference in the average values of the response 
variable between the two approaches, it is possible that the 
teaching approaches have no differential effect on the values of 
the response variable.  That is, [unless the School variable is 
appropriately (and expensively) taken account of in the design of 
the research] it is possible that a certain difference between 
the SCHOOLS caused the observed significant differences in the 
average values of the response variable [in students or classes] 
between the teaching approaches.) 
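
The following Python sketch (an illustration added here, with 
made-up numbers) simulates the two-school example.  By assumption 
the teaching approaches have no effect at all; school 1 students 
simply score 8 marks higher for some other reason.  The confounded 
observational design then shows a spurious difference of about 8 
marks between the approaches, whereas randomly assigning half of 
each school's students to each approach gives a difference close 
to zero.

```python
import random
import statistics


def make_students(n_per_school, school_effect, noise_sd, rng):
    """Marks depend only on the school; neither approach has any effect."""
    students = []
    for school in (1, 2):
        # Assumed: school 1 students score `school_effect` marks higher.
        base = 60.0 + (school_effect if school == 1 else 0.0)
        for _ in range(n_per_school):
            mark = base + rng.gauss(0.0, noise_sd)
            students.append({"school": school, "mark": mark})
    return students


def diff_a_minus_b(students):
    """Mean mark under approach A minus mean mark under approach B."""
    a = [s["mark"] for s in students if s["approach"] == "A"]
    b = [s["mark"] for s in students if s["approach"] == "B"]
    return statistics.mean(a) - statistics.mean(b)


def observational(students):
    """Confounded design: school 1 uses approach A, school 2 uses B."""
    for s in students:
        s["approach"] = "A" if s["school"] == 1 else "B"
    return diff_a_minus_b(students)


def randomized(students, rng):
    """Random assignment of half of each school's students to each approach."""
    for school in (1, 2):
        members = [s for s in students if s["school"] == school]
        rng.shuffle(members)
        half = len(members) // 2
        for i, s in enumerate(members):
            s["approach"] = "A" if i < half else "B"
    return diff_a_minus_b(students)


rng = random.Random(2005)
students = make_students(2000, school_effect=8.0, noise_sd=10.0, rng=rng)
obs_diff = observational(students)
rand_diff = randomized(students, rng)
print(f"observational A - B: {obs_diff:.2f}")  # near the school effect, 8
print(f"randomized    A - B: {rand_diff:.2f}")  # near 0
```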
 
In contrast, suppose that a proper EXPERIMENT provides good evi-
dence that teaching approach A is better than teaching approach 
B.  In this case the finding is unequivocal.  This is because 
proper experiments are explicitly designed to eliminate confound-
ing and other reasonable alternative explanations.  Thus in this 
case we can safely (tentatively) conclude that approach A will be 
better than approach B in new situations (if the relevant condi-
tions are sufficiently similar to those of the experiment).  Thus 
proper experiments are preferred to observational research pro-
jects in education research. 
 
(The equivocation in observational research relates to drawing 
conclusions about causation.  That is, we are interested in 
whether teaching approach A CAUSES students to do better than 
teaching approach B.  Evidence about relationships between vari-
ables obtained in observational research is generally equivocal 
about causation.  In contrast, evidence about relationships ob-
tained in proper experimental research is unequivocal about cau-
sation.) 
 
 
       Changing Attitudes Toward Experiments in Education 
 
Mike noted that members of the respected American Educational Re-
search Association (AERA) have carefully considered the issue of 
experimental research in education.  Mike’s point is directly re-
flected in the opening sentence of the official description of 
the theme of the 2006 AERA annual meeting: 
 
    Current social and political pressures on education re-
    search suggest that research must meet the demands of 
    evidence-based and scientifically based inquiry (Ladson-
    Billings and Tate 2006).   
 
The idea of “current” pressures reflects the fact that the pres-
sures on education research are new, having arrived over the last 
decade or so.  The sentence implies that education researchers 
are moving toward “evidence-based and scientifically based” re-
search.  This suggests that education researchers should gener-
ally perform proper experiments (because observational research 
results are generally equivocal).   
 
(Having acknowledged the importance of proper research, the de-
scription of the theme of the 2006 AERA meeting turns to the 
theme itself, which pertains to education research in the public 
interest, education research that will “increase the common-
wealth”.  The discussion of the theme is available at 
http://www.aera.net/annualmeeting/?id=694 ) 
 
 
           Opportunities for Experiments in Education 
 
The preceding discussion suggests that (a) the area of experimen-
tal studies in education is only now beginning to open up and (b) 
this area will become the mainstream of education research as 
granting agencies and journal editors reinforce the point that 
proper experiments are preferred to observational research.  Be-
cause the area is opening up, it has many opportunities for 
thoughtful researchers.   
 
To perform a proper education experiment a researcher must be fa-
miliar with the principles of experimental design, power analy-
sis, and (often) repeated measurements analysis of variance.  
Some education researchers are less familiar with these topics.  
They may find it helpful to follow the path of many medical re-
searchers who collaborate with a statistician with experience in 
the topics.  To ensure that the research design is efficient, I 
recommend that this collaboration begin early in the design phase 
of the research. 
 
Opportunities exist for statisticians to present courses to edu-
cation researchers about the statistical and scientific aspects 
of education research.  I propose topics for such a course in ap-
pendix B. 
 
I believe that the movement toward experimentally based education 
research will yield a body of reliable research results that will 
substantially improve education.   
 
Don Macnaughton 
 
Donald B. Macnaughton 
donmac@matstat.com 
 
 
                        Appendices 
 
Appendix A:  Eight Problems in Experiments in Education 
 
Appendix B:  Courses About Experiments For Education Researchers 
 
Appendix C:  Can Human Performance or Behavior Be Predicted from 
             a Person’s Race?  
 
Appendix D:  Specifying a Repeated Measurements Analysis of 
             Variance  

        
     Appendix A:  Eight Problems in Experiments in Education 
 
The body of this post lists eight problems that arise in experi-
ments in education.  This appendix briefly expands the problems 
and discusses some general solutions. 

        
Problem 1:  Choosing the Response Variable 
 
Suppose we are designing an education experiment to compare two 
teaching approaches.  In choosing the response variable we can 
reasonably begin by answering the following question: 
 
    What would we like the teaching approaches under study to 
    do (accomplish)?  That is, what is our teaching goal? 
 
This question can be answered at a high level by deciding which 
of the following is the main goal: 
 
1. maximize student knowledge and understanding of the subject 
   area 
 
2. maximize certain student attitudes toward the subject area 
 
3. maximize student aptitudes in the use of the subject area in 
   practical applications 
 
4. optimize some other property (or combination of properties) of 
   the students. 
 
The definition of the teaching goal in an education experiment 
depends on the particular topic or discipline being taught, on 
the type of students being taught, and on the course designer’s 
and researcher’s interests.  The definition of the goal deserves 
careful attention because it lies at the heart of the research.   
 
After we have defined the teaching goal, we can choose the re-
sponse variable by addressing a second important question, which 
is 
 
    How can we best measure the effectiveness of a teaching 
    approach to satisfy the teaching goal? 
  
The answer to this question defines the response variable.  For 
example, if the goal is to maximize students’ knowledge and un-
derstanding, the main response variable in an education experi-
ment will be a measure of each student’s knowledge and under-
standing, typically a weighted average of marks on assignments or 
tests.  In this case it is important to devise a “fair” measure 
of students’ knowledge and understanding by devising “fair” as-
signments or tests -- a challenging but doable task. 
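
As a small illustration of such a weighted average, the function 
below combines one student's marks using weights chosen by the 
course designer.  The weights in the example (20% assignments, 
30% midterm test, 50% final test) are hypothetical.

```python
def weighted_mark(marks, weights):
    """Weighted average of one student's marks on assignments and tests.

    The weights are course-specific choices (assumed here); they need not
    sum to 1 because the result is divided by their total.
    """
    if not marks or len(marks) != len(weights):
        raise ValueError("need one weight for each mark")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must have a positive total")
    return sum(m * w for m, w in zip(marks, weights)) / total_weight


# Hypothetical course: assignments 20%, midterm test 30%, final test 50%.
print(round(weighted_mark([70, 80, 90], [0.2, 0.3, 0.5]), 2))  # 83.0
```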
 
Similarly, if the teaching goal is to maximize certain student 
attitudes toward the subject area of the course, the response 
variable(s) will be one or more measures of students’ attitudes.  
Attitudes are important because they play a key role in people’s 
decisions.  It is often unnecessary to devise a test of atti-
tudes toward the subject area of a course because reliable stan-
dardized attitude tests (which can be administered in less than 
twenty minutes) are available for many subject areas. 
 
(You can find tests of students’ attitudes toward a subject area 
by searching the Web of Science [available in some university li-
braries], other science literature databases, and the world wide 
web for articles and books that have the word “attitude” [or “at-
titudes”] and the name of your subject area in their titles or 
keywords.  You may also be able to find acceptable generic atti-
tude tests.  If you are studying a subject area in which an ac-
ceptable test of student attitudes toward the area is not avail-
able, and if you are comfortable with statistical ideas, you can 
study other attitude tests and then use the principles of atti-
tude scale development [Aiken 2002, Krosnick, Judd, and 
Wittenbrink 2005] and the statistical procedure of exploratory 
factor analysis [Thompson 2004] to develop such a test.) 
 
Similarly, if the teaching goal is to maximize students’ apti-
tudes (which is a key goal in courses that teach applied skills), 
the response variable will be a measure of students’ aptitudes.  
Standardized measures of aptitudes are available in some areas, 
and can be found with the methods in the first sentence of the 
preceding paragraph.   

       
Problem 2:  Choosing the Predictor Variables 
 
Choosing the predictor variables in an education experiment to 
compare teaching approaches requires first that we choose the two 
(or more) teaching approaches that we wish to compare.  These 
teaching approaches define the values of the “teaching approach” 
predictor variable.  This variable is manipulated in the students 
in the sense that some students receive one of the teaching ap-
proaches, and other students receive the other.  Two reasonable 
teaching approaches to compare are (a) the traditional approach 
to teaching some topic or discipline and (b) the top contender to 
replace the traditional approach.  The experiment stages a fair 
contest between the two approaches.  
 
As noted above, in addition to choosing the main predictor vari-
able, we must also decide which other variables of the situation 
under study we will measure.  Generally, the more “relevant” pre-
dictor variables we measure in an experiment, the better the un-
derstanding we obtain of the relationship between variables we 
are studying.  Thus it may be useful to measure each student’s 
age and gender.  It may also be relevant to measure students’ 
previous experience with the material, intelligence, socioeco-
nomic status, and years of experience speaking the language that 
the course is taught in.   
 
Appendix C discusses the use of “race” as a predictor variable in 
research studying measures of human performance or behavior. 
 
      
Problem 3:  Eliminating Reasonable Alternative Explanations 
 
The need to eliminate reasonable alternative explanations of a 
research finding stems from the sensible principle that good re-
search must be unequivocal.  Eliminating reasonable alternative 
explanations is difficult because many forms of reasonable expla-
nations are possible, and some are hard to recognize.  Thus ex-
perienced researchers spend considerable time trying to think of 
reasonable alternative explanations of research results, espe-
cially results of their own planned research.  If the elimination 
of reasonable alternative explanations is properly done (gener-
ally through careful research design), it (mostly) eliminates the 
possibility that the associated research conclusion will be ruled 
invalid due to a reasonable alternative explanation that someone 
thinks of later. 
 
Confounding alternative explanations are eliminated by random as-
signment of experimental entities to treatments.  That is, in an 
education experiment we randomly assign students (or classes of 
students) to teaching approaches.  Such random assignment helps 
to ensure (in a probabilistic sense) that the different teaching 
approaches could not be confounded with schools or with many 
other variables. 
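
A minimal sketch of random assignment, assuming the units 
(students or whole classes) are listed by hypothetical 
identifiers: shuffle the list with a seeded pseudo-random number 
generator, then deal the units out to the teaching approaches in 
turn so that group sizes differ by at most one.  Recording the 
seed lets the assignment be reproduced and checked later.

```python
import random


def assign_at_random(units, approaches=("A", "B"), seed=None):
    """Randomly assign units (students or classes) to teaching approaches,
    keeping the group sizes as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out in turn, like cards, so that group
    # sizes differ by at most one.
    return {unit: approaches[i % len(approaches)]
            for i, unit in enumerate(shuffled)}


students = [f"S{i:02d}" for i in range(1, 21)]  # hypothetical student IDs
assignment = assign_at_random(students, seed=42)
print(assignment)
```

The same function can assign whole classes by passing class 
identifiers instead of student identifiers.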
 
For logistical reasons, random assignment is sometimes difficult 
in education research.  In the case of randomly assigning stu-
dents to different teaching approaches, the problem can often be 
solved (albeit at some expense) by running both (or all) the 
teaching approaches together at the same time on the same day of 
the week in nearby (similar) facilities.  (If enough resources 
exist, this concurrent presentation of the set of treatments can 
be repeated on different days of the week or at different loca-
tions.)  This means that students can be readily randomly as-
signed to the treatments (teaching approaches) without confound-
ing because none of the treatments has extraneous time or loca-
tion advantages.  I recommend that the researcher (a) discourage 
students from switching between classes, but permit such 
switches, and (b) identify any students who switch, perhaps to be 
with a friend.  The students who switch classes after the course 
has begun can be studied and then (due to unrepresentativeness) 
be omitted from the analysis. 
 
 
Problems 4, 7, and 8: Power, Analysis, and Minimizing Costs 
 
The fourth, seventh, and eighth problems respectively pertain to 
maximizing the power of the statistical tests in an experiment, 
analyzing the data obtained in the experiment, and minimizing the 
costs of the experiment.  Detailed technical help with these 
problems is available from the field of statistics, which has ef-
ficient general methods for the design and analysis of powerful 
but inexpensive experiments.  I give an introduction to the 
methods of statistics in the paper (2002). 
 
One can get practical help with problems 4, 7, and 8 by studying 
the medical research techniques of multicenter clinical trials.  
These techniques can help because proper experiments in education 
must be performed in parallel at several teaching institutions 
(or at least in several classes) to eliminate teacher effects and 
to provide power and generality.  This is similar to how medical 
research is performed in parallel in several hospitals in a mul-
ticenter clinical trial. 
 
(Technical Aside:  In medical research the patient is often the 
entity [unit] of analysis, but in education research the class of 
students is often the [implicit] entity of analysis.  A class of 
students in education research is analogous to a hospital [or 
other grouping] of patients in a multicenter clinical trial.  A 
researcher can often substantially increase the power of statis-
tical tests in education research [with often only a minor in-
crease in costs] by designing the research so that the student 
[instead of the class] is the entity of analysis.  This can be 
done by using a pre-post [i.e., repeated measurements] experimen-
tal design.  That is, the value of the response variable is meas-
ured in each student in the experiment both before and after the 
students experience their assigned teaching approach.  Such a de-
sign is feasible with many but not all response variables.) 
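
To give a rough sense of the power gain from a pre-post design, 
the sketch below uses the standard normal-approximation formula 
for the number of students needed per teaching approach.  It 
assumes (a) the pre-test and post-test have the same standard 
deviation sigma and correlation rho, (b) the pre-post design is 
analyzed through change scores (post minus pre), which for two 
measurement occasions is equivalent to testing the approach-by-
time interaction in a repeated measurements analysis of variance, 
and (c) students are independent.  Assumption (c) ignores the 
clustering of students within classes, so the numbers are 
optimistic.  The function name and example values are 
hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80, rho=None):
    """Approximate students per teaching approach needed to detect a
    difference `delta` in mean response with a two-sided test.

    rho=None: post-test-only design (compare post-test means).
    rho given: pre-post design analyzed via change scores, whose
    variance is 2 * sigma**2 * (1 - rho) when pre and post share sigma.
    Clustering of students within classes is ignored (optimistic).
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    variance = sigma ** 2 if rho is None else 2 * sigma ** 2 * (1 - rho)
    return ceil(2 * variance * z_total ** 2 / delta ** 2)


print(n_per_group(delta=5, sigma=15))           # 142
print(n_per_group(delta=5, sigma=15, rho=0.8))  # 57
```

With a difference of 5 marks and sigma = 15, the post-test-only 
design needs about 142 students per approach; if pre and post 
marks correlate at rho = 0.8, the change-score analysis needs 
about 57.  The gain appears only when rho exceeds 0.5, because 
only then is the change-score variance 2 sigma^2 (1 - rho) 
smaller than sigma^2.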
  
 
Problem 5: Recruiting Teachers 
 
Recruiting teachers or teaching departments to participate in ex-
perimental research in education is facilitated if the researcher 
explains how the research will provide important educational 
benefits.  If this is carefully done, and if the disruption of 
the existing program is not too great, appropriate participants 
can generally be recruited, just as appropriate medical personnel 
at different hospitals are recruited as a first step in a multi-
center clinical trial. 
 
 
Problem 6: Obtaining Funding  
 
To obtain research funding a researcher submits a carefully writ-
ten grant proposal to an appropriate funding agency.  The pro-
posal competes with other proposals for funds from the pool of 
funds distributed by the agency.  Research proposals are judged 
on the following criteria: 
 
- the reasonableness of the hypothesized phenomenon that the re-
  search will study 
 
- the clarity of thinking in the rationale and implications of 
  the research  
 
- the potential of the hypothesized phenomenon to make a worth-
  while contribution to the field under study, and 
 
- the conformity of the proposal to the correct style and proto-
  col for grant proposals to the agency. 
 
The research projects whose proposals best satisfy the above cri-
teria generally receive funding. 
 
 
              Appendix B: Courses About Experiments 
                    For Education Researchers 
 
I recommend that courses about experimental research for educa-
tion researchers discuss the eight problems discussed in appendix 
A in terms of examples of good and bad education experiments.  I 
recommend that teachers discuss real or realistic experiments (as 
opposed to abstract experiments) because realistic experiments 
enable students to consider the specific research goal of each 
example.  Considering researchers’ goals helps students to formu-
late the goals of their own research.   
 
In discussing a “good” education experiment it is important to 
convey to students the practical benefits that are provided by 
the results because these benefits generally justify the care 
taken to perform the experiment.   
 
In view of the usefulness of a pre-post experimental design for 
increasing power, I recommend that this type of design and the 
proper analysis of the results of this type of experiment be dis-
cussed in detail. 
 
Discussion of data analysis can best omit mathematical deriva-
tions and focus on interpreting the output from the computer 
analysis of the data.  Most experiments to compare teaching ap-
proaches have a continuous reasonably-well-behaved response vari-
able (e.g., marks or attitude scores), and the main predictor 
variable (i.e., “teaching approach”) is discrete.  Thus the re-
sults of these experiments are best analyzed with analysis of 
variance (which becomes repeated measurements analysis of vari-
ance if a pre-post design is used).   
 
(Repeated measurements analysis of variance is also called “mixed 
model” analysis because the right-hand side of the model equation of 
the relationship between the variables contains a mixture of (a) 
“fixed” terms (associated with the predictor variables) and (b) 
“random” terms associated with unaccounted-for variation in the 
values of the response variable.  However, I prefer the term “re-
peated measurements” because it is more intuitive for beginners.) 
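
For a single between-entity predictor such as “teaching 
approach”, the analysis of variance compares the variation 
between the approach means with the variation of students within 
approaches.  The standard-library-only sketch below computes the 
F statistic and its degrees of freedom for that one-way case; the 
p-value comes from the F distribution, which statistical software 
reports and which is not computed here.  The marks are invented, 
and the sketch does not cover the repeated measurements (mixed 
model) analysis.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    `groups` maps each teaching approach to the list of response values
    (e.g., marks) observed under it.
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = fmean(all_values)
    means = {g: fmean(ys) for g, ys in groups.items()}
    # Between-approach sum of squares: spread of approach means around
    # the grand mean, weighted by group size.
    ss_between = sum(len(groups[g]) * (means[g] - grand_mean) ** 2
                     for g in groups)
    # Within-approach sum of squares: spread of students around their
    # own approach mean.
    ss_within = sum((y - means[g]) ** 2
                    for g, ys in groups.items() for y in ys)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


marks = {"A": [72, 75, 78, 71, 74], "B": [68, 70, 66, 71, 65]}  # invented
f, dfb, dfw = one_way_anova(marks)
print(f"F = {f:.2f} on {dfb} and {dfw} df")  # F = 12.86 on 1 and 8 df
```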

I recommend that discussion of data analysis be in terms of exam-
ples of good experiments in education research.  I recommend that 
the discussion cover the following topics:   
 
1. For each example, a discussion of the research hypothesis (or 
   research question) under study, a discussion of the conduct of 
   the experiment, and a discussion of the layout of the data ta-
   ble that was obtained in the experiment and that is the basis 
   of the data analysis.   
 
2. What the computer (software) must be told in order to perform 
   the analysis of variance.  We must tell it  
 
     - the location of the data table (often in a file on the 
       computer) 
      
     - which variable in the data table is the response variable 
       in the analysis 
      
     - which variable(s) in the table is (are) the predictor 
       variable(s)  
      
     - in repeated measurements experiments which of the predic-
       tor variables vary within experimental entities and which 
       vary between experimental entities (or, equivalently, 
       which variable uniquely identifies the experimental enti-
       ties) 
      
     - in more complicated experiments details of the relation-
       ship between variables we are studying, such as the hy-
       pothesized form of the model equation of the relationship.   
      
3. How to tell the computer what it must be told.  This varies 
   among different software products, and is discussed further in 
   appendix D. 
 
4. How to interpret each p-value in an analysis of variance table 
   produced by the computer from the analysis of the data. 
 
5. How to understand tables of means of the response variable for 
   effects with low p-values, as produced by the computer. 
 
6. How to graphically illustrate means of the response variable 
   for effects with low p-values for ease of understanding of the 
   results. 
 
7. How to understand other output from the computer such as (a) a 
   measure of strength or “effect size” of each relationship be-
   tween variables studied in the experiment and (b) the esti-
   mates of the values of the parameters of the model equation.   
 
8. How the researcher can confirm (to avoid embarrassing errors) 
   that the underlying assumptions of a statistical analysis are 
   adequately satisfied before drawing final conclusions.  (This 
   includes (a) checking the univariate distribution of each 
   of the variables for anomalies and (b) appropriate verifica-
   tion that the data have certain necessary properties.  The 
   specific properties depend on which statistical procedure is 
   used to study the relationship.)  These assumptions are often 
   but not always adequately satisfied.  
 
9. How to interpret the analysis in terms of the research hy-
   pothesis.   
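
The sketch below illustrates items 1, 2, and 5 for a hypothetical 
pre-post experiment.  The data table is in “long” format (one row 
per measurement), with invented column names student, approach, 
time, and mark.  Given the variable that identifies the 
experimental entities, a predictor that is constant within every 
student varies between entities, and one that changes across a 
student's measurements varies within entities.  The second 
function produces a table of means of the response variable for 
each approach-by-time cell.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical long-format data table: one row per measurement.
rows = [
    {"student": "S1", "approach": "A", "time": "pre",  "mark": 55},
    {"student": "S1", "approach": "A", "time": "post", "mark": 70},
    {"student": "S2", "approach": "A", "time": "pre",  "mark": 60},
    {"student": "S2", "approach": "A", "time": "post", "mark": 74},
    {"student": "S3", "approach": "B", "time": "pre",  "mark": 58},
    {"student": "S3", "approach": "B", "time": "post", "mark": 63},
    {"student": "S4", "approach": "B", "time": "pre",  "mark": 61},
    {"student": "S4", "approach": "B", "time": "post", "mark": 67},
]


def classify_predictors(rows, entity, predictors):
    """Label each predictor 'between' if it is constant within every
    entity, otherwise 'within'."""
    labels = {}
    for p in predictors:
        values_per_entity = defaultdict(set)
        for r in rows:
            values_per_entity[r[entity]].add(r[p])
        constant = all(len(v) == 1 for v in values_per_entity.values())
        labels[p] = "between" if constant else "within"
    return labels


def cell_means(rows, factors, response):
    """Mean response for each combination of factor values."""
    cells = defaultdict(list)
    for r in rows:
        cells[tuple(r[f] for f in factors)].append(r[response])
    return {key: fmean(vals) for key, vals in sorted(cells.items())}


print(classify_predictors(rows, "student", ["approach", "time"]))
# {'approach': 'between', 'time': 'within'}
for key, mean in cell_means(rows, ["approach", "time"], "mark").items():
    print(key, mean)
```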
 
Some statistics courses omit or minimize the first and last top-
ics, which pertain to the research hypothesis.  This omission oc-
curs because some courses are more focused either on data analy-
sis or on statistical theory, which are both vast topics.  How-
ever, consideration of the implications of the research for the 
research hypothesis is clearly important in courses aimed at edu-
cation researchers.   
 
In high-level terms, the results of a data analysis have one of 
three implications for the research hypothesis, which are: 
 
1. the results support the research hypothesis (and there is no 
   reasonable alternative explanation of the results) 
 
2. the results neither support nor refute the research hypothesis 
   (either because the research found no good evidence of the 
   sought-after relationship between variables or because a rea-
   sonable alternative explanation of the results is available) 
 
3. the results refute (contradict) the research hypothesis. 
 
Because of the positive way that research hypotheses are framed 
(e.g., drug D reduces cancer), researchers performing a research 
project almost always hope that the first outcome will occur.  
That is, they hope that the results of the research project will 
support their research hypothesis.  In this case, if the hypothe-
sis was carefully thought out, the finding will make a contribu-
tion (perhaps substantial) to the associated field of study.  Un-
fortunately, the second outcome sometimes occurs, perhaps because 
the research hypothesis is false, or because the research was 
poorly designed, or due to the whims of chance.  Although the 
third outcome is generally possible, it rarely occurs in prac-
tice. 
  
After students understand the basic ideas of drawing conclusions 
from data analysis, I recommend that they learn how to use the 
computer to generate realistic artificial data.  Such data gen-
eration is not difficult if students are given the appropriate 
instructions (or program templates) and if they are shown how to 
generalize the instructions as necessary.   
 
The ability to generate realistic artificial data has three bene-
fits:  First, generating artificial data helps students to under-
stand the postulated model equation of the relationship between 
the variables that is under study.  This understanding comes 
because generating data is most easily done by writing a simple 
computer program that substitutes the values of the predictor 
variables into the model equation and then evaluates the equation 
to generate a value of the response variable.  (This evaluation is 
done repeatedly through the use of program loops and with one or 
more random number generators to generate the values of the random 
term[s] in the equation.)  From
a theoretical perspective the model equation is the essence of a 
relationship between variables, and the students’ experience in 
programming the essence helps them to understand it. 
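The kind of program described above might look like the following minimal Python sketch.  It assumes a hypothetical model equation with a single predictor, y = 50 + 2x + ε, with x drawn uniformly between 0 and 10 and ε drawn from a normal distribution with mean zero.  The function name generate_data and all numeric values are illustrative choices, not taken from any particular course or software package.

```python
import random

def generate_data(n_rows, intercept=50.0, slope=2.0, error_sd=5.0, seed=1):
    """Generate artificial (x, y) rows from the hypothetical model
    equation y = intercept + slope * x + e, where e is a normally
    distributed random error term with mean zero."""
    rng = random.Random(seed)          # seeded so the data are reproducible
    rows = []
    for _ in range(n_rows):            # the program loop
        x = rng.uniform(0.0, 10.0)     # a value of the predictor variable
        e = rng.gauss(0.0, error_sd)   # a value of the random error term
        y = intercept + slope * x + e  # evaluate the model equation
        rows.append((x, y))
    return rows

for x, y in generate_data(5):
    print(f"{x:6.2f} {y:8.2f}")
```

Changing the equation inside the loop (for example, adding a second predictor or a curved term) is the "generalization" step mentioned above.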
 
Second, the ability to generate artificial data gives students a 
source of “tame” data, which they can then analyze with data 
analysis procedures.  Because the students generate the data, 
they know exactly which relationships (if any) are present in the 
data.  This allows them to see how the data analysis procedures 
work to detect and characterize relationships between variables, 
which helps students to develop knowledge of, and trust in, the 
procedures.   
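As a small illustration, the sketch below generates "tame" data with a known slope and checks that an ordinary least-squares fit recovers it.  The model y = 10 + slope·x + ε, the function names, and the numeric settings are all hypothetical choices for the example.

```python
import random

def simulate(n, slope, error_sd, rng):
    # Artificial data from the known (hypothetical) model y = 10 + slope*x + e
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [10.0 + slope * x + rng.gauss(0.0, error_sd) for x in xs]
    return xs, ys

def least_squares_slope(xs, ys):
    """Ordinary least-squares estimate of the slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(2007)
for true_slope in (0.0, 0.5, 2.0):
    xs, ys = simulate(200, true_slope, error_sd=3.0, rng=rng)
    # The estimate should land near the slope that was built into the data.
    print(true_slope, round(least_squares_slope(xs, ys), 2))
```

Because the student chose the true slope, any gap between it and the estimate can be attributed to the random error term, which makes the behavior of the procedure easy to see.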
 
Third, students can be encouraged to use their ability to gener-
ate artificial data when designing their own research projects.  
That is, during the design phase of a research project students 
can generate sets of realistic artificial data that resemble the 
data they expect to obtain in the research.  Then they can care-
fully analyze these artificial data with the planned data analy-
sis procedure.  This gives students a thorough review of the 
planned research design and data analysis procedure before the 
design and analysis are put into practice.  Such a review is of-
ten helpful in eliminating later serious problems and in increas-
ing efficiency, especially for beginning researchers.   
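One way to carry out such a design-phase review is to simulate the planned experiment many times and count how often the planned analysis detects an effect of the size one hopes to find.  The sketch below assumes a hypothetical two-group experiment with normally distributed responses, compares groups with a Welch t statistic, and uses an assumed cutoff of |t| > 2.0 as an approximate two-sided 5% test; all names and numbers are illustrative.

```python
import random

def mean_and_var(values):
    """Sample mean and (n - 1)-denominator sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)

def welch_t(a, b):
    """Welch two-sample t statistic for the difference in means of a and b."""
    ma, va = mean_and_var(a)
    mb, vb = mean_and_var(b)
    return (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5

def estimated_power(n_per_group, effect, sd, n_sims=2000, crit=2.0, seed=11):
    """Fraction of simulated two-group experiments in which |t| > crit.

    crit=2.0 is an assumed approximate two-sided 5% cutoff, reasonable
    only for roughly 30 or more entities per group."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(50.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(50.0 + effect, sd) for _ in range(n_per_group)]
        if abs(welch_t(treated, control)) > crit:
            hits += 1
    return hits / n_sims

# How often would a 5-point treatment effect be detected (sd = 10)?
for n in (10, 30, 60):
    print(n, estimated_power(n, effect=5.0, sd=10.0))
```

If the estimated detection rate for the planned sample size is low, the researcher learns this before collecting any real data and can enlarge the sample or refine the design.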
 
(Statistical software vendors can help users to generate artifi-
cial data by providing easy-to-follow instructions for generating 
data for all the common types of relationships between variables.  
I recommend that these instructions be placed prominently in the 
software documentation so that beginners can easily find this im-
portant resource.  A component of the software that can generate 
a data table from fill-in-the-blanks specifications might also be 
helpful for beginners.) 
 
I believe that the topics in this appendix, when developed in ap-
propriate detail, give prospective researchers a reasonable in-
troduction to how to perform experiments in education research.   
 
  
         Appendix C:  Can Human Performance or Behavior  
               Be Predicted from a Person’s Race? 
 
The Model Equation of a Relationship Between Performance and Pre-
dictors of Performance 
 
Suppose that the variable y is a particular measure of human per-
formance or human behavior.  For example, y might reflect stu-
dents’ grade point averages or it might reflect some measure of 
athletes’ ability.  Under the scientific approach we think that y 
“depends” on a number of other variables.  For example, we think 
that each student’s grade point average probably depends on the 
student’s intelligence, on their motivation, on their parents’ 
style of parenting, on the attitudes of their friends, perhaps on 
their diet, and on various other variables. 
 
The relationship between y and the other variables can be written 
in a general model equation as 
 
                   y = f(x1, x2, ..., xn) + ε. 

The performance measure, y (e.g., grade point average), is the 
response variable in the relationship, and the x1, x2, ..., xn 
are the relevant predictor variables (e.g., intelligence, motiva-
tion, etc.).  The notation f(...) stands for a mathematical func-
tion that outputs the estimated value of y for a person when the 
values of the x’s for the person are substituted into it.  [The 
detailed mathematical form of f(...) is discovered through analy-
sis of relevant empirical research data.]  The symbol n indicates 
the number of predictor variables under consideration, which is 
typically between one and five. 
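One common special case, shown here only as an illustration, is a function f(...) that is linear in the predictor variables.  The coefficients β0, β1, ..., βn are symbols introduced for this example; they are unknown constants that would be estimated from the research data:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon
```

Many other forms of f(...) are possible, including curved and interactive forms; which form suits a given relationship is an empirical question.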
 
The Greek letter ε (epsilon) on the right end of the equation is 
the “error” term.  It takes account of the fact that f(...) gen-
erally can’t perfectly predict the actual measured value of y for 
a person from the x’s.  The error term is a “random” variable be-
cause it has a different seemingly random value each time the 
equation is applied.  Usually ε is sensibly modeled as being 
symmetrically distributed around zero (greater than zero half the 
time and less than zero half the time), so it has an average value 
of zero.  Invariably ε is 
modeled as being more often closer to zero than farther away. 
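A small simulation can illustrate these properties.  The sketch below assumes, purely for illustration, that ε follows a normal distribution with mean zero and standard deviation 4 (a common modeling choice, not the only one).  It compares the fraction of errors within one standard deviation of zero with the fraction between one and two standard deviations away, two bands of equal total width.

```python
import random

rng = random.Random(42)
sd = 4.0
errors = [rng.gauss(0.0, sd) for _ in range(100_000)]

mean_error = sum(errors) / len(errors)
frac_positive = sum(e > 0 for e in errors) / len(errors)
frac_near = sum(abs(e) < sd for e in errors) / len(errors)           # within one SD of zero
frac_far = sum(sd <= abs(e) < 2 * sd for e in errors) / len(errors)  # one to two SDs from zero

print(round(mean_error, 2), round(frac_positive, 3))
print(round(frac_near, 3), round(frac_far, 3))
```

For a normal distribution the first pair should be close to 0 and 0.5, and the second pair close to 0.68 and 0.27: errors near zero are much more common than errors farther away.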
 
(Technical Aside:  In any particular [standard] instance of a 
variable [and at a given time] the variable [just like the prop-
erty behind the variable] has a single value.  [For simplicity, I 
ignore here (a) the idea that a particular value of a variable 
may be “missing” and (b) the more general but infrequent case of 
variables that are vectors.]  A variable can be classified as be-
ing either a continuous variable or a discrete variable.  If a 
variable is a continuous variable, its value in a particular in-
stance can theoretically be any value between the minimum and 
maximum permissible values.  Continuous variables almost always 
have numeric values.  For example, grade point average is a con-
tinuous variable that for a given student [in some schools] can 
have any value between 0.00 and 4.00, such as 3.82.  The values 
of variables that are obtained from conventional measuring in-
struments [of any type, e.g., ruler, stopwatch] are usually con-
tinuous variables.  [The values of any continuous variable are 
limited to a certain maximum number of significant digits (often 
between two and four) due to limitations in the accuracy of the 
measuring instrument that is used to measure the values.]  In 
contrast, if a variable is a discrete variable, its value in a 
particular instance can be one of only a limited number of dif-
ferent values, usually fewer than thirty, and sometimes as few as 
only two.  [Discrete variables can be ordinal -- with an implicit 
ordering -- or categorical.]  For example, the variable “likes to 
dance” is reasonably viewed as an [ordinal] discrete variable that 
for a given person has one of a range of five [or perhaps seven] 
possible values indicating different levels of liking or dislik-
ing to dance.  [The limitation to five or seven values generally 
occurs in raw variables that reflect human judgments or opinions, 
as discussed by Miller, 1956.]  For simplicity, the discussion in 
this appendix assumes that a continuous response variable is al-
ways used in research projects because using continuous response 
variables is [when feasible] the more efficient and more common 
approach.  If the response variable in a particular empirical re-
search project is discrete, some of the technical ideas behind 
model equations of relationships between variables change, but 
the main principles in this appendix still apply, as shown by the 
theory of generalized linear models [McCullagh and Nelder, 
1989].) 
 
We (i.e., society) can use the model equation of a (properly 
verified) relationship between variables to help us to predict 
and sometimes control the values of the response variable in a 
new situation on the basis of measuring or controlling the values 
of the predictor variables in the situation.  If the variables 
are carefully chosen, the ability to predict or control can be of 
substantial value.  For example, if we can find an (ethical) 
method to control (i.e., raise) grade point averages in students 
by controlling the values of other variables, we can use the 
method to help students to excel.   
 
As research into a relationship between variables advances, more 
predictor variables may be discovered that can be (correctly) in-
cluded in the function f(...) for the relationship, which makes 
the predictions (or control) made by the function more accurate.  
As the predictions become more accurate, the average (absolute) 
size of the error term ε in the model equation for the relation-
ship becomes accordingly smaller.  In some areas of research 
(e.g., in many areas of the hard sciences) the model equations 
make almost perfect predictions.  Thus the error terms in these 
equations are so small that they are often sensibly ignored. 
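The following sketch illustrates the point with a hypothetical model equation with two predictors, y = 5 + 2x1 + 3x2 + ε.  To keep the example short it uses the known generating equation rather than estimating f(...) from the data.  When x2 is left out of the function (its average, zero, implicitly stands in for it), the contribution of x2 is absorbed into the error term, so the average absolute error is larger.

```python
import random

rng = random.Random(3)
N = 50_000
x1 = [rng.gauss(0.0, 1.0) for _ in range(N)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(N)]   # independent of x1, mean 0
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Hypothetical true model: y = 5 + 2*x1 + 3*x2 + noise
y = [5 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]

def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)

# f with only x1: the effect of x2 ends up in the error term.
err_one = [yi - (5 + 2 * a) for yi, a in zip(y, x1)]
# f with both predictors: only the irreducible noise remains.
err_two = [yi - (5 + 2 * a + 3 * b) for yi, a, b in zip(y, x1, x2)]

print(round(mean_abs(err_one), 2), round(mean_abs(err_two), 2))
```

With these settings the two printed values should be near 2.5 and 0.8, since the mean absolute value of a normal variable is about 0.8 times its standard deviation and the two error standard deviations here are √10 and 1.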
 
 
The Concept of ‘Race’ 
 
The discussion below uses the concept of ‘race’.  Most people 
above the age of ten or so have a reasonable intuitive under-
standing of this concept in the sense that they can reliably 
(though not perfectly) assign themselves and other people to ra-
cial categories.  (The assignments are “reliable” in the sense 
that different people generally agree with each other [at a mutu-
ally acceptable level of classification] about the assignments.)   
 
Although most people understand the concept of ‘race’ at an in-
tuitive level, formal definitions of the concept are difficult.  
The definitions break into three classes, which are  
 
1. definitions in terms of a person’s biological ancestry (e.g., 
   in terms of classification of the person’s DNA) 
 
2. definitions in terms of a person’s self-reported race 
 
3. definitions in terms of a person’s observable attributes such 
   as skin color, hair color, facial characteristics, and speech 
   characteristics. 
 
Each of the three classes contains various definitions of race.  
Each definition provides (at least in theory) a way of assigning 
people to racial categories.  The categories are usually discrete 
categories (e.g., Asian, Black, Mixed, Native American, White, 
Other) rather than reflecting one or more continuous scales.   
 
For example, using the self-report approach a researcher might 
ask each person studied in a research project which of the above 
six racial categories they belong to.  Thus race would be defined 
and measured in terms of the six categories.  A second researcher 
might define race in terms of eight or ten or even more catego-
ries reflecting the many identifiable groups of people in the 
world. 
 
The various definitions of race are closely associated, but are 
different because assignments to racial categories by one of the 
definitions will sometimes disagree with assignments by another.  
For example, a person may ancestrally belong in whole or in large 
part to one race, but may report or appear as belonging to an-
other.   
 
Definitions of race in the first class (biological ancestry) are 
generally preferred to definitions in the other two classes be-
cause the first class seems basic, and the other two classes seem 
to be merely less accurate reflections of it.  However, defini-
tions in the first (and third) class are often difficult to im-
plement in practice in research.  (Some of the difficulties arise 
from respected ethical considerations.)  Thus in a research project 
in which each person’s race is measured, the researcher will often 
measure race in terms of self-reported race.   

       
Is Performance “Causally Dependent” on Race? 
 
Suppose that a research project is carried out to study the rela-
tionship between (a) a measure of human performance (or behavior) 
as the response variable and (b) a set of other variables that 
are predictor variables.  The predictor variables may reflect a 
person’s attributes and may reflect manipulations applied to the 
person in an experiment.  Suppose the researcher includes in the 
research a predictor variable that reflects race (or ethnicity).  
And suppose that the research project finds good evidence of a 
relationship between performance and race -- the average level of 
performance of people from one race is significantly higher than 
the average level of performance of people from another race.  
This raises a key question:  Can we conclude from this relation-
ship between variables that human performance depends to some ex-
tent on one’s race?  In other words, can we conclude that differ-
ences in race cause differences in performance? 
 
For example, evidence exists that Asians score somewhat higher 
(on average) on intelligence tests than Whites, who in turn score 
somewhat higher (on average) than Blacks.  Does this imply that a 
person’s intelligence depends (partly) on their race? 
 
No.  The evidence of the relationship doesn’t imply dependence or 
a causal relationship because this relationship between variables 
is invariably studied with observational research, as opposed to 
proper experimental research.  Observational research must be 
used because it is impossible in a practical sense to manipulate 
“race” in a proper experiment.  That is, unlike assigning treat-
ments to people (or people to treatments) in an experiment, a re-
searcher can’t assign races to people (or people to races) be-
cause race has already been assigned.  Because research projects 
studying the relationship between performance and race are in-
variably (in that aspect) observational research projects, the 
results of the research are open to reasonable alternative expla-
nations, as discussed in problem 3 in appendix A.   
 
For example, the relationship between intelligence test scores 
and race was found through observational research.  This rela-
tionship could easily be accounted for by other causal variables 
that are confounded with race and that have (unfortunately) been 
omitted from the analysis (typically because they are unknown or 
are deemed unimportant).  For example, due to a history of op-
pression that began with Whites’ slavery of Blacks, many Blacks 
throughout the world have had less access to opportunities and 
resources, which might account for the differences in average in-
telligence test scores between Blacks and Whites.  Thus if vari-
ables that properly reflect the relevant types of oppression (in-
cluding relevant historical effects) are included in the analy-
sis, the Black/White aspect of the relationship between intelli-
gence test scores and race might easily vanish.  Similarly, 
Asians might score higher (on average) on intelligence tests than 
Blacks and Whites due to cultural childhood influences among 
Asians that emphasize disciplined logical thinking.  Thus if a 
variable reflecting childhood encouragement of logical thinking 
is included in the analysis, this second aspect of the relation-
ship between intelligence test scores and race might also vanish. 
 
To minimize expensive errors, science demands unequivocal evi-
dence of causation before causation can be inferred.  But, as 
noted, the results of research projects studying the relationship 
between performance and race are generally equivocal because they 
are open to reasonable alternative explanations.  Therefore, it 
is generally scientifically impossible to infer that performance 
in humans is causally dependent on a person’s race. 
 
The word “generally” in the preceding paragraph indicates the 
possibility of exceptions.  That is, it is conceivable that an 
ingenious researcher might find a way to perform a proper experi-
ment or find a way to deal with all of the confounding variables 
and all of the alternative explanations.  This researcher might 
still find good evidence of a causal relationship between a cer-
tain response variable reflecting performance and a person’s 
race.  Then we could conclude that a causal relationship exists 
between performance and race.  However, the chance of this occur-
ring is low because 
 
1. Eliminating all reasonable alternative explanations would be 
   very difficult or impossible. 
 
2. Knowing that performance depends to a small extent on race is 
   not of much obvious theoretical or practical use.  Therefore, 
   there is little scientific incentive to study this type of re-
   lationship.  (In contrast, knowing that certain other response 
   variables depend on race is sometimes quite useful, such as in 
   the prevention and treatment of diseases.) 
 
3. Knowing that performance depends on race may suggest to some 
   people a basis for racial discrimination.  Discrimination is 
   undesirable because it usually harms everyone involved.  
   Therefore, there is a social disincentive to study this type 
   of relationship. 


Can Performance Be Predicted in Individuals on the Basis of Their 
Race? 
 
Although it may be true that performance doesn’t depend on race, 
it is still true that certain relationships exist between per-
formance and race.  For example, as noted, a relationship exists 
between intelligence test scores and race.  Thus perhaps race 
could be used to predict (and thereby indirectly control) per-
formance, even though a direct causal relationship between these 
variables may not exist.  For example, Asians perform better (on 
average) on intelligence tests than Blacks or Whites.  Therefore, 
a company president wishing to maximize the intelligence of the 
employees of the company might decide to hire only Asians as em-
ployees.  Although such an approach to hiring is unethical, it is 
instructive to temporarily ignore the ethics and consider it from 
a strictly scientific point of view.  Is it scientifically sensi-
ble to hire on the basis of a known relationship between some im-
portant measure of performance and race?   
 
No.  It is generally inefficient to predict individual human per-
formance from race because other predictor variables are substan-
tially more accurate.  In particular, instead of using race, a 
company will hire more effective employees if it bases its hiring 
decisions on each job candidate’s education and experience, to-
gether with the candidate’s performance in an interview, and per-
haps the candidate’s performance on empirically valid aptitude 
tests.  This approach to hiring selects more effective employees 
because any differences in performance between the races are (if 
present at all) very small when compared to the vast (and identi-
fiable) differences in performance that occur within each racial 
group.   
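The statistical point can be illustrated with invented numbers (not real data): two hypothetical groups whose average performance differs by 2 points, a within-group standard deviation of 15 points, and a hypothetical individual-level predictor (labeled test) that measures performance with an error standard deviation of 7.5 points.  The sketch compares the average absolute prediction error from ignoring group entirely, from using each group's mean, and from using the individual-level predictor.

```python
import random

rng = random.Random(5)
N = 20_000
GROUP_GAP = 2.0      # hypothetical difference between the two group averages
WITHIN_SD = 15.0     # hypothetical spread of performance within each group
TEST_ERROR_SD = 7.5  # hypothetical measurement error of an individual-level test

rows = []
for i in range(N):
    group = i % 2                                    # alternate between groups 0 and 1
    perf = rng.gauss(100.0 + GROUP_GAP * group, WITHIN_SD)
    test = perf + rng.gauss(0.0, TEST_ERROR_SD)      # individual-level predictor
    rows.append((group, perf, test))

grand_mean = sum(p for _, p, _ in rows) / N
group_mean = {}
for g in (0, 1):
    vals = [p for grp, p, _ in rows if grp == g]
    group_mean[g] = sum(vals) / len(vals)

mae_none = sum(abs(p - grand_mean) for _, p, _ in rows) / N       # ignore group
mae_group = sum(abs(p - group_mean[g]) for g, p, _ in rows) / N   # predict by group mean
mae_test = sum(abs(p - t) for _, p, t in rows) / N                # predict by test score
print(round(mae_none, 1), round(mae_group, 1), round(mae_test, 1))
```

Under these invented settings, knowing a person's group should barely reduce the prediction error relative to ignoring group altogether, while the individual-level predictor should roughly halve it.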
 
(Perhaps an employer could properly use education, experience, 
interview performance, and aptitude test scores in a model equa-
tion to predict job performance, but still reasonably include a 
predictor variable reflecting race in the equation.  Perhaps in-
cluding race would significantly improve the predictions made by 
the equation, even though all the other variables are also prop-
erly used in the equation.  That is, including the other vari-
ables in the relationship might make the relationship between 
performance and race more “visible” in the analysis.  This is 
theoretically possible if a certain “interactive” type of rela-
tionship between variables occurs.  However, when more predictor 
variables are added in social research, the far more common 
outcome at present is that the individual predictor variables 
become weaker rather than stronger due to relationships 
[confoundings] among them.  Since race is at best a very weak 
predictor of performance, it too is likely to become weaker or 
nonexistent in a model equation as the number of predictor 
variables is increased.) 
 
Therefore, even if ethical considerations are ignored, it is gen-
erally not scientifically reasonable to predict performance in 
individuals on the basis of a known relationship between perform-
ance and race.   


The Social Taboo Against Concluding That Performance Depends On 
or Is Predictable From Race  
 
Despite the preceding points, there is a tendency among some peo-
ple to think that one race (or religion) is superior to others in 
one or more areas of performance.  Unfortunately, this point of 
view can lead to appalling undeserved human suffering.  There-
fore, civilized society uses another important incentive to work 
in concert with the somewhat complicated logical arguments in the 
preceding paragraphs that performance can’t be reasonably viewed 
as depending on (or as predictable from) race.  This incentive 
operates in the ethical realm and exists in the form of a strong 
social taboo against concluding that performance depends on race.  
This taboo exists without any need for justification in the sense 
that many people accept it on an intuitive level without ques-
tioning it (because it is fair).   
 
The taboo is vividly illustrated by the experience of Glayde 
Whitney, a behavior geneticist with a record of distinguished re-
search in the genetics of mouse taste, and who was the 1995 
President of the Behavior Genetics Association (BGA).  In view of 
the BGA’s name, many of its members have considered the idea of 
relationships between performance and race.  Interestingly, most 
behavior geneticists believe that no such relationships exist.  
Thus Whitney astounded the association by suggesting in his 
Presidential Address that race plays a role in causing murders.  
He presented (in a speech at an evening banquet) reliable evi-
dence that the murder rate in the United States was significantly 
higher among non-Whites than among Whites.  He then said 
 
    Like it or not, it is a reasonable scientific hypothesis 
    that some, perhaps much, of the race difference in murder 
    rate is caused by genetic differences in contributory 
    variables such as low intelligence, lack of empathy, ag-
    gressive acting out, and impulsive lack of foresight 
    (1995, p. 336). 
 
The next morning Whitney was shunned at the meeting of the BGA 
Executive Committee, and the committee voted (with Whitney ab-
staining) to issue an official statement denouncing his comments.  
Also, the editor of the BGA journal declined (contrary to stan-
dard policy) to publish the text of the Presidential Address in 
the journal (Whitney, 1995).  After the meeting the incoming 1996 
BGA president circulated an open letter calling Whitney’s com-
ments “nonscientific, misleading, and cruel,” and urging Whitney 
to resign from the association (“Specter at the Feast,” 1995). 
 
Whitney’s hypothesis is that race exerts a causal influence on 
murder, and he was correct in saying that this hypothesis is a 
“reasonable scientific hypothesis”.  However, due to the possi-
bility of reasonable alternative explanations (perhaps in terms 
of poverty and alienation), he erred in believing that the murder 
statistics properly support the hypothesis.   
 
In view of the error in scientific logic and in view of the taboo 
against concluding that performance depends on race, the members 
of the Behavior Genetics Association moved quickly to distance 
themselves from Whitney’s scientifically unfounded and socially 
inappropriate causal conclusion. 
 
(A similar taboo pertains to concluding that performance in indi-
viduals depends on their sex [gender].  We [society] allow meas-
ures of physical performance to depend on sex because sufficient 
obvious differences exist between the sexes in determinants of 
physical performance [e.g., in average body weight] to justify 
such differences.  We also generally allow differences in “emo-
tional” performance between the sexes, although the distinction 
may be diminishing.  However, we have a justified strong social 
prohibition against concluding that intellectual performance de-
pends on sex because such differences might be used by some peo-
ple as a basis for sex discrimination.) 
 
 
Does Performance Not Depend on Race? 
 
The discussion above suggests that we can’t reasonably conclude 
that performance depends on race.  It is instructive to consider 
the negation of this idea.  That is, can we conclude that per-
formance doesn’t depend on race? 
 
Many people believe that human performance doesn’t directly de-
pend on race.  (I am in this group.)  However, the statement that 
performance doesn’t depend on race is a statement of a scientific 
“null hypothesis” -- a statement that something doesn’t exist.  
(Here the null hypothesis says that no causal relationship exists 
in humans between a given measure of performance and race.)  It 
is impossible to scientifically prove that something that is 
logically possible doesn’t exist (assuming that the size of the 
thing isn’t specified).  Thus a null hypothesis can’t be directly 
empirically supported, and so we can’t scientifically prove that 
performance doesn’t depend (in some perhaps very small way) on 
race.   
 
Despite the preceding point, scientific logic dictates (through 
the principle of parsimony) that we assume that a null hypothesis 
is true until (if ever) incontrovertible empirical evidence to 
the contrary is brought forward.  Thus (since no incontrovertible 
evidence is presently available) we assume that performance 
doesn’t depend on race, even though we can’t prove it is true. 
 
 
Rejecting the Null Hypothesis 
 
As a rule, scientists are highly interested in properly rejecting 
null hypotheses about causal relationships between variables.  
This rejection is performed by finding empirical evidence that 
implies the existence of the relationship.  Scientists are inter-
ested in rejecting null hypotheses because the knowledge gained 
in rejecting a (carefully chosen) null hypothesis is generally of 
theoretical or practical use.   
 
However, the case of relationships between performance and race 
is an important exception.  In this case most scientists and 
other thoughtful people are not interested in trying to reject 
the null hypothesis because, as noted, rejection is not seen as 
being particularly scientifically useful, and rejection might be 
used by some people as a basis for racial discrimination.   
 
 
Summing Up 
 
The preceding discussion leads to a certain type of negative 
(null) conclusion.  A conclusion of this type is often unstated 
because experienced scientists take such a conclusion for granted 
until (if ever) it is rejected.  However, in view of the harmful-
ness of racial discrimination, the conclusion is worth stating:  
There is presently no convincing scientific evidence that per-
formance (or behavior) in individuals can be reasonably predicted 
from their race or ethnicity.   
 
 
         Appendix D: Specifying a Repeated Measurements  
                      Analysis of Variance 
                                 
The procedure for requesting a repeated measurements analysis of 
variance from a statistical analysis computer program is compli-
cated because one must understand two somewhat complicated lan-
guages: 
 
- the language of statistical ideas related to repeated measure-
  ments analysis of variance (i.e., variation, within- and be-
  tween-entity variation, main effect, interaction, and p-value) 
 
- the language of the computer program chosen to analyze the 
  data.  (In general, each program uses a different proprietary 
  language to specify the required information.) 
 
Also, requesting a repeated measurements analysis of variance is 
complicated because two layouts are available for organizing the 
data table, most software can analyze data organized according to 
only one of the layouts, and software manuals sometimes don’t 
carefully distinguish between the layouts. 
 
One layout for organizing the data is with one row of data per 
response-variable value.  For example, suppose we perform a re-
peated measurements experiment to compare teaching approach A 
with teaching approach B using a measure of knowledge as the re-
sponse variable.  And suppose we measure the students’ knowledge 
of the subject area before they are exposed to the teaching ap-
proaches and we measure their knowledge again after each student 
has had three months of exposure to one or the other of the ap-
proaches.  Then our data table might be organized as follows: 
 
             -------------------------------------- 
                       Teaching 
             Student   Approach   Time    Knowledge 
             -------------------------------------- 
              Jack        A       Before      55 
              Jack        A       After       65 
              Mary        B       Before      63 
              Mary        B       After       75 
              Jean        A       Before      68 
              Jean        A       After       69 
              Bill        B       Before      49 
              Bill        B       After       82 
                              etc. 
             -------------------------------------- 
 
The table indicates that Jack had a measured knowledge value of 
55 before receiving teaching approach A and a measured knowledge 
value of 65 after receiving teaching approach A, and so on for 
the other students. 
 
A second layout for organizing the data is with one row of data 
per experimental entity, i.e., one row per student in the present 
discussion.  Under this layout the information in the above table 
could be organized as follows: 
 
             ------------------------------------------ 
                       Teaching   Knowledge   Knowledge 
             Student   Approach     Before      After 
             ------------------------------------------ 
              Jack        A          55          65 
              Mary        B          63          75 
              Jean        A          68          69 
              Bill        B          49          82 
                                 etc. 
             ------------------------------------------ 
 
In this second layout for organizing the data, the response vari-
able (Knowledge) has multiple columns in the data table.  These 
columns reflect the repeated measurements aspect of the research, 
with one column for each time the response variable was measured 
in the students.  Traditionally this second layout has 
been used to organize repeated measurements data.  (This may be 
because this layout is non-redundant and thus more compact than 
the first layout.)  However, the first layout may be slightly 
easier to understand because each variable has only a single col-
umn in the data table and no variables are hidden or implicit.  
(In the table immediately above the variable Knowledge has two 
columns in the table and the variable Time has no column -- time 
is implied by the two Knowledge columns.)  I hope that statisti-
cal software developers will debate the advantages of the two 
layouts for organizing repeated measurements data and then stan-
dardize on the better layout (or perhaps make both layouts avail-
able). 
 
It is easy for an expert to use statistical software to convert a 
data table from one of the two layouts for organizing repeated 
measurements data to the other.  However, for a less experienced 
researcher this conversion can be surprisingly difficult in the 
details.  Thus I recommend that less experienced researchers de-
termine which data organization their software requires and then 
ensure that the data table is organized properly from the start. 
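For readers who want to see what the conversion involves, the following minimal Python sketch (standard library only) converts the example table between the two layouts.  The function names long_to_wide and wide_to_long are hypothetical, and the sketch assumes that every student has exactly one Knowledge value at each time; a missing measurement would raise a KeyError in long_to_wide.  Statistical packages typically provide their own reshaping facilities, which are preferable when available.

```python
# Long layout: one row per response-variable value
long_rows = [
    ("Jack", "A", "Before", 55), ("Jack", "A", "After", 65),
    ("Mary", "B", "Before", 63), ("Mary", "B", "After", 75),
    ("Jean", "A", "Before", 68), ("Jean", "A", "After", 69),
    ("Bill", "B", "Before", 49), ("Bill", "B", "After", 82),
]

def long_to_wide(rows, times=("Before", "After")):
    """Return one row per student: (student, approach, knowledge at each time)."""
    by_student = {}  # dicts keep insertion order, so students stay in table order
    for student, approach, time, knowledge in rows:
        entry = by_student.setdefault(student, {"approach": approach})
        entry[time] = knowledge
    return [
        (student, e["approach"]) + tuple(e[t] for t in times)
        for student, e in by_student.items()
    ]

def wide_to_long(rows, times=("Before", "After")):
    """Inverse of long_to_wide: one row per (student, time) measurement."""
    out = []
    for student, approach, *values in rows:
        for time, knowledge in zip(times, values):
            out.append((student, approach, time, knowledge))
    return out

wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
assert wide_to_long(wide_rows) == long_rows  # round trip recovers the original table
```

Note how the Time variable, explicit in the long layout, becomes implicit in the order of the Knowledge columns in the wide layout.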
 
 
                           References 
 
Aiken, L. R. 2002. Attitudes and related psychosocial constructs: 
   Theories, assessment, and research. Thousand Oaks, CA: Sage. 

Bailar, J. C., III, and Mosteller, F., eds. 1992. Medical uses of 
   statistics (2nd ed.). Boston: NEJM (New England Journal of 
   Medicine) Books. 
  
Box, G. E. P., Hunter, J. S., and Hunter, W. G. 2005. Statistics 
   for experimenters (2nd ed.). New York: John Wiley.  
  
Fleiss, J. L. 1986. The design and analysis of clinical experi-
   ments. New York: John Wiley.  
  
Kirk, R. E. 1995. Experimental design: Procedures for the
   behavioral sciences (3rd ed.). Pacific Grove, CA: Brooks/Cole.
 
Krosnick, J. A., Judd, C. M., and Wittenbrink, B. 2005. The
   measurement of attitudes. In D. Albarracin, B. T. Johnson, and
   M. P. Zanna (Eds.), The handbook of attitudes (pp. 21-76).
   Mahwah, NJ: Lawrence Erlbaum.
 
Ladson-Billings, G., and Tate, W. 2006. American Educational
   Research Association annual meeting theme: Education research
   in the public interest. Available at
   http://www.aera.net/annualmeeting/?id=694
 
Macnaughton, D. B. 2002. The introductory statistics course: The 
   entity-property-relationship approach. Available at 
   http://www.matstat.com/teach 
 
McCullagh, P., and Nelder, J. A. 1989. Generalized linear models 
   (2nd ed.). London: Chapman and Hall. 
 
Miller, G. A. 1956. The magical number seven, plus or minus two: 
   Some limits on our capacity for processing information. Psy-
   chological Review 63:81-97. Also available at 
   http://www.well.com/~smalin/miller.html 
 
Specter at the Feast. 1995 (July 7). Science 269:35. 
 
Thompson, B. 2004. Exploratory and confirmatory factor analysis: 
   Understanding concepts and applications. Washington, DC: 
   American Psychological Association. 
 
Whitney, G. 1995. Ideology and censorship in behavior genetics. 
   Mankind Quarterly 35:327-342. Also available at 
   http://www.lrainc.com/swtaboo/taboos/gw-icbg.html 
 
Winer, B. J., Brown, D. R., and Michels, K. M. 1991. Statistical
   principles in experimental design (3rd ed.). New York:
   McGraw-Hill.