Subject: Experimental Research In Education: The Most Exciting Talk
at the 2005 Joint Statistical Meetings
To: EdStat E-Mail List
ApStat E-Mail List
Teaching Statistics E-Mail List
sci.stat.edu Usenet Newsgroup
From: Donald B. Macnaughton < donmac@matstat.com >
Date: Thursday, August 25, 2005
Cc: Mack Shelley, Candace Schau
-----------------------------------------------------------------
For me, the most exciting talk at the 2005 Joint Statistical
Meetings in Minneapolis (August 7-11) was an invited talk given
by Mack Shelley II of Iowa State University. The talk was
entitled "Education Research Meets the Gold Standard: Statistics,
Education, and Research Methods after 'No Child Left Behind'".
Here's the abstract:
This talk will inform the national debate over the role
of scientific standards for research in education, par-
ticularly as those standards are influenced by statisti-
cal methods and theory. It will bring together research-
ers in statistics and education to discuss the dramati-
cally changing context of contemporary education re-
search. Standards for acceptable research in this key
area are affected greatly by creation of the Institute of
Education Sciences in the U.S. Department of Education
and passage of the No Child Left Behind Act of 2001 and
the Education Sciences Reform Act (H.R. 3801). These acts
reconstituted federal support for research and the dissemination
of information in education, were meant to foster "scientifically
valid research," and established the "gold standard" for
education research. Greater emphasis
in education research now is placed on quantification, as
well as the use of randomized trials and the selection of
valid control groups. This talk should sustain and ex-
pand the dialogue between the statistical community and
those who implement the education research agenda.
The PowerPoint slides for the talk are at
http://www.matstat.com/teach/Shelley.ppt
Consider a definition:
An experiment (or randomized trial) is a "proper" experi-
ment if it has been performed according to generally ac-
cepted principles of scientific practice and experimental
design, as described by Bailar and Mosteller (1992), Box,
Hunter, and Hunter (2005), Fleiss (1986), Kirk (1995),
Winer, Brown, and Michels (1991), and many others.
In the past, proper experiments in education research have gener-
ally not been done, mostly because such experiments are somewhat
complicated. Instead, educators have relied on observational re-
search (and anecdotes) to support education policy decisions.
However, this approach is unreliable, as illustrated by the New
Math fiasco in the 1960's and early 1970's (Fang 1968; Kline
1973; Miller 1990; Stein 1996, chap. 12).
Proper experiments in education research have two important ad-
vantages over observational research:
- Proper experiments are unequivocal, but observational research
is invariably equivocal. Thus proper experiments in education
research (for the most part) reliably increase our knowledge of
how best to conduct an education program. (This is fully
analogous to the way that proper experiments in medicine have
greatly increased our knowledge of how to promote wellness and
fight disease.)
- Proper experiments focus attention on the important question of
what we would like education to do for us. (This focus is
through the choice of the response [outcome, dependent] vari-
able[s] for the experiment.)
Perhaps due to a lack of knowledge, some education researchers
continue to perform less rigorous education research, which
wastes time, opportunities, and money. In view of this waste,
and as noted by Shelley in his talk, the
"What Works" arm of the United States Department of Education is
beginning to rate education research projects on how well they
satisfy the requirements of proper research. Each evaluated re-
search project is given a rating on a three-point scale, with the
levels being (1) Meets evidence standards, (2) Meets evidence
standards with reservations, and (3) Does not meet evidence
screens.
Researchers planning education research may find it helpful to
review the evidence standards to learn what's needed for their
research to be "proper". The What Works program is described at
http://www.whatworks.ed.gov
An effective way for a less experienced researcher to ensure that
their research is proper is to collaborate with another re-
searcher who is familiar with experimental design and the pit-
falls of education research.
I discuss some issues about proper research in the field of
statistics education in appendices A and B, and I discuss a way
to substantially increase the power of statistical tests in
education experiments in appendix C.
Don Macnaughton
-------------------------------------------------------
Donald B. Macnaughton MatStat Research Consulting Inc
donmac@matstat.com Toronto, Canada
-------------------------------------------------------
APPENDIX A: PROPER RESEARCH IN STATISTICS EDUCATION
The discussion in the body of this essay applies to all areas of
education. Of special interest to me is the area of STATISTICS
education and in particular the introductory statistics course
for students who aren't majoring in statistics. This course is
important because statistics is a cornerstone of science, and
thus proper understanding of the basic use of statistics in sci-
entific research will give students a better understanding of
science.
Unfortunately, many students fail to understand the introductory
statistics course. We know this from the experience that most
statistics teachers have with non-statisticians they meet,
perhaps at a party -- many report that they took an introductory
statistics course but were totally lost.
It seems clear that we can improve the introductory statistics
course through properly designed experiments aimed at determining
which selection of topics and which teaching approaches give
students the greatest benefits. I discuss a useful response
variable for experiments in statistics education in appendix B.
Some researchers in statistics education do not perform
experiments, stating that proper experimental research in
statistics education is premature. When asked why, these
researchers give vague answers, such as saying that "preliminary
work" must be done. I hope that researchers who believe that
proper experimental research in statistics
education is premature will clearly spell out the steps they feel
are necessary before this very important research can begin. It
is wasteful to delay when so many students fail to understand our
field.
APPENDIX B: A USEFUL RESPONSE VARIABLE FOR EXPERIMENTS IN
STATISTICS EDUCATION
Candace Schau has developed the Survey of Attitudes Toward Sta-
tistics (SATS). This survey, which can be administered to stu-
dents in less than ten minutes, consists of 36 statements that
students rate on a seven-point scale ranging from "strongly dis-
agree" to "strongly agree". For example, the fifth statement is
"Statistics is worthless." The SATS provides six reliable scores
for a student that reflect the student's attitudes toward statis-
tics on scales that are named Value, Affect, Cognitive Compe-
tence, Difficulty, Interest, and Effort.
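To make the scoring concrete, here is a minimal sketch in Python
of how Likert-style scale scores of this kind are typically
computed. The item positions and scale assignments below are
invented for illustration; the real SATS item-to-scale mapping is
defined by its author:

  import numpy as np

  # One student's ratings of the statements on a 1-7 scale
  # (hypothetical values; only 7 items shown to keep this short).
  responses = np.array([5.0, 2.0, 6.0, 4.0, 1.0, 7.0, 3.0])

  # Negatively worded items (e.g., "Statistics is worthless") are
  # reverse-scored so that 7 always means a positive attitude.
  reverse_keyed = [1, 4]             # invented item positions
  scored = responses.copy()
  scored[reverse_keyed] = 8 - scored[reverse_keyed]  # 1<->7, ...

  # A scale score is the mean of the (recoded) item ratings.
  value_items = [0, 1, 4]            # invented "Value" items
  print("Value score:", scored[value_items].mean())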
The Value scale is particularly important because it reflects how
highly students value the field of statistics. We might say the
"best" introductory statistics course for a group of students is
the course that most improves the students' scores on the SATS
Value scale. This is reasonable because a student's sense of the
value of the field of statistics (as instilled by a course) is
arguably more important than any statistical knowledge the course
instills: such knowledge (e.g., how to do a t-test) is generally
forgotten shortly after the final exam, but the student's
valuation of our field usually lasts a lifetime and drives his or
her decisions and remarks about the field.
Administering and scoring the SATS is easy. Therefore, I encour-
age every introductory statistics teacher to administer it to
their students before and after each statistics course they
teach. If you compute the average difference between the stu-
dents' "before" and "after" scores, you can determine whether the
course tends to make students' attitudes better or worse, and by
roughly how much.
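For example, here is a minimal sketch in Python (using numpy and
scipy, with invented scores) of the before-and-after comparison
just described:

  import numpy as np
  from scipy import stats

  # Hypothetical Value-scale scores for the same eight students,
  # before and after the course (same student order in both).
  before = np.array([4.2, 3.8, 5.0, 4.5, 3.6, 4.1, 4.8, 3.9])
  after  = np.array([4.6, 3.7, 5.3, 4.9, 3.8, 4.4, 5.1, 4.2])

  diff = after - before
  print("Mean change:", round(diff.mean(), 2), "points")

  # A paired t-test asks whether the average change differs
  # from zero by more than chance variation would suggest.
  result = stats.ttest_rel(after, before)
  print("t =", round(result.statistic, 2),
        " p =", round(result.pvalue, 4))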
The SATS is available at
http://www.unm.edu/~cschau/satshomepage.htm
(The results of the SATS can be disappointing because in many
courses SATS scores are worse at the end than at the beginning.
However, a seriously committed teacher will find it useful to be
aware of this problem and its extent, as a stimulus to search for
improvements.)
(On a technical matter, the differences between the students'
"before" and "after" scores on a SATS scale are useful as a
rudimentary measure. However, if SATS scores are used in an
analysis of variance, the raw "before" and "after" scores for the
analyzed scale should be included in the analysis, rather than
merely the differences between them, so that no information is
lost.)
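As an illustration of this point, here is a sketch in Python
(using pandas and statsmodels, with invented data and column
names) of the two analyses -- differences only, versus raw scores
with the "before" score kept as a covariate:

  import pandas as pd
  import statsmodels.formula.api as smf

  # Hypothetical data: one row per student.
  df = pd.DataFrame({
      "before": [4.1, 3.5, 5.0, 4.4, 3.9, 4.7, 3.2, 4.0],
      "after":  [4.5, 3.4, 5.4, 4.9, 4.0, 5.1, 3.5, 4.3],
      "course": ["A", "A", "A", "A", "B", "B", "B", "B"]})

  # Rudimentary analysis: model only the before/after differences.
  diffs = smf.ols("I(after - before) ~ C(course)", df).fit()

  # Fuller analysis: keep the raw "before" score in the model as
  # a covariate, so all of its information enters the analysis.
  full = smf.ols("after ~ before + C(course)", df).fit()
  print(full.summary())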
APPENDIX C: USING REPEATED MEASUREMENTS TO INCREASE POWER
Measuring the response variable in each student before the course
begins and again at the end of the course (as discussed in appen-
dix B) can substantially increase the power of the statistical
tests in an experiment. This is because the repeated measurement
of the response variable enables us (when other standard condi-
tions are met) to use the statistical procedure of repeated meas-
urements analysis of variance. This results in certain key sta-
tistical tests being based on "within-student" comparisons, which
generally provide substantially more powerful tests than the be-
tween-class comparisons that may otherwise be necessary.
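The following small simulation in Python sketches why the
within-student comparison gains power (all numbers are invented):
a stable per-student component makes the "before" and "after"
scores correlated, and the paired analysis removes that component
from the error term:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  n, gain = 30, 0.4                # students, true attitude gain

  # Each student's two scores share a stable personal component,
  # which makes "before" and "after" strongly correlated.
  stable = rng.normal(4.0, 1.0, n)
  before = stable + rng.normal(0.0, 0.5, n)
  after  = stable + gain + rng.normal(0.0, 0.5, n)

  # Paired (within-student) test vs. a test ignoring the pairing.
  print("paired   p =", stats.ttest_rel(after, before).pvalue)
  print("unpaired p =", stats.ttest_ind(after, before).pvalue)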
(Despite this gain in power, it's usually necessary to study
more than two classes of students in experimental research
comparing education programs. We need more than two
classes to eliminate the possibility of reasonable alternative
explanations muddying the interpretation of the results. For ex-
ample, we must eliminate the possibility of teacher differences
accounting for significant differences in the mean values of the
response variable for the different programs being compared. We
can eliminate this possibility with multiple classes with multi-
ple teachers. And [if proper random assignment of students to
classes can't be done] we also need multiple classes to reduce
the chance of a student-class selection bias.)
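One common way to analyze such a multi-class design is a mixed
model in which classes are random groups, so that the program
effect is judged against class-to-class variation rather than
student noise alone. Here is a sketch in Python using
statsmodels; the file name and column names are invented:

  import pandas as pd
  import statsmodels.formula.api as smf

  # Hypothetical long-format data: one row per student, recording
  # which class (and hence teacher) the student belonged to.
  df = pd.read_csv("sats_experiment.csv")
  # columns assumed: class_id, program, before, after

  # "after" scores modeled from "before" scores and program, with
  # classes as random groups; the program effect is then tested
  # against variation between classes, not just between students.
  model = smf.mixedlm("after ~ before + C(program)", df,
                      groups="class_id")
  print(model.fit().summary())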
REFERENCES
Bailar, J. C., III, and Mosteller, F., eds. 1992. _Medical Uses
of Statistics_ (2d ed.). Boston: NEJM (New England Journal of
Medicine) Books.
Box, G. E. P., Hunter, J. S., and Hunter, W. G. 2005. _Statistics
for Experimenters_ (2d ed.). New York: John Wiley.
Fang, J. 1968. _Numbers Racket: The Aftermath of "New Math"._
Port Washington, NY: Kennikat Press.
Fleiss, J. L. 1986. _The Design and Analysis of Clinical Experi-
ments._ New York: John Wiley.
Kirk, R. E. 1995. _Experimental Design: Procedures for the
Behavioral Sciences_ (3d ed.). Pacific Grove, CA: Brooks/Cole.
Kline, M. 1973. _Why Johnny Can't Add: The Failure of the New
Math._ New York: St. Martin's Press.
Miller, J. W. 1990. "Whatever Happened to New Math?" _American
Heritage_ 41(8) (Dec.), 76-83.
Stein, S. K. 1996. _Strength in Numbers: Discovering the Joy and
Power of Mathematics in Everyday Life._ New York: John Wiley.
Winer, B. J., Brown, D. R., and Michels, K. M. 1991. _Statistical
Principles in Experimental Design_ (3d ed.). New York: McGraw-
Hill.