Subject: Experimental Research In Education: The Most Exciting Talk at the 2005 Joint Statistical Meetings
To: EdStat E-Mail List
    ApStat E-Mail List
    Teaching Statistics E-Mail List
    sci.stat.edu Usenet Newsgroup
From: Donald B. Macnaughton <donmac@matstat.com>
Date: Thursday August 25, 2005
Cc: Mack Shelley, Candace Schau

-----------------------------------------------------------------

For me, the most exciting talk at the 2005 Joint Statistical Meetings in Minneapolis (August 7-11) was an invited talk given by Mack Shelley, II of Iowa State University. The talk was entitled "Education Research Meets the Gold Standard: Statistics, Education, and Research Methods after 'No Child Left Behind'". Here's the abstract:

   This talk will inform the national debate over the role of scientific standards for research in education, particularly as those standards are influenced by statistical methods and theory. It will bring together researchers in statistics and education to discuss the dramatically changing context of contemporary education research. Standards for acceptable research in this key area are affected greatly by creation of the Institute of Education Sciences in the U.S. Department of Education and passage of the No Child Left Behind Act of 2001 and the Education Sciences Reform Act (H.R. 3801). These reconstituted federal support for research and dissemination of information in education, are meant to foster "scientifically valid research," and established the "gold standard" for education research. Greater emphasis in education research now is placed on quantification, as well as the use of randomized trials and the selection of valid control groups. This talk should sustain and expand the dialogue between the statistical community and those who implement the education research agenda.

The PowerPoint slides for the talk are at

   http://www.matstat.com/teach/Shelley.ppt

Consider a definition:

   An experiment (or randomized trial) is a "proper" experiment if it has been performed according to generally accepted principles of scientific practice and experimental design, as described by Bailar and Mosteller (1992), Box, Hunter, and Hunter (2005), Fleiss (1986), Kirk (1995), Winer, Brown, and Michels (1991), and many others.

In the past, proper experiments in education research have generally not been done, mostly because such experiments are somewhat complicated. Instead, educators have relied on observational research (and anecdotes) to support education policy decisions. However, this approach is unreliable, as illustrated by the New Math fiasco in the 1960's and early 1970's (Fang 1968; Kline 1973; Miller 1990; Stein 1996, chap. 12).

Proper experiments in education research have two important advantages over observational research:

- Proper experiments are unequivocal, but observational research is invariably equivocal. Thus proper experiments in education research (for the most part) reliably increase our knowledge of how best to conduct an education program. (This is fully analogous to the way that proper experiments in medicine have greatly increased our knowledge of how to promote wellness and fight disease.)

- Proper experiments focus attention on the important question of what we would like education to do for us. (This focus is through the choice of the response [outcome, dependent] variable[s] for the experiment.)

Perhaps due to lack of knowledge, some education researchers continue to perform less rigorous education research, which leads to wastage of time, wastage of opportunities, and wastage of money.

In view of this wastage, and as noted by Shelley in his talk, the "What Works" arm of the United States Department of Education is beginning to rate education research projects on how well they satisfy the requirements of proper research. Each evaluated research project is given a rating on a three-point scale, with the levels being (1) Meets evidence standards, (2) Meets evidence standards with reservations, and (3) Does not meet evidence screens.

Researchers planning to perform education research may find it helpful to read about the evidence standards to review or learn what's needed for education research to be "proper". The What Works program is described at

   http://www.whatworks.ed.gov

An effective way for a less experienced researcher to ensure that their research is proper is to collaborate with another researcher who is familiar with experimental design and the pitfalls of education research.

I discuss some issues about proper research in the field of statistics education in appendices A and B and I discuss a way to substantially increase the power of statistical tests in education experiments in appendix C.

Don Macnaughton

-------------------------------------------------------
Donald B. Macnaughton   MatStat Research Consulting Inc
donmac@matstat.com      Toronto, Canada
-------------------------------------------------------

APPENDIX A: PROPER RESEARCH IN STATISTICS EDUCATION

The discussion in the body of this essay applies to all areas of education. Of special interest to me is the area of STATISTICS education and in particular the introductory statistics course for students who aren't majoring in statistics. This course is important because statistics is a cornerstone of science, and thus proper understanding of the basic use of statistics in scientific research will give students a better understanding of science.

Unfortunately, many students fail to understand the introductory statistics course.
We know this by the experience that most statistics teachers have with non-statisticians they meet, perhaps at a party -- many non-statisticians report that they took an introductory statistics course, but were totally lost.

It seems clear that we can improve the introductory statistics course through proper designed experiments aimed at determining which selection of topics and which teaching approaches give students the greatest benefits. I discuss a useful response variable for experiments in statistics education in appendix B.

Some researchers in statistics education do not perform experiments and they state that proper experimental research in statistics education is premature. If asked why experimental research is premature, these researchers give vague answers, such as saying that "preliminary work" must be done. I hope that researchers who believe that proper experimental research in statistics education is premature will clearly spell out the steps they feel are necessary before this very important research can begin. It is wasteful to delay when so many students fail to understand our field.

APPENDIX B: A USEFUL RESPONSE VARIABLE FOR EXPERIMENTS IN STATISTICS EDUCATION

Candace Schau has developed the Survey of Attitudes Toward Statistics (SATS). This survey, which can be administered to students in less than ten minutes, consists of 36 statements that students rate on a seven-point scale ranging from "strongly disagree" to "strongly agree". For example, the fifth statement is "Statistics is worthless." The SATS provides six reliable scores for a student that reflect the student's attitudes toward statistics on scales that are named Value, Affect, Cognitive Competence, Difficulty, Interest, and Effort.

The Value scale is particularly important because it reflects how highly students value the field of statistics.
We might say the "best" introductory statistics course for a group of students is the course that most improves the students' scores on the SATS Value scale. This is reasonable because a student's sense of the value of the field of statistics (as instilled by a course) is arguably more important than any statistical knowledge instilled by the course. This is because statistical knowledge (e.g., how to do a t-test) is generally forgotten shortly after the student completes the final exam. But the student's valuation of our field usually lasts a lifetime and drives his or her decisions and remarks about the field.

Administering and scoring the SATS is easy. Therefore, I encourage every introductory statistics teacher to administer it to their students before and after each statistics course they teach. If you compute the average difference between the students' "before" and "after" scores, you can determine whether the course tends to make students' attitudes better or worse, and by roughly how much. The SATS is available at

   http://www.unm.edu/~cschau/satshomepage.htm

(The results of the SATS can be disappointing because SATS scores in many courses are worse at the end than at the beginning. However, for a seriously committed teacher it is useful to be aware of this problem and its extent as a stimulus to search for improvements.)

(On a technical matter, the differences between the students' "before" and "after" scores on a SATS scale are useful as a rudimentary measure. However, if SATS scores are used in analysis of variance, for complete information the raw "before" and "after" scores for the analyzed scale should be included in the analysis instead of merely using the differences between them.)
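
As a minimal sketch of the before-and-after comparison described above, the following Python fragment computes the average difference for one SATS scale and the standard error of that average. The function name mean_change and the five pairs of scores are hypothetical illustrations, not real SATS data. The sketch assumes that each student's scale score has already been computed and lies on the 1-to-7 rating scale.

```python
from math import sqrt
from statistics import mean, stdev


def mean_change(before, after):
    """Return (mean, standard error) of the per-student after - before differences."""
    if len(before) != len(after):
        raise ValueError("each student needs both a 'before' and an 'after' score")
    if len(before) < 2:
        raise ValueError("need at least two students")
    # One difference per student: positive means the attitude improved.
    diffs = [a - b for b, a in zip(before, after)]
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))


# Hypothetical Value-scale scores for five students (made up for illustration).
before = [4.0, 5.0, 3.5, 6.0, 4.5]
after = [4.5, 5.0, 3.0, 6.5, 5.0]

change, se = mean_change(before, after)
print(f"mean change = {change:+.2f}, standard error = {se:.2f}")
```

A mean change that is small relative to its standard error gives no clear evidence that attitudes moved in either direction. As the preceding paragraph notes, the differences are only a rudimentary measure.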

APPENDIX C: USING REPEATED MEASUREMENTS TO INCREASE POWER

Measuring the response variable in each student before the course begins and again at the end of the course (as discussed in appendix B) can substantially increase the power of the statistical tests in an experiment. This is because the repeated measurement of the response variable enables us (when other standard conditions are met) to use the statistical procedure of repeated measurements analysis of variance. This results in certain key statistical tests being based on "within-student" comparisons, which generally provide substantially more powerful tests than the between-class comparisons that may otherwise be necessary.

(Despite the point in the preceding paragraph, it's usually necessary to study more than two classes of students in experimental research comparing education programs. We need more than two classes to eliminate the possibility of reasonable alternative explanations muddying the interpretation of the results. For example, we must eliminate the possibility of teacher differences accounting for significant differences in the mean values of the response variable for the different programs being compared. We can eliminate this possibility with multiple classes with multiple teachers. And [if proper random assignment of students to classes can't be done] we also need multiple classes to reduce the chance of a student-class selection bias.)

REFERENCES

Bailar, J. C., III, and Mosteller, F., eds. 1992. _Medical Uses of Statistics._ 2d ed. Boston: NEJM (New England Journal of Medicine) Books.

Box, G. E. P., Hunter, J. S., and Hunter, W. G. 2005. _Statistics for Experimenters._ New York: John Wiley.

Fang, J. 1968. _Numbers Racket; The Aftermath of "New Math"._ Port Washington, NY: Kennikat Press.

Fleiss, J. L. 1986. _The Design and Analysis of Clinical Experiments._ New York: John Wiley.

Kirk, R. E. 1995. _Experimental Design: Procedures for the Behavioral Sciences_ (3d ed.).
Pacific Grove, CA: Brooks/Cole.

Kline, M. 1973. _Why Johnny Can't Add: The Failure of the New Math._ New York: St. Martin's Press.

Miller, J. W. 1990. Whatever Happened to New Math? _American Heritage,_ 41(8) (Dec), 76-83.

Stein, S. K. 1996. _Strength in Numbers: Discovering the Joy and Power of Mathematics in Everyday Life._ New York: John Wiley.

Winer, B. J., Brown, D. R., and Michels, K. M. 1991. _Statistical Principles in Experimental Design_ (3d ed.). New York: McGraw-Hill.
