Subject: The Most Exciting Talk at the 2003 Joint Statistical Meetings

     To: EdStat E-Mail List
         ApStat E-Mail List
         sci.stat.edu Usenet Newsgroup

   From: Donald B. Macnaughton < donmac@matstat.com >

   Date: Friday August 22, 2003

     Cc: Candace Schau



The JSM (Joint Statistical Meetings) is a conference sponsored by
five large North American statistical organizations and is held 
annually in August.  This year the JSM was in San Francisco and 
was attended by almost 6000 statisticians from around the world.

Among the many sessions at the JSM were 22 sessions that pre-
sented around 110 talks related to statistics education.  The
talks I attended always reflected teachers' dedication to helping
students to understand statistics and almost always provided use-
ful new perspectives on statistics teaching.


The most exciting talk at the JSM for me was presented by Candace
Schau (pronounced "shaw"), who discussed her test for measuring
students' attitudes toward statistics (Survey of Attitudes Toward
Statistics -- SATS).  The test is designed to be administered to
students twice -- at the beginning of a course and at the end.
The test contains twenty-eight questions (items) that reflect
four subscales.  The subscales measure properties of the students
that Schau has labeled Affect, Cognitive Competence, Value, and
Difficulty.  Students can usually complete the test in under ten
minutes.


As I discuss in a 2002 paper, I believe a reasonable first goal
of an introductory statistics course is

    To give students a lasting appreciation of the vital role
    of the field of statistics in empirical research.

Under this goal, the SATS is useful because it enables us to
measure students' appreciation of statistics (using the Value
subscale of the SATS).  It is especially useful to measure each
student's appreciation immediately before and immediately after a
course because the difference between the two scores for a
student is a reasonable gauge of the effect of the course on the
student's appreciation.

It seems reasonable to say

    The greater the average improvement in SATS Value sub-
    scale scores as a result of a course, the higher (in one
    reasonable sense) the quality of the course.

Thus the SATS is a useful test to help us to improve statistics
education.

Most statisticians are familiar with the experience of revealing
their vocation to someone at a party, and having that person say
that they once took a statistics course, and it was the worst
course they ever took, or some similar negative comment.  This
supports the view that a person's attitudes toward statistics
(which can often be negative) tend to persist throughout the per-
son's lifetime.  (This point was suggested at the JSM by Sterling
Hilton.)  In contrast, much of the knowledge students learn in an
introductory statistics course is forgotten, often with a steep
forgetting curve that begins when the student finishes the final
exam.

Attitudes toward statistics generally persist more strongly in a
person's consciousness than knowledge about statistics, and
students' attitudes toward statistics are often negative.  This
suggests that it is at least as important to study how to improve
students' attitudes toward statistics as it is to study how to
impart specific statistical knowledge.  This point further
suggests that the SATS is a useful test to help us to improve
statistics education.


The SATS and supporting material are available without charge
from Dr. Schau's web site (Schau 2003).  In addition, Dr. Schau's
consulting firm is available to assist with projects related to
the SATS.


A reasonable rudimentary approach to interpreting an administra-
tion of the SATS is to subtract each student's pre-test score for
each subscale from his or her corresponding post-test score and
then to study the univariate distribution (across students) of
each of the four differences.  (The SATS is designed so that such
subtraction is reasonable.)  We would like the mean of each of
these distributions to be (in the appropriate direction) as far
away from zero as possible.

(If we wish, we can also perform a statistical test of whether
the mean change in attitude scores is different from zero, or
different from some other fixed value.  However, from the point
of view of improving statistics education such tests are less im-
portant because it is relative differences in scores between ap-
propriately compared teaching methods that are important, not ab-
solute scores.  Relative differences in scores are best obtained
in experiments, as discussed below.)
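
As a minimal sketch of the rudimentary approach described above,
the following Python fragment subtracts each student's pre-test
score from the corresponding post-test score for one subscale and
summarizes the differences, including a one-sample t statistic
against zero.  The scores and the function name change_summary
are invented for illustration; they are not SATS data and are
not part of the SATS materials.

```python
from math import sqrt
from statistics import mean, stdev


def change_summary(pre, post):
    """Summarize post-minus-pre differences for one subscale.

    pre and post are equal-length lists of scores, one entry per
    student, in the same student order.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    m = mean(diffs)
    s = stdev(diffs)            # sample standard deviation (n - 1)
    t = m / (s / sqrt(n))       # t statistic vs. zero, n - 1 df
    return {"n": n, "mean_change": m, "sd_change": s, "t": t}


# Hypothetical Value-subscale scores for five students.
value_pre = [40, 45, 38, 50, 42]
value_post = [44, 46, 41, 49, 47]

summary = change_summary(value_pre, value_post)
print(summary)
```

The same function could be applied to each of the four subscales
in turn.  The t statistic is undefined if every student changes
by exactly the same amount, because the standard deviation of the
differences is then zero.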


A teacher may administer the SATS and find that students' atti-
tudes are generally worse at the end of the course than at the
beginning.  If so, what can the teacher do?

First, it seems clear that something can be done.  As I think
most readers will agree, the field of statistics is a vital
cornerstone of science (and also of all other types of empirical
research), so the field clearly merits serious respect and
appreciation.  Thus if students' attitudes toward statistics
become worse after taking a particular statistics course, this
merely reflects the fact that the teacher (like many others) has
not yet found a good approach to instilling a strong sense of the
value of our field in students.  When the teacher finds this
approach, the students' attitudes toward statistics will improve.

A reasonable way to find the best approach to teaching an intro-
ductory statistics course is to study the literature of modern
statistics education, which contains many proposals for improving
the introductory course.  Careful implementation of the most
promising of these proposals is almost certain to improve stu-
dents' attitudes toward statistics.  Appendix A gives some entry
points to the literature.


I believe we will disentangle the many approaches to teaching the
introductory statistics course and discover how to optimize stu-
dents' attitudes toward statistics (and perhaps optimize other
important response variables) through the use of standard statis-
tical tools in empirical research, with particular emphasis on
designed experiments.  Carefully designed experiments to compare
approaches to teaching statistics appear to be the best way to
find the approaches that most help people to recognize the
usefulness of our field.

(Appendix B discusses some technical aspects of the design and
analysis of experiments in statistics education.)

Designed experiments in statistics education will follow and ex-
pand the leadership that is currently being provided by Hilton,
Christensen, Collings, Hadfield, Schaalje, and Tolley (1999).

Don Macnaughton

Donald B. Macnaughton   MatStat Research Consulting Inc
donmac@matstat.com      Toronto, Canada


For introductory statistics teachers interested in improving
their courses, here are some links to material about statistics
education.  First is a list of some journals that specialize
wholly or partly in articles about statistics education:

- Journal of Statistics Education. This online journal is aimed
  at statistics educators.

- Statistics Education Research Journal. Aimed at improving sta-
  tistics education.

- Chance. This magazine-style journal is aimed at "everyone".

- Stats. Aimed at students.

- Teaching Statistics. Aimed at teachers of students aged up to
  19 who use statistics in their work.

- "Teachers' Corner" of The American Statistician.  This section
  of the journal publishes general articles about teaching sta-
  tistics.

- "Statistics Teacher Network." A newsletter aimed at statistics
  teachers.

The American Statistical Association has an active section on
Statistical Education that regularly makes helpful contributions
to the advancement of statistical education.  Information about
this group is available at
http://www.stat.ncsu.edu/stated/homepage.html and information
about joining is at http://www.amstat.org/membership/join.html

The International Association of Statistical Education (IASE) is
an affiliate of the International Statistical Institute.  The
IASE also regularly makes helpful contributions to the advance-
ment of statistical education, including sponsoring an important
conference on statistical education every four years.  Informa-
tion about this group is available at

The American Statistical Association has published a set of care-
fully developed formal recommendations about teaching statistics
in undergraduate major and minor programs.  These are available
at http://www.amstat.org/education/Curriculum_Guidelines.html

Here are some general books about teaching statistics:

Gordon, F., and Gordon, S. (eds.) 1992. Statistics for the Twenty-
   First Century, MAA Notes No. 26. Washington, DC: Mathematical
   Association of America.

Hawkins, A., Jolliffe, F., and Glickman, L. 1992. Teaching
   Statistical Concepts. London: Longman.

Hoaglin, D. C., and Moore, D. S. (eds.) 1992. Perspectives on
   Contemporary Statistics, MAA Notes No. 21. Washington, DC:
   Mathematical Association of America.

Moore, T. L. (ed.) 2000. Teaching Statistics: Resources for
   Undergraduate Instructors, MAA Notes No. 52.  Washington, DC:
   Mathematical Association of America.

Many good introductory statistics textbooks are available.  Here
are some introductory textbooks I have studied that I like:

De Veaux, R. D., Velleman, P. F., and Bock, D. E. Intro Stats.
   Boston, MA: Pearson.

Freedman, D., Pisani, R., and Purves, R. 1998. Statistics (3rd
   ed). New York: W. W. Norton.

Moore, D. S. 2003. The Basic Practice of Statistics (3rd ed). New
   York: W. H. Freeman.

Rossman, A. J. and Chance, B. L. 2001. Workshop Statistics:
   Discovery with Data (2nd ed). Emeryville, CA: Key College
   Publishing.

Utts, J. M. and Heckard, R. F. 2004. Mind on Statistics (2nd ed).
   Belmont, CA: Brooks/Cole/Thomson.

Watkins, A. E., Scheaffer, R. L., and Cobb, G. W. 2004.
   Statistics in Action: Understanding a World of Data.
   Emeryville, CA: Key Curriculum Press.

Finally, I have made some suggestions for improving the introduc-
tory statistics course.  Discussion is available at


This appendix discusses experiments that compare methods of
teaching an introductory statistics course.  In these experiments
the response variable is a SATS subscale score measured before
and after the course (or some other pair of pre- and post-course
measures of the students) and the main
predictor variable reflects the different teaching methods that
are under study.  The goal of these experiments is to determine
whether and how attitudes depend on teaching methods.  In other
words, we would like to know which of the various teaching ap-
proaches yields the best attitudes.  Knowing this helps us to im-
prove the design of introductory statistics courses.

In these experiments the researcher can experimentally compare
any teaching methods of interest.  For example, activities can be
compared with lectures, group work can be compared with individ-
ual work, or emphasis on one set of basic statistical concepts
can be compared with emphasis on another such set.

It is useful to note that a researcher will be successful at
finding relationships between attitudes and one or more predictor
variables only to the extent that the chosen predictor variables
have enough slope on the response surface (that is, a large
enough effect on attitudes) to be detectable.  Thus the predictor
variables and the experimental design in research in statistics
education should be chosen with care.

It is possible to use multivariate methods to simultaneously
model the relationship between all four SATS subscale scores and
one or more predictor variables.  In this case the response vari-
able is viewed as a four-component vector instead of as a single
scalar value.  However, the multivariate approach generally pro-
vides no significant advantages, is much harder to understand,
and seems to link the four subscales together too tightly.  Thus
researchers typically treat each subscale score independently of
the others in separate analyses.  A reasonable approach to the
main analysis for an experiment with SATS scores is therefore to
perform four separate (mixed-model) analyses of variance, one for
each subscale of the SATS.

A typical statistics education experiment will study several
classes or sections of students in which teachers use one method
of teaching the course and several other classes in which teach-
ers use a second competing method.  Several classes with each
method are necessary because the teaching methods vary between
classes, but teachers (and other properties of the classes) also
generally vary between classes.  Thus if we use only two classes,
any differences we find between the methods may be actually due
to the difference between the teachers, and not due to the dif-
ferences in the methods.  Thus multiple classes are needed to
even out teacher (and other) effects.

The "Interpreting SATS Scores" section above discusses how it is
reasonable to study in isolation the univariate distribution of
the differences between pre-test and post-test SATS scores.  How-
ever, it is not reasonable to give these differences to an analy-
sis of variance or linear models computer program for analysis.
Instead, a better approach is to give the program the original
pre-test and post-test scores (as opposed to giving it only the
differences between the scores or only the post-test scores).
This allows the program to take account of the uncollapsed set of
experimental data, which generally enables a more comprehensive
and more sensitive analysis.

(One could even give the program the individual SATS item scores
for each student, as opposed to giving it only the subscale
scores.  [Each subscale score is simply the sum of the item
scores for the items associated with the subscale.]  However, the
individual item scores of a test are rarely studied in the main
analysis of an education experiment, perhaps because the item
scores for a subscale are viewed as merely reflecting somewhat
independent measures of the same property.  Thus the within-stu-
dent variation in the item scores reflects nothing more than the
measurement error in the individual items, which is generally not
of interest under the goals of the experiment.)

In the type of experiment under discussion the response variable
(e.g., one of the four subscales of the SATS) is applied to each
student twice -- at the beginning of the course and at the end.
This dual administration of the response variable implies that
the experiment is a "repeated-measurements" experiment.

The use of repeated measurements is often effective because (with
an appropriate design and analysis) the comparisons for the key
statistical tests of the experiment are made within the experi-
mental entities.  That is, in the key comparison(s) an experimen-
tal entity is compared with itself.  These within-entities com-
parisons generally exhibit substantially less random variability
than the corresponding between-entities comparisons.  This im-
plies (through a mathematical argument) that statistical tests
associated with the within-entities comparisons are generally
substantially more powerful than the tests we would obtain if the
procedure of repeated measurements were not used.
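
For the two-occasion case, the mathematical argument can be
sketched with the identity Var(post - pre) = Var(pre) + Var(post)
- 2 Cov(pre, post).  When a student's pre-test and post-test
scores are positively correlated, the covariance term makes the
within-student difference less variable than a comparison of
unrelated scores would be.  The Python fragment below checks the
identity numerically on invented scores (not SATS data); the
helper name covariance is an assumption of the sketch.

```python
from statistics import mean, variance


def covariance(x, y):
    """Sample covariance of two equal-length lists (n - 1 divisor)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Hypothetical scores for the same five students at two times.
pre = [40, 45, 38, 50, 42]
post = [44, 46, 41, 49, 47]

# Variance of the within-student differences, computed directly.
var_within = variance([b - a for a, b in zip(pre, post)])

# The same quantity from the variance identity.
var_identity = variance(pre) + variance(post) - 2 * covariance(pre, post)

# What the variance would be if the two scores were uncorrelated.
var_unrelated = variance(pre) + variance(post)

print(var_within, var_identity, var_unrelated)
```

In this invented example the strong pre/post correlation removes
most of the variability from the difference, which is the source
of the extra power of the within-entities tests.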

Thus consider an experiment that studies the relationship between
a SATS subscale score (measured both before and after the course)
as the response variable and a predictor variable that reflects
two (or more) teaching methods.  An important statistical test
for a difference in the effectiveness of the methods is the test
of the two-way interaction between the factors (teaching) Method
and Time (of testing).

(This interaction test is relevant because if the methods being
compared differ in their effects on students' attitudes, the pre-
to-post change in attitudes of the group of students receiving
one method of teaching will be different from the corresponding
change in the other group(s).  The Method by Time interaction
test is explicitly designed to detect such a difference.)
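
For two methods and two testing times, the quantity that the
Method by Time interaction examines can be sketched as the
difference between the two groups' mean pre-to-post changes.  The
Python fragment below computes only this descriptive contrast on
invented scores.  It ignores the Class factor and produces no
p-value, so it is an illustration of the idea and not a
substitute for the mixed-model analysis discussed in this
appendix.  The function name mean_change is hypothetical.

```python
from statistics import mean


def mean_change(pairs):
    """Mean post-minus-pre change for a list of (pre, post) pairs."""
    return mean(post - pre for pre, post in pairs)


# Hypothetical (pre, post) subscale scores, one pair per student.
method_a = [(40, 44), (45, 46), (38, 41), (50, 49), (42, 47)]
method_b = [(41, 41), (46, 44), (39, 40), (48, 47), (43, 43)]

change_a = mean_change(method_a)
change_b = mean_change(method_b)

# A contrast near zero suggests that the methods change attitudes
# by about the same amount; a contrast far from zero suggests a
# Method by Time interaction.
interaction_contrast = change_a - change_b
print(change_a, change_b, interaction_contrast)
```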

Examination of the layout of the type of experiment under study
implies that the Method by Time interaction test is a within-
students test.  Thus this test tends to be substantially more
powerful for detecting differences than the between-students (and
between-classes) "main-effect" Method test.  Thus although the
main-effect test is of clear interest, it is important to perform
the interaction test because the latter test may be (correctly)
statistically significant even though the former test is not.

(A general rule is that if a main-effect component in an analysis
of variance table is a within-entities component [e.g., Time in
the present example], then all interactions between this compo-
nent and other components [e.g., the Method by Time interaction
in the present example] are also within-entities components.)

The experiment under discussion has the following attributes:

- The main predictor variable (sometimes called a "factor" in
  analysis of variance) is (teaching) Method, which is a "fixed"
  effect in the model (equation).

- The predictor variable Time (of testing) is a "repeated meas-
  urements" effect in the model.  Time is also a fixed effect be-
  cause the two times of measurement are fixed (relative to the
  course) at "immediately before the course" and "immediately af-
  ter the course".

- If students are grouped in classes or sections, the predictor
  variable Class is generally viewed as a "random" effect in
  the model.

- If the experiment spans more than one academic institution, the
  predictor variable Institution is reasonably viewed as reflect-
  ing another random effect in the model.  (An experiment that
  spans more than one institution resembles a multicenter clini-
  cal trial and thus multicenter principles may provide useful
  guidance.)

- We might also include predictor variables (effects in the
  model) for students' gender, age, and perhaps other variables
  thought likely to be useful.  (However, these predictor
  variables are generally of less direct interest than the main
  predictor variable [Method] and interactions between it and
  other predictor variables.)

- The experiment contains many possible interactions between the
  various predictor variables.  Specifying how the interactions
  are to be treated in the analysis is somewhat complicated.

- The experiment will likely end up being unbalanced (e.g., with
  different numbers of classes receiving the different teaching
  methods), which is another minor complication.

The preceding points suggest that proper analysis of this type of
experiment is complicated.  Thus some statistical software pack-
ages may be unable to perform the analysis.  (SPSS and SAS have
mixed model routines that can handle the key aspects of this type
of analysis for many experimental designs.)

We include the Class factor in the analysis because it reflects a
significant aspect of the experimental situation and because it
enables us to drain away attributable variation in the values of
the response variable -- variation that can be associated with
variation in the Class predictor variable.  This generally yields
a smaller between-students residual (i.e., leftover) variation,
which yields more powerful between-students statistical tests.

Thus in the experiment under discussion some of the variation in
the values of the response variable may be associable with the
differences in the different teachers for the different classes.
Draining away this variation will make the between-students sta-
tistical test(s) more powerful.  (However, as suggested above,
the between-students tests are generally less important than the
within-students tests.)

If teachers teach more than one class, we could also include a
(random) Teacher factor in the analysis.  (If teachers only teach
one class, the Class factor takes account of teacher [main] ef-
fects.)

As noted, a key statistical test in the experiment under discus-
sion is the within-students test of the two-way interaction be-
tween the factors (teaching) Method and Time (of testing).  This
suggests that we may be able to ignore the between-students Class
and (if applicable) Institution factors in the analysis, which
simplifies the analysis to a standard repeated measurements
analysis.  I recommend that researchers perform the analysis both
ignoring and taking account of the Class factor and report both
results to help others understand and use the methods.  (In cer-
tain [perhaps many or all] cases the p-value is exactly the same
for the Method by Time interaction whether the Class factor is
included in the model or not.)

The repeated-measurements aspect and the random Class factor as-
pect of the analysis yield a somewhat complex covariance struc-
ture for the values of the response variable.  This structure
must be adequately modeled in order to provide correct and most
powerful statistical tests for evidence of a difference between
the two or more teaching methods being compared in the experi-
ment.

(However, if we are unsure of the covariance structure, we can
tell the program to assume an "unstructured" covariance matrix,
which may somewhat lessen the power of the statistical tests, but
will ensure that the covariance structure is adequately modeled.
Another commonly assumed covariance structure is called "compound
symmetry".)

Researchers interested in these analyses may find it useful to
study books about repeated measurements, longitudinal, and hier-
archical linear models (or consult with their authors) to ensure
that the optimum experimental design and analysis methods are
used.  For a relatively small cost, this approach can substan-
tially increase your chance of finding what you're looking for,
if it exists.  This approach also reduces the chance of drawing
incorrect conclusions, a pitfall of complex analyses.

A useful way to understand mixed-model analysis computer programs
is to generate realistic data under various models and to analyze
the generated data with a mixed-model program (or programs).  An
easy way to generate data is to use statistical software and a
model equation.  That is, one uses random number generators or
fixed values (as necessary) to generate appropriate values of
predictor variables and one uses a properly parameterized model
equation (with a random number generator for the error term) to
generate values of the response variable from the values of the
predictor variables.  Most general statistical software can be
programmed in minutes to generate realistic research data using
this method.  (Many software products give examples of such data
generation in their documentation.)  If we use an analysis pro-
gram to study relationships (or lack of relationships) between
variables that we ourselves have "installed" in the data, we gain
effective experience in how the program works, and we also gain
effective experience in how statistical models work.
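
The data-generation idea can be sketched in Python as follows.
The model equation, the parameter values, and the names
generate_scores and mean_change are all assumptions chosen for
illustration; they do not come from the SATS or from any
particular experiment.  The sketch "installs" a known Method by
Time effect (an extra pre-to-post change of 3 points under
method B), a random Class effect, and a random student effect,
and then checks roughly whether the installed effect shows up in
the generated data.

```python
import random


def generate_scores(n_classes_per_method=4, n_students=25, seed=1):
    """Generate pre/post scores from an assumed model equation.

    score = grand mean + class effect + student effect
            + (time effect + method-by-time effect, post-test only)
            + error
    """
    rng = random.Random(seed)
    grand_mean = 45.0
    time_effect = 1.0                    # change under method "A"
    interaction = {"A": 0.0, "B": 3.0}   # extra change installed for "B"
    sd_class, sd_student, sd_error = 2.0, 5.0, 3.0

    rows = []
    class_id = 0
    for method in ("A", "B"):
        for _ in range(n_classes_per_method):
            class_id += 1
            class_effect = rng.gauss(0.0, sd_class)   # random Class effect
            for student in range(n_students):
                student_effect = rng.gauss(0.0, sd_student)
                base = grand_mean + class_effect + student_effect
                pre = base + rng.gauss(0.0, sd_error)
                post = (base + time_effect + interaction[method]
                        + rng.gauss(0.0, sd_error))
                rows.append({"method": method, "class": class_id,
                             "student": student, "pre": pre, "post": post})
    return rows


def mean_change(rows, method):
    """Mean post-minus-pre change for the students under one method."""
    diffs = [r["post"] - r["pre"] for r in rows if r["method"] == method]
    return sum(diffs) / len(diffs)


rows = generate_scores()
estimate = mean_change(rows, "B") - mean_change(rows, "A")
print(len(rows), estimate)   # estimate should land near the installed 3.0
```

The generated rows could then be given to a mixed-model program
to see whether its Method by Time test detects the installed
effect, and how the answer changes as the parameter values,
numbers of classes, and numbers of students are varied.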

Some students' attitudes about the field of statistics are disap-
pointingly negative.  Thus a good feature of research in improv-
ing students' attitudes is that we have substantial room for im-
provement.  If our field is really as useful as many of us like
to think, proper experimentation (using a broad range of promis-
ing teaching methods) will undoubtedly lead us to improve stu-
dents' attitudes.  This will help students to appreciate the cen-
tral role our field plays (or can play) throughout empirical re-
search.  This will substantially increase the use and overall
contribution of the field of statistics.


Hilton, S. C., Christensen, H. B., Collings, B. J., Hadfield, K.,
   Schaalje, B., and Tolley, D. 1999. "A Randomized, Controlled
   Experiment to Assess Technological Innovations in the Class-
   room on Student Outcomes:  An Overview of a Clinical Trial in
   Education," in American Statistical Association Proceedings of
   the Section on Statistical Education, pp. 209-212.  (See also
   the subsequent two articles in the same Proceedings volume.)

Macnaughton, D. B. 2002.  "The Introductory Statistics Course:
   The Entity-Property-Relationship Approach."  Available at

Schau, Candace. 2003.  "Survey of Attitudes Toward Statistics"
   Available at http://www.unm.edu/~cschau/satshomepage.htm
