Subject: Re: Eight Features of an Ideal Intro Stat Course
         (Response to comments by Bob Hayden)

     To: EdStat E-mail List and Newsgroup

   From: Donald B. Macnaughton <>

   Date: Sunday July 23, 2000
     Cc: Bob Hayden <>


Quoting a 99/5/9 post of mine, Bob Hayden writes (on 99/5/9)

> ----- Forwarded message from Donald Macnaughton -----
        < snip >
>> It is not necessary to bring formal statistical procedures
>> into the discussion to discuss relationships between vari-
>> ables.  I recommend that teachers capitalize on this fact and
>> give students a strong sense of the concept of a relationship
>> between variables before introducing ANY formal statistical
>> procedures.   
        < snip >
> ----- End of forwarded message from Donald Macnaughton -----      
> Without getting too deeply into the main issue here of univari-
> ate versus multivariate, I would like to comment on a couple of
> details.
> I think the relationship between a measurement variable and a
> categorical variable is best visualized with parallel boxplots
> -- one for each category -- on the same scale.  Indeed, such
> plots are the main reason to learn boxplots.  

Many readers will agree that plots are essential tools for under-
standing relationships between variables.  Four standard types of 
plot for illustrating the type of relationship Bob describes are

- parallel dot plot

- parallel boxplot

- graph (perhaps with standard-error-of-the-mean bars) and

- parallel stem-and-leaf plot.

To help with discussion of Bob's points, I show the same data 
plotted in each type of plot in the figure below.

    FIGURE CAPTION:  Four types of parallel plot, each re-
    flecting exactly the same data.  The plots are called 
    (clockwise from the top left) parallel dot plot, parallel 
    boxplot, mean graph with standard-error-of-the-mean bars, 
    and parallel stem-and-leaf plot.  As can be seen on both 
    the parallel dot plot and the parallel stem-and-leaf 
    plot, the counts of the number of values of the response 
    variable available for the three values of the predictor 
    variable are (from left to right) 25, 26, and 24.  

(Appendix A describes how to obtain a higher-resolution copy of 
the figure.) 

The figure reflects a simple empirical research project in which 
a single "discrete" predictor variable is observed at or manipu-
lated through three values in the research entities (or in the 
entities' "environment") and the values of a single "continuous" 
response variable are observed in the same entities.  

(Appendix B discusses the distinction between discrete and con-
tinuous variables.)

(I could have searched data archives to find an appropriate da-
taset on which to base the figure.  However, to save time and to 
get exactly what I wanted, I simply used the SAS normal random 
number generator to make up the values of the response variable 
in the figure.  I specified nominal means of 28, 32, and 36 for 
the three groups and a nominal standard deviation [within each 
group] of 9.)
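The same made-up data can be generated with any statistics package.  As an illustration only (the original values were produced with the SAS normal random number generator, and the seed here is arbitrary), a Python sketch is:

```python
import random

# Sketch of the simulation; the original data were generated with
# SAS.  Group sizes, nominal means, and the nominal within-group
# standard deviation are those stated in the figure caption.
random.seed(0)                    # arbitrary seed, for repeatability
sizes = [25, 26, 24]              # response values per predictor level
means = [28, 32, 36]              # nominal group means
sd = 9                            # nominal within-group SD

groups = [[random.gauss(m, sd) for _ in range(n)]
          for m, n in zip(means, sizes)]
```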

If I were presenting the situation illustrated in the figure to 
students, I would make it very concrete, perhaps in part as fol-
lows:  The research entities are 75 AIDS patients who were ran-
domly assigned to three groups.  The predictor variable reflects 
three levels of a new drug that were (in double-blind fashion) 
administered to the three groups of patients -- a different level 
to each group.  (To increase the power of the statistical tests, 
one of the levels of the drug was "zero".)  The response variable 
is an appropriate measure of the healthiness of the patients af-
ter six weeks of treatment with the drug.  

(I would tell students that in real AIDS research only two levels 
of the drug would normally be used because, when appropriate lev-
els are chosen, this also helps increase the power of the statis-
tical tests.)

Also, if I were presenting the situation illustrated in the fig-
ure to students, I would carefully discuss the important implica-
tions of the figure for the treatment of AIDS patients, including 
how the implications are derived and the main caveats.

                            *   *   *

The four plots in the figure are quite different from each other, 
even though they all reflect exactly the same data.  What are the 
advantages and disadvantages of each type of plot?

Consider the parallel dot plot.  Dot plots (Tukey 1977, p. 50; 
Wilkinson 1999) have the advantage that they are closer to the 
raw data than the other three types of plots -- dot plots picto-
rially reflect the exact tabled values of both the response and 
predictor variables for each entity under study in the research.  
Because dot plots are close to the raw data, students find them 
easy to understand.  

Parallel dot plots can be easily drawn by any software that can 
draw scatterplots.  (If many data values are present, it is help-
ful to slightly offset the dots that lie atop one another, as 
shown in the figure.  This offsetting is unfortunately not avail-
able as a simple option in most plotting software, so the user 
must do it manually or write a program to do it semi-
automatically.  Appendix C discusses some offsetting algorithms.)

Consider the parallel boxplots in the figure and consider any one 
of the three boxplots.  To understand this boxplot a student must 
understand the notion of the quantiles of a distribution of nu-
meric values (in particular, median and quartile) and a conven-
tion that defines the length of the whiskers (Tukey 1977, pp. 39-
53).  Although these technical concepts are not complicated, they 
make boxplots harder for students to understand than dot plots.
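The technical concepts behind a single boxplot can be made concrete with a short computation.  The sketch below uses made-up data values and Python's standard library; the 1.5 * IQR fences are Tukey's whisker convention:

```python
import statistics

# Made-up response values for one level of the predictor variable.
data = [22, 25, 27, 28, 30, 31, 33, 36, 50]

q1, median, q3 = statistics.quantiles(data, n=4)  # quartiles
iqr = q3 - q1                                     # interquartile range
upper_fence = q3 + 1.5 * iqr   # whiskers extend to the most extreme
lower_fence = q1 - 1.5 * iqr   # points inside these fences (Tukey 1977)
```

On the boxplot, the value 50 falls beyond the upper fence, so it would be drawn as an individual point, much like the solitary outlier in the rightmost boxplot of the figure.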

Boxplots have the advantage over the other types of plots that 
they highlight outliers -- points that lie well away from the 
other points on the plot.  For example, note the solitary outlier 
in the upper tail of the rightmost boxplot in the figure.

Consider the graph in the lower-right quadrant of the figure.  
Graphs showing the mean (or median) values of the response vari-
able for each value of the predictor variable (possibly with 
standard-error-of-the-mean bars) are often used in reports in the 
empirical research literature and in the popular press.  Like 
boxplots, graphs showing the mean or median with error bars are 
harder for students to understand than dot plots because these 
graphs are based on technical concepts (i.e., a measure of the 
central tendency of a distribution and a measure of the spread). 

Furthermore, graphs with standard-error-of-the-mean bars hide the 
extent of the distribution because (as dictated by the formula 
for the standard error of the mean) the height of each bar is 
strongly (inversely) dependent on the number of values of the 
response variable available for the given value of the predictor 
variable.
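The inverse dependence is easy to demonstrate numerically.  In the sketch below (the standard deviation of 9 echoes the simulated data; the sample sizes are arbitrary), quadrupling the number of values halves the height of the bar:

```python
import math

sd = 9                                  # within-group standard deviation
ns = (10, 40, 160)                      # illustrative sample sizes
ses = [sd / math.sqrt(n) for n in ns]   # standard error = sd / sqrt(n)
```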

On the other hand, graphs with standard-error-of-the-mean bars 
are useful if we wish to focus on the "average" relationship be-
tween the two variables under study.  We can thus focus on a nar-
rower range of values of the response variable, as is reflected 
by the difference between the vertical axis scale on the plot in 
the lower-right quadrant of the figure and the vertical axis 
scales on the two plots in the upper half of the figure. 

Furthermore, graphs with standard-error-of-the-mean bars are use-
ful because they enable an experienced researcher to quickly per-
form a "visual t-test".  This gives one a visual confirmation of 
what takes place mathematically in the t-test.  Appendix D de-
scribes the visual t-test.  

(Standard-error-of-the-mean bars enable a visual t-test because 
the bars are scaled to reflect the number of values of the re-
sponse variable available for a given value of the predictor 
variable.  Boxplots cannot be used for visual t-tests because 
they are not so scaled.)

Both parallel boxplots and graphs have important advantages over 
parallel dot plots:  Boxplots and graphs SUMMARIZE the univariate 
distribution of the values of the response variable for a given 
value of the predictor variable.  Thus boxplots and graphs hide 
some of the detail that is present in the corresponding parallel 
dot plot.  Also, boxplots and graphs are often easier to draw and 
generally take up less horizontal space on a page than dot plots.

Although in certain situations boxplots and graphs have advan-
tages over dot plots, students should learn that before they use 
a summary plot they should study a dot plot of the raw data to 
ensure that the summary plot is not hiding some important feature 
of the distribution of the values, as illustrated by Tukey (1977, 
pp. 49-50).

Consider the parallel stem-and-leaf plot in the figure.  This 
type of plot is useful when we need to display details of the ac-
tual values of a variable (Tukey 1977, pp. 6 - 16).  On the other 
hand, when these details are not needed, this type of plot has a 
significant disadvantage:  The extra textual detail distracts the 
viewer from the overall sense of the distribution of the values.  
The overall sense is often more important than the mostly unsub-
stantial specific numerical differences that are reflected in the 
digits in the "leaves" of the plot.

Also, stem-and-leaf plots are inferior to dot plots at highlight-
ing gaps in the distribution of a set of values.  This can be 
seen by studying the gaps in the dot plot and stem-and-leaf plot 
in the figure, especially the gap for the outlier in the upper 
tail when the predictor variable is at level 3.

Appendix E discusses some other approaches to displaying the data 
in the figure.  

Because I believe dot plots are the easiest of the various types 
of plots for students to understand, I recommend that discussion 
of parallel plots in the introductory statistics course begin 
with parallel dot plots.  I recommend that this discussion be 
followed by discussion of parallel boxplots and graphs because 
the latter two types of plots are often used in reports of em-
pirical research.  

                            *   *   *

Bob's example studies a relationship between variables in which 
the response variable is continuous, but the predictor variable 
is discrete.  Bob may be suggesting that we use this type of ex-
ample as the FIRST detailed example of a relationship between 
variables in an introductory statistics course.  However, other 
types of example are also possible.  In particular, instead of 
using a discrete predictor variable we could use a continuous 
one.  Which type of relationship is best for the first detailed 
example of a relationship between variables at the beginning of 
an introductory course?

I recommend that the first detailed example of a relationship use 
response and predictor variables that are BOTH CONTINUOUS for the 
following reasons:

- To facilitate student understanding, the first example of a re-
  lationship should be as simple as possible.  This suggests us-
  ing an example of an observational research project as opposed 
  to an example of an experiment.  This is because with experi-
  ments students must understand the concept of random assignment 
  and the concept of "manipulation" of the values of a predictor 
  variable.  These concepts are not needed if we use an example 
  of an observational research project.

- It is desirable (when possible) to use continuous variables in 
  empirical research because a continuous variable almost always 
  carries more information in its values than a discrete variable 
  measuring the same property.  (An important exception is that 
  the "manipulated" variables in experiments are almost always 
  discrete because appropriately used discrete manipulated vari-
  ables provide more powerful statistical tests.)

- Many examples of observational research projects are available 
  that have both a continuous response variable and a continuous 
  predictor variable.

These points suggest that the first detailed example of study of 
a relationship between variables in an introductory course should 
be an example of an observational research project that studies 
the relationship between two continuous variables.  

I recommend the following example:  The response variable is the 
mark (say, out of 100) that each student obtained in a particular 
course of study.  The predictor variable is the total amount of 
time (in minutes) each student spent working on the course during 
the term, as tracked by student time diaries.  You can pique stu-
dent curiosity by using the data for the students in the preced-
ing term of your present course.  Appendix F discusses the logis-
tics of tracking student time spent on a course.

Studying the relationship between study-times and course-marks is 
effective because this relationship is of serious direct interest 
to most students.  Also, the example provides an easily under-
stood basis for discussing several important general concepts of 
statistics and empirical research such as measurement accuracy, 
weak relationships between variables, alternative explanations, 
the need for hypothesis testing about the presence of a relation-
ship, causation, multiple causation, observational versus experi-
mental research, and bivariate regression.

In an introductory course that follows the recommended approach 
and begins with an example with two continuous variables, the 
first graphic that students see is a scatterplot rather than a 
parallel plot.  After students understand how scatterplots illus-
trate the relationship between two continuous variables, we can 
THEN introduce the parallel dot plot as a special type of scat-
terplot that illustrates a new type of relationship between vari-
ables -- a relationship in which the predictor variable is no 
longer continuous, but is instead discrete.

                            *   *   *

Let us return to Bob's comments.  Recall that he says above that 
certain relationships between variables are best visualized with 
parallel boxplots.  He continues

> However, I see many texts that focus on the mechanics of con-
> structing a single boxplot, but then never go on to use them to
> visually compare several groups.  Perhaps this is the extreme
> in being adamantly univariate.  

I agree.

> On the other hand, I do think it is useful for students to
> learn to make boxplots without a computer, and for purposes of
> teaching this, there is an advantage in concentrating on one
> boxplot at a time.  

I agree that students can best understand boxplots if they con-
centrate on one boxplot at a time.  However, as discussed above, 
if a teacher wishes to use a discrete predictor variable in the 
first detailed example of a relationship, I recommend NOT start-
ing with boxplots, but with dot plots.  Under this approach I be-
lieve it is not necessary to begin with discussion of a dot plot 
of a single distribution.  Instead, after introducing the concept 
of a relationship between variables (which is what all the paral-
lel plots illustrate), we can immediately introduce a parallel 
dot plot to students as a useful tool for illustrating certain 
relationships between variables.

> HOWEVER, as soon as the students understand what a boxplot IS,
> you can immediately put the boxplots to good use by having a
> computer generate parallel boxplots comparing several groups. 

As noted, I agree with Bob that parallel plots (dot plots, box-
plots, or graphs) are fundamental tools for illustrating certain 
relationships between variables.  However, an issue on which Bob 
and I may disagree concerns the ORDER in which a teacher should 
introduce the ideas of

(a) relationships between variables and 

(b) parallel plots. 

For students who are not majoring in statistics or mathematics, I 
recommend introducing relationships between variables FIRST, be-
fore we introduce individual or parallel plots (or scatterplots).  
On the other hand, Bob may be recommending that we introduce re-
lationships between variables SECOND, after we have introduced 
individual (and possibly parallel) plots.

Clearly, the approach of introducing individual or parallel uni-
variate plots (or scatterplots) before we introduce relationships 
between variables has SOME appeal.  In particular, if we follow 
this approach, when the time comes in the course to illustrate a 
relationship between variables with plots the students will al-
ready be familiar with the plots.

However, as I discuss elsewhere

- almost all the commonly used statistical procedures can be rea-
  sonably viewed as procedures for studying relationships between 
  variables (1999, sec 4.3) and

- almost all formally reported empirical research projects can be 
  reasonably viewed as studying relationships between variables 
  (1999, app. B).

Thus the concept of 'relationship between variables' unifies 
almost all statistical procedures and almost all empirical research 
projects.  Therefore, I recommend that teachers center the intro-
ductory statistics course on the fundamental unifying concept of 
'relationship between variables'.

I illustrate in two papers how a teacher can easily introduce the 
concept of 'relationship between variables' in an introductory 
course without having to first cover univariate plots (1996, 
1999).  The 1999 paper also discusses how concepts related to 
univariate distributions are boring for students because the con-
cepts have no obvious practical value (sec. 6.9).

In view of these points, I recommend introducing relationships 
between variables first.  However, shortly after introducing re-
lationships between variables, I recommend that teachers intro-
duce the various types of plots that help us to ILLUSTRATE rela-
tionships between variables.  Such plots are essential tools for 
understanding relationships.

                            *   *   *

In my 99/5/9 post I discuss why I believe teachers continue to 
discuss univariate distributions at the beginning of introductory 
statistics courses even though it is no longer necessary to dis-
cuss this topic.  As part of that discussion I say

>> In the past, before the arrival of good statistical computing
>> packages, a person performing a statistical analysis had to
>> understand the mathematics of statistics in order to carry out
>> the (necessarily manual) computations.  (It is almost impossi-
>> ble to perform statistical computations manually if one does
>> not properly understand them.)

Quoting this passage Bob writes

> I would have to disagree that carrying out statistical computa-
> tions "by hand" requires or demonstrates statistical under-
> standing.  It only demonstrates that the steps in the computa-
> tion have been mastered.  Computers grind out statistical com-
> putations all the time without understanding them.  Programmers
> implement statistical formulas all the time with little or no
> understanding of why anyone wants to calculate this or what it
> means.  In the days before students mindlessly pushed buttons
> on their calculators, they mindlessly pushed pencils across
> pages of paper. 

I agree with Bob that some people learn to perform statistical 
computations without understanding what they are doing -- my 
point above does not contradict this point.  My point is that in 
the days before we had good computer software to perform statis-
tical computations, if one wished to perform a responsible sta-
tistical analysis, one had to understand the underlying mathemat-
ics.  This was necessary to ensure that the computations were 
performed correctly.  

Nowadays, as Bob implies, the need for understanding is still 
very much present.  But for students who are not majoring in sta-
tistics or mathematics, it is no longer necessary to attain 
MATHEMATICAL understanding.  This is because a computer can do 
all the standard mathematical computations of statistics, and 
generally do them very well.  What students need instead of 
mathematical understanding is "conceptual" understanding.

As I discuss in the 1999 paper, I believe we can give students a 
thorough conceptual understanding of the role of the field of 
statistics by showing them that statistics helps us to study 
variables and relationships between variables as a means to accu-
rate prediction and control.  A student need not understand the 
underlying mathematics of statistics to understand these simple 
concepts.

Donald B. Macnaughton   MatStat Research Consulting Inc      Toronto, Canada


APPENDIX A: OBTAINING A HIGHER-RESOLUTION COPY OF THE FIGURE

A higher-resolution copy of the figure is available in Adobe 
Portable Document Format.  To view or print files stored in this 
format you can download a free reader (Adobe Acrobat) from Adobe 
Systems.
To view the figure, click here.


APPENDIX B: CONTINUOUS AND DISCRETE VARIABLES

In the body of this post I refer to the concepts of "continuous" 
and "discrete" variables.  I propose the following definition:

    A variable is a CONTINUOUS variable if and only if (1) it 
    has numeric values and (2) it is capable of assuming all 
    values within its range of allowable values.  If a vari-
    able is not a continuous variable, it is a DISCRETE 
    variable.
As suggested by Cox (1999), no real-life variable is truly con-
tinuous according to this definition because we can always dream 
up values within the range of a variable that the variable cannot 
assume -- in particular, values with more significant digits than 
the associated measuring instrument is capable of delivering.  
Thus any given real-life "continuous" variable is generally inca-
pable of assuming all possible values within its range, but may 
only be capable of assuming several thousand different values, or 
perhaps a hundred or so different values, or perhaps only twenty 
or so different values.  

However, the breakdown of the definition is usually not a problem 
in practice because the statistical techniques for handling con-
tinuous variables do not require that the variables be "truly" 
continuous -- they generally only require that the ordering of 
the values be meaningful and the error term in the model have an 
"adequate" appearance of coming from a certain (continuous) dis-


APPENDIX C: OFFSETTING OVERLAPPING DOTS

Various ways are available to offset overlapping points on dot 
plots.  In particular, on a parallel dot plot such as the one 
shown in the upper-left quadrant of the figure, we can offset 
dots in the direction of the predictor variable, in the direction 
of the response variable, or in both directions.  Statisticians 
have suggested the following ways of offsetting dots:

- If necessary to avoid overlap, offset the dots in the direction 
  of the predictor variable in increments of one dot width, and 
  offset the dots in the direction of the response variable so as 
  to form bins, with the center of each bin being independently 
  placed to be as close as possible to the mean value of the dots 
  it contains, possibly allowing partial overlapping of dots in 
  adjacent bins, which is similar to a procedure described by 
  Wilkinson (1999).

- Offset the dots in the direction of either the response or pre-
  dictor variable (or both) with "jittering" (Chambers, 
  Cleveland, Kleiner, and Tukey 1983; Cleveland 1993) in which 
  the locations of overlapping dots are perturbed by small 
  amounts of random noise.

- Offset the dots in a systematic manner, which (according to 
  Wilkinson 1999) was originally proposed by Tukey and Tukey.

The method I used to draw the dot plot in the figure was to 
offset the dots systematically.  This method has the following 
advantages:

- The method avoids the artificial appearance of bins on the plot 
  and instead allows the viewer to see the actual values of the 
  response variable.

- The method ensures that all the dots are completely visible on 
  the plot and thus one does not have to wonder how many dots are 
  in a clump, which may (due to the random element) occur if one 
  uses jittering.  

The algorithm I used to draw the dot plot in the figure operates 
(for each level of the predictor variable) as follows:  Add the 
dots to the plot one at a time in increasing order of the value 
of the response variable.  For each dot, keep the y-coordinate 
for the dot fixed at its correct value but, if necessary, move 
the dot in the x-direction out from its nominal position (in al-
ternating directions) small amounts (e.g., a quarter-dot-width) 
until the dot is sufficiently far away from all the previously-
placed dots.
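The algorithm can be sketched in a few lines of Python.  This is an illustrative reconstruction, not the program actually used to draw the figure; the function name, the quarter-dot-width step, and the plain Euclidean distance test (which assumes the x and y scales have been expressed in dot-width units) are my own choices.

```python
import math

def offset_dots(ys, x0, dot_width=1.0, step=0.25):
    # Place dots in increasing order of the response value.  Each
    # dot keeps its y-coordinate; its x-coordinate moves out from
    # the nominal position x0 in alternating directions, a quarter
    # dot width at a time, until it clears all previously placed
    # dots.
    placed = []
    for y in sorted(ys):
        k = 0
        while True:
            # offsets tried: 0, +step, -step, +2*step, -2*step, ...
            offset = ((k + 1) // 2) * step * (1 if k % 2 else -1)
            x = x0 + offset
            if all(math.hypot(x - px, y - py) >= dot_width
                   for px, py in placed):
                placed.append((x, y))
                break
            k += 1
    return placed
```

For example, four identical response values fed to this sketch fan out horizontally around the nominal position, each dot at least one dot width from every other.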

A disadvantage of plots generated by this algorithm is that it 
can generate slight patterns in the columns of dots -- a branch-
ing upward and outward of some dots as one moves vertically up a 
column of dots.  (The branching will be downward and outward if 
one chooses the top of the plot as the arbitrary starting point 
for the placement of dots instead of the bottom.  One might also 
start at a central point of the distribution and work both up and 
down from that point.  A final perhaps best approach is to treat 
the points in a random order, since this should minimize patterns 
appearing in the dots.)

The dot plot in the figure shows the dots distributed roughly 
evenly on both sides of an imaginary vertical line.  It is also 
possible to draw dot plots with the dots distributed on only one 
side of the line, perhaps the right side, which makes the dot 
plot look more like a stem-and-leaf plot or like a histogram 
(with the long dimension of its rectangles horizontal).  I 
recommend that programs that draw dot plots be able to draw both 
forms.

The offsetting method I used for the parallel dot plot does not 
work well for some scatterplots, since "unused" space in the 
horizontal (or vertical) direction may be unavailable on scatter-
plots.  Thus on scatterplots it often makes more sense to use 
jittering to offset overlapping dots.

I recommend that all scatterplot-drawing software have built-in 
algorithms for offsetting overlapping points on parallel dot 
plots and scatterplots.


APPENDIX D: THE VISUAL T-TEST

Consider the graph in the lower-right quadrant of the figure.  
Suppose we wish to perform a t-test for a significant difference 
between two of the three group means of the values of the re-
sponse variable shown on the graph.  (This is a test for the 
presence of a relationship between the response variable and the 
predictor variable.)  It can be easily shown that if the stan-
dard-error-of-the-mean bars of two means show a "sufficient" lack 
of overlap on the graph, the t-test p-value will be less than 
.05.  This implies that (assuming no reasonable alternative ex-
planation is present) we can easily obtain good evidence of a re-
lationship between the variables by merely scanning the graph.  I 
call this approach the visual t-test.  

To illustrate the visual t-test I performed three mathematical 
two-group t-tests on the data behind the figure above.  That is, 
I performed the t-test to test the (null) hypothesis that pairs 
of means of the response variable are the same (in the popula-
tion) for pairs of values of the predictor variable.  This 
yielded the following three p-values:

                  Predictor Variable   p-value
                   Values Compared     (2-tail)
                        1 vs. 2         .0354
                        1 vs. 3         .0003
                        2 vs. 3         .1621

Note how these p-values relate to the amount of vertical overlap 
shown by the standard-error-of-the-mean bars on the graph in the 
lower-right quadrant of the figure -- the less the vertical over-
lap, the lower the p-value.
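The mathematical t-tests behind a table like the one above can be reproduced with standard software.  The sketch below assumes SciPy is available and uses two made-up eight-value samples (the data behind the figure are not reproduced here, so these numbers are illustrative only):

```python
from scipy import stats

# Two made-up groups of response values, roughly echoing the
# nominal group means of 28 and 36 in the figure.
group_1 = [27, 29, 31, 26, 30, 28, 25, 32]
group_3 = [35, 38, 33, 36, 39, 34, 37, 40]

# Two-tailed pooled-variance t-test of the hypothesis that the
# two population means are the same.
t_stat, p_value = stats.ttest_ind(group_1, group_3)
```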

(The method I describe for performing a visual t-test is impre-
cise because the necessary amount of lack of overlap for a p-
value of, say, .05 still depends somewhat on the number of values 
of the response variable for each of the two values of the pre-
dictor variable [because these numbers determine the degrees of 
freedom for the t-statistic].  But, under certain reasonable as-
sumptions, it is easy to show that the two means must be at least 
2.77 standard errors apart for a p-value of .05.  Sall describes 
a precise method for performing visual t-tests with "comparison 
circles" [1992].)


APPENDIX E: OTHER APPROACHES TO DISPLAYING THE DATA

Another method for displaying the data in the figure is with 
parallel histograms.  A parallel histogram looks like the parallel 
stem-and-leaf plot in the figure except that the rows of numbers 
are replaced by (less distracting) rectangles.  Interestingly, I 
have been unable to find examples of parallel histograms in the 
statistical literature or in various statistical software prod-
ucts I am familiar with. 

Histograms have an artificial air to them when compared to dot 
plots because the data are hidden inside the rectangles, rather 
than appearing in their raw form.  

Another method of displaying the data in the figure is with com-
parison circles which, as noted above, allow one to perform a 
precise visual t-test of the differences between the means.  Sall 
(1992) gives examples of comparison circles.

Another method of displaying the data in the figure is with a 
"diamond plot".  This plot resembles a mean graph with standard-
error-of-the-mean bars except that the bars are replaced by "dia-
monds", which are actually pairs of congruent isosceles triangles 
that share the same base.  The base of each triangle is horizon-
tal at the vertical height of its respective mean value (as re-
flected by the scale on the vertical axis).  The width of the 
base is proportional to the number of measurements that were used 
in computing the mean.  Two triangles are erected on the base -- 
an upper triangle with the apex above the base and a lower trian-
gle that is the reflection of the upper triangle on the other 
side of the base.  The heights of the triangles indicate the 
standard error of the mean, or some other measure of dispersion 
of the values.  Sall (1992) gives examples of diamond plots.

Another method of displaying the data in the figure is with a 
violin plot, in which a "density trace" is fitted to the points 
(Hintze and Nelson 1998).  This trace estimates the underlying 
distribution function of the values.  In a violin plot both the 
density trace and its mirror reflection are shown in the plot, 
making a symmetrical figure that may resemble the silhouette of 
the body of a violin with its axis vertical and with the plane of 
the body perpendicular to the line of sight of the viewer.

Violin plots are harder to understand than dot plots because stu-
dents must understand the idea of fitting a density trace to the 
data.  Also, the density trace reflects an assumption (which 
changes if we change the "tuning parameter") while the dot plot 
makes no assumptions, showing only the raw data values in an 
easy-to-understand layout.

Violin plots can be useful in cases when a large number of data 
points (i.e., greater than 30 or 40) are available for each group 
of points on the plot because then it makes more sense to fit a 
density trace to the data.  (Violin plots are effectively 
smoothed histograms that display both the assumed distribution 
and its mirror reflection.)

In situations in which it is reasonable to use a violin plot, it 
may be useful to show only half of each "violin" because showing 
a fitted distribution trace and its mirror reflection seems more 
complicated than merely showing the fitted distribution trace 
alone.  (One would not normally show a histogram and its mirror 
reflection, so why do so with a violin plot?)  Hintze and Nelson 
justify using both the density trace and its mirror reflection by 
saying that this "gives a symmetric plot which makes it easier to 
see the magnitude of the density."  I am unable to see how the 
symmetric plot makes it easier to see the magnitude -- but this 
is an aesthetic matter -- a matter of taste.  I recommend that 
programs that draw violin plots be able to draw both symmetric 
and non-symmetric plots.

All the plots I have discussed have the response variable plotted 
on the vertical axis and the predictor variable plotted on the 
horizontal axis.  All the plots could be drawn with the assign-
ment of the variables reversed -- that is, with the response 
variable plotted on the horizontal axis and with the predictor 
variable plotted on the vertical axis.  However, it is a general 
convention in statistics and empirical research that the response 
variable in a relationship between variables is shown on the ver-
tical axis of a plot because this helps viewers to rapidly orient 
themselves to the plot.

Lee and Tu (1997) discuss some other similar approaches to plot-
ting the data in the figure.


To help students study the relationship between the time they 
spend working on a course and their marks (or grades) it is nec-
essary to collect course work-time data.  To collect useable data 
one needs a reliable data-capture system and careful instruc-
tions.  To help with the data capture, a week-at-a-glance data-
capture form is available over the web in Adobe Portable Document 
Format (PDF).  The data-capture form is available here.

(To view or print PDF files you can download a free reader 
[Adobe Acrobat] from Adobe Systems.)

(The form works best on 8.5 x 14-inch paper.  However, you can 
check the "Fit to page" box in the Acrobat "Print" dialog to 
print the form on another size of paper if 8.5 x 14-inch printing 
is unavailable.)

To emphasize the importance of the data collection, you may wish 
to include a notice in your course description telling students 
that collection and submission of work-time data is a prerequi-
site for passing the course.  

Perhaps the best way to show students how to use the form is to 
complete a small portion of it on the board or on an overhead in 
class.  Also, written instructions for the form are available.

If you decide to use the form, I recommend that you distribute a 
fresh copy of it to students each week, even if students have 
ready access to an appropriate printer.  Weekly distribution of 
the form in class increases the chance that students will use it.  
I recommend that you collect last week's data from students at 
the beginning of each week.  You could collect the weekly data 
from students through e-mail, via a paper-based system, or over 
the web.  (I recommend against asking students to hand in their 
forms because some students will find it useful to keep the forms 
as a record of their work.)  A PDF form for a paper-based weekly 
data collection system is available here.

Some students may be tempted to misrepresent their time spent 
working on the course.  Some may be embarrassed about the low 
amount of time they are spending on the course and may thus re-
port inflated times.  Others may (mistakenly) feel that you will 
take account of their work-time in determining marks or grades so 
it will be to their advantage to report inflated times.  To in-
crease the likelihood that you find a relationship between work-
times and marks, I recommend that you discuss these issues with 
students and assure them that 

- the reported times will definitely not be taken into account 
  in assigning marks 

- a relationship may not be found if students report inaccurate 
  times 

- if a student decides to work only a small amount of time on the 
  course, this is perfectly reasonable, and not something to be 
  embarrassed about, because students are subject to many pres-
  sures that determine where they must allocate their time.  

In addition, to distance yourself from the work-time data, you 
may wish to assign everything to do with the data collection and 
analysis to a teaching assistant.

Perhaps the easiest way to collect the weekly work-time data is 
to send an e-mail to each student early on Monday morning asking 
them to reply with the number of minutes they spent working on 
the statistics course in the preceding week.  It is also useful 
to ask students whether their reported number of minutes reflects 
time that was actually tracked or represents an estimate.  You 
could base the text in your e-mail on the text in the form avail-
able at the link a few paragraphs above.

When entering the work-time data in the master data system, I 
recommend flagging values that are only estimates.  Then you can 
check whether marks or reported times differ between students 
who tracked their time and those who only estimated it.
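
Such a check can be sketched in a few lines of Python.  The 
weekly records below are hypothetical; each is (student id, 
minutes, estimate flag), where flag 1 means the time was only 
estimated.

```python
# Hypothetical weekly records: (student id, minutes, estimate flag).
records = [
    ("s01", 240, 0), ("s02", 300, 1), ("s03", 180, 0),
    ("s04", 420, 0), ("s05", 150, 1), ("s06", 360, 1),
]

def mean_minutes(records, flag):
    """Mean reported minutes for the group with the given flag."""
    times = [m for (_sid, m, f) in records if f == flag]
    return sum(times) / len(times)

tracked = mean_minutes(records, 0)    # students who tracked their time
estimated = mean_minutes(records, 1)  # students who only estimated
print(tracked, estimated)
```

The same comparison could of course be made for marks instead of 
minutes.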

One easy-to-handle structure for the master data system (which 
students will not have access to) is to have one record per stu-
dent, with each record containing (at least) the following: 
- a student identifier 
- a field for each MARK that the student earns in the course 
  (with the fields being filled in throughout the term as the 
  marks become available from tests and assignments)
- a field for each WEEK in the course to contain the number of 
  minutes worked by the student on the course in that week (with 
  the fields being filled in throughout the term as the data be-
  come available)
- a field for each WEEK in the course containing an indicator 
  (e.g., 0 or 1) of whether the time value for the week repre-
  sents tracked time or is only an estimate (again filled in as 
  the data become available).
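
One possible realization of this record structure, sketched as a 
plain Python dictionary.  The field names and the 12-week term 
length are assumptions made for illustration.

```python
WEEKS = 12   # assumed length of the term

def new_record(student_id):
    """One master-system record per student, filled in over the term."""
    return {
        "id": student_id,
        "marks": {},                   # e.g. {"midterm": 71}, as earned
        "minutes": [None] * WEEKS,     # minutes worked in each week
        "estimated": [None] * WEEKS,   # 0 = tracked, 1 = estimate only
    }

rec = new_record("s01")
rec["minutes"][0] = 240        # week 1: 240 minutes reported
rec["estimated"][0] = 0        # the time was actually tracked
rec["marks"]["midterm"] = 71   # filled in when the mark is available
```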

Whenever you generate a major new set of student marks (e.g., for 
a midterm test or important assignment) I recommend that you gen-
erate a new data file containing one row for each student in the 
course, but with no student identifiers.  Each row will contain 
two pieces of information about the associated student:  
- the sum of the number of minutes the student has worked on the 
  course to date and 
- the mark the student obtained on the test or assignment.  

I recommend that you make this file available to students in the 
course as soon as the data are available.  Students can then gen-
erate a scatterplot to see if a relationship appears to exist be-
tween the times they spent working on the course and their marks.
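
Building the anonymized release file can be sketched as follows.  
The records are hypothetical, and sorting the rows is an added 
precaution (not stated above) that further obscures the original 
student order.

```python
# Hypothetical master records keyed by student identifier.
records = {
    "s01": {"minutes": [240, 300, 180], "marks": {"midterm": 71}},
    "s02": {"minutes": [120, 150, 200], "marks": {"midterm": 58}},
}

def release_rows(records, mark_name):
    """Rows of (total minutes to date, mark), with identifiers dropped."""
    rows = [(sum(r["minutes"]), r["marks"][mark_name])
            for r in records.values()]
    rows.sort()   # remove any trace of the original record order
    return rows

print(release_rows(records, "midterm"))
```

Students can load these two columns into any package and draw 
the scatterplot of mark against total work-time.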

Since students will likely be concerned about the use and confi-
dentiality of their marks, you may wish to assure them that the 
marks will never be published with student names or other iden-
tifiers.  Thus it will generally be impossible to infer 
a student's mark from the published data.  (But if student A 
tells student B the number of minutes student A worked on the 
course, student B may then be able to infer student A's mark from 
the published data file.)

If you have comments about this time-tracking system or sugges-
tions for improvements, I would be interested to hear them.  You 
can reach me at


Cleveland, W. S. 1993. Visualizing data. Summit, NJ: Hobart 
   Press.

Chambers, J. M., Cleveland, W. S., Kleiner, B., and Tukey, P. A. 
   1983. Graphical methods for data analysis. Boston: Duxbury 
   Press.

Cox, D. R. 1999. "Variable, types of". In Encyclopedia of 
   Statistical Sciences, Update Volume 3 ed. by S. Kotz. New 
   York: John Wiley.

Hintze, J. L. and Nelson, R. D. 1998. "Violin plots: A box plot -
   density trace synergism," The American Statistician, 52, 
   181-184.

Lee, J. J. and Tu, Z. N. 1997. "A versatile one-dimensional dis-
   tribution plot: The BLiP plot," The American Statistician, 
   51, 353-358.

Macnaughton, D. B. 1996. "The entity-property-relationship ap-
   proach to statistics: An introduction for students." Available 

Macnaughton, D. B. 1999. "The introductory statistics course: The 
   entity-property-relationship approach." Available at

Sall, J. 1992. "Graphical comparison of means." American 
   Statistical Association Statistical Computing and Statistical 
   Graphics Newsletter, 3, 27-32.

Tukey, J. W. 1977. Exploratory data analysis. Reading, MA: 
   Addison-Wesley.

Tukey, J. and Tukey, P. 1990. "Strips displaying empirical dis-
   tributions: I. Textured dot strips." Technical Memorandum, 

Wilkinson, L. 1999. "Dot plots," The American Statistician, 53, 
   276-281.
