4 x 3 x 2 Unbalanced ANOVA

The following is self-documenting output from a SAS IML computer program:

NOTE: SAS (r) Proprietary Software Release 6.12  TS020
      Licensed to DONALD B. MACNAUGHTON, Site 0025250001.

1     /*
2    PR0165.SAS
3
4                 COMPUTING NUMERATOR SUMS OF SQUARES
5                 IN UNBALANCED ANALYSIS OF VARIANCE:
6                           Three-Way Case
7
8                        Donald B. Macnaughton
9                         donmac@matstat.com
10
11
12                         TABLE OF CONTENTS
13
14   Introductory Comments
15      - abstract
16      - introduction
17
18   Preliminary Steps
19      - load the Searle data into a SAS dataset and print the data
20      - start PROC IML and read the data from the SAS dataset into
21        IML
22      - generate the three main effect submatrices of the design
23        matrix
24      - generate the four interaction submatrices of the design
25        matrix
26      - make the SS subroutine available to the program
27      - set the values of the three secondary arguments of the SS
28        subroutine
29
30   Compute the Twenty-Two Sums of Squares Using the SS Subroutine
31      - compute the six sequential (SAS Type I) sums of squares
32      - compute the sum of squares for the highest-level
33        interaction
34      - compute the six SAS Type II sums of squares
35      - compute the six HTI = SAS Type III sums of squares
36      - compute three HTO sums of squares
37      - discuss the HTOS sums of squares
38
39   Quit from IML
40
41   Run PROC GLM to Compute Nineteen of the Sums of Squares (for
42   comparison with the values generated above)
43
44   Appendix: Steps to Run the Program
45
46   References
47
48   Output from PROC PRINT
49
50   Output from PROC GLM
51
52
53                              ABSTRACT
54
55   This SAS program illustrates a conceptual point of view and the
56   matrix arithmetic for computing the following types of analysis
57   of variance numerator sums of squares:
58
59   - HTO (Higher-level Terms are Omitted)
60       = SAS Type II in the two-way case
61       = SPSS ANOVA Experimental
62
63   - SAS Type II
64
65   - HTOS (Higher-level Terms are Omitted unless Significant)
66       = a superset of SAS Type II and HTO
67
68   - HTI (Higher-Level Terms are Included)
69       = SAS Type III
70       = SPSS ANOVA UNIQUE
71       = the default method in many analysis of variance programs
72
73   - sequential
74       = SAS Type I
75       = SPSS ANOVA Hierarchical in the two-way case.
76
77   The conceptual point of view is one of computing an analysis of
78   variance sum of squares by computing the difference between the
79   residual sums of squares of two model equations (Yates 1934).
80   The matrix arithmetic is simple, and is specified directly in
81   terms of the conceptual point of view.
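In symbols (one standard way to write the idea; the SS subroutine's own arithmetic may be organized differently), let X_R hold the columns of the smaller model and X_E the columns added for the effect being tested, as the program defines them later. Write P_X = X X^+ for the orthogonal projection onto the column space of X, where X^+ is the Moore-Penrose inverse. The numerator sum of squares is then the drop in the residual sum of squares when X_E is added. Because P_X depends only on the column space, the formula applies unchanged to overparameterized matrices.

```latex
\mathrm{RSS}(X) = y^{\top}\,(I - P_X)\,y, \qquad P_X = X X^{+}

\mathrm{SS}(X_E \mid X_R) = \mathrm{RSS}(X_R) - \mathrm{RSS}\bigl([\,X_R \;\; X_E\,]\bigr)
                        = y^{\top}\bigl(P_{[X_R \; X_E]} - P_{X_R}\bigr)\,y
```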
82
83   Computations are illustrated using data from a 4 x 3 x 2 unbal-
84   anced experiment discussed by Shayle Searle (1987, 392).
85
86
87                            INTRODUCTION
88
89   This program extends the discussion in another program,
90   which computes sums of squares for an unbalanced
91   2 x 3 experiment (Macnaughton 1998).  In that program I note
92   that most researchers use analysis of variance to obtain the
93   resulting p-values, which are used to help in detecting rela-
94   tionships between variables.  In order to compute analysis of
95   variance p-values, it is mathematically necessary to first com-
96   pute certain "numerator sums of squares".
97
98   The other program illustrates the three best-known conceptual
99   methods (HTO, HTI, and sequential) for computing numerator sums
100  of squares when there are two discrete-valued predictor vari-
101  ables in the experiment.  When there are three or more dis-
102  crete-valued predictor variables, other methods of computing
103  numerator sums of squares become available.  The present pro-
104  gram illustrates some of these methods.
105
106
107               BEGINNING OF THE PROGRAM STATEMENTS
108
109  (Note:  If you wish to run this program on your computer, see
110  the checklist in the appendix.)
111
112  Define the title for the output and set SAS system options.
113  */
114
115  title 'IML/GLM 4 x 3 x 2 Unbalanced ANOVA, Searle (1987, 392)';
116  options linesize=80 nodate probsig=2;
117
118   /*
119  Load the Searle data into a SAS dataset.
120  */
121
122  data Searle_2 (keep = a b c y);
123    input a b c n @;
124    do i = 1 to n;
125       input y @;
126       output;
127       end;
128
129  cards;

NOTE: The data set WORK.SEARLE_2 has 48 observations and 4 variables.
NOTE: The DATA statement used 3.18 seconds.


154  ;
155
156   /*
157  Print the data.
158  */
159
160  proc print data=Searle_2;
161     run;

NOTE: The PROCEDURE PRINT used 1.05 seconds.


162
163
164   /*
165                BEGINNING OF THE IML COMPUTATIONS
166
167  Start PROC IML and reset options that control the destination
168  and appearance of the IML output.
169  */
170
171  proc iml;
IML Ready
172  reset log spaces=3;
173
174   /*
175  Read the data from the SAS dataset into IML.  Each of the four
176  variables in the dataset (i.e., a, b, c, and y) becomes a col-
177  umn vector in IML.  The column vectors inherit the names of the
178  respective dataset variables.
179  */
180
181  use Searle_2;
182  read all var _all_;
183
184   /*
185  Compute the main-effect submatrices of the overall design ma-
186  trix using the IML DESIGNF function.
187  */
188
189  aD = designf(a);
190  bD = designf(b);
191  cD = designf(c);
192
193   /*
194  Compute the interaction submatrices of the overall design ma-
195  trix using the IML HDIR (horizontal direct product) function.
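Readers without SAS/IML can mimic the two functions in a few lines of Python. This is a minimal sketch, not SAS code. `designf` and `hdir` below are hypothetical stand-ins, and they rest on two assumptions about IML. First, DESIGNF is assumed to use full-rank deviation (effect) coding, in which the last sorted level is coded -1 in every column. Second, HDIR is assumed to form the row-by-row direct (Kronecker) product. Matrices are stored as lists of rows.

```python
def designf(x):
    """Hypothetical stand-in for IML DESIGNF: full-rank deviation coding.

    One column per level except the last; observations at the last
    (largest) level are coded -1 in every column.
    """
    levels = sorted(set(x))
    last = levels[-1]
    return [[1 if v == lev else (-1 if v == last else 0) for lev in levels[:-1]]
            for v in x]


def hdir(A, B):
    """Hypothetical stand-in for IML HDIR: row-wise direct product.

    Row i of the result is the Kronecker product of row i of A and row i of B.
    """
    return [[p * q for p in row_a for q in row_b] for row_a, row_b in zip(A, B)]


a = [1, 2, 3, 1]     # a three-level factor
c = [1, 1, 2, 2]     # a two-level factor
aD = designf(a)      # 4 rows, 2 columns
cD = designf(c)      # 4 rows, 1 column
acD = hdir(aD, cD)   # 4 rows, 2 columns: the A x C interaction submatrix
print(aD)   # [[1, 0], [0, 1], [-1, -1], [1, 0]]
print(acD)  # [[1, 0], [0, 1], [1, 1], [-1, 0]]
```

A factor with k levels contributes k - 1 columns, so the interaction of a four-level and a three-level factor has 3 x 2 = 6 columns, matching abD in this program.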
196  */
197
198  abD = hdir(aD,bD);
199  acD = hdir(aD,cD);
200  bcD = hdir(bD,cD);
201
202  abcD = hdir(abD,cD);
203
204
205   /*
206  Include the subroutine that will be used to do the computa-
207  tions, but don't print it.
208  */
209
210  %include 'D:\PROGS\SS.SAS' / NOsource2;
NOTE: Module SS defined.
1053
1054   /*
1055  Set values of the three secondary arguments for the SS subrou-
1056  tine.  The settings instruct the subroutine to
1057  - use the first method of computing sums of squares
1058  - print the values of the computed sums of squares, but
1059  - omit printing the intermediate results.
1060  */
1061
1062  level = 1;
1063  printss = 1;
1064  printh = 0;
1065
1066
1067   /*
1068              BEGINNING OF CALLS TO THE SS SUBROUTINE
1069
1070  To facilitate comparisons, sums of squares are computed in this
1071  program in the order in which they appear in the output from
1072  SAS PROC GLM.  This order does not reflect the order of impor-
1073  tance of the sums of squares since (as I shall discuss further
1074  in later material) the SAS Type I sums of squares have a seri-
1075  ous problem.
1076
1077  (The SAS Type IV sums of squares are omitted because [as illus-
1078  trated in later GLM output from this program] they are identi-
1079  cal to the HTI (= SAS Type III) sums of squares for the Searle
1080  data.  Type IV sums of squares can differ from HTI sums of
1081  squares only in the infrequent case in which there are empty
      cells.)
1082
1083  As I discuss in the program for the two-way case (1998), it is
1084  useful to view the computation of analysis of variance sums of
1085  squares in terms of the operation of computing the difference
1086  between the residual sums of squares of two overparameterized
1087  model equations.  The two equations are specified in terms of
1088  two matrices, XE and XR.
1089
1090  XE is the submatrix of the (full-column-rank) design matrix for
1091  the effect being tested.  XR is the horizontal concatenation of
1092  the submatrices of the design matrix for all the other terms
1093  (excluding the error term) on the right side of the two model
1094  equations whose residual sums of squares we wish to difference.
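The difference of residual sums of squares can be sketched in Python with exact rational arithmetic. This is an illustration under stated assumptions, not a transcription of the SS subroutine, whose code is not shown here. `residual_ss` and `numerator_ss` are hypothetical names, and matrices are stored as lists of columns, so XR || XE becomes list concatenation. The projection uses Gram-Schmidt orthogonalization and skips any column that is linearly dependent on earlier columns, which lets it handle overparameterized XR matrices. The toy data are invented.

```python
from fractions import Fraction


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def residual_ss(y, cols):
    """Residual SS of y after projecting onto the span of cols (exact)."""
    basis = []  # orthogonal (unnormalized) vectors and their squared lengths
    for col in cols:
        v = [Fraction(x) for x in col]
        for q, qq in basis:
            coef = dot(v, q) / qq
            v = [vi - coef * qi for vi, qi in zip(v, q)]
        vv = dot(v, v)
        if vv != 0:  # a zero remainder means the column is redundant
            basis.append((v, vv))
    fitted = sum((dot(y, q) ** 2 / qq for q, qq in basis), Fraction(0))
    return dot(y, y) - fitted


def numerator_ss(y, XE, XR):
    """SS for the effect in XE, adjusted for the terms in XR."""
    return residual_ss(y, XR) - residual_ss(y, XR + XE)


# Invented toy data: two groups of two observations each.
y = [1, 2, 3, 7]
ones = [[1, 1, 1, 1]]       # J(4,1), the column for the constant term m
gD = [[1, 1, -1, -1]]       # deviation-coded group submatrix
print(numerator_ss(y, gD, ones))  # 49/4
```

Here RSS(y = m) = 83/4 and RSS(y = m + group) = 17/2, so the difference is 49/4. That equals the familiar between-groups value 2(1.5 - 3.25)^2 + 2(5 - 3.25)^2.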
1095
1096
1097             SEQUENTIAL (SAS TYPE I) SUMS OF SQUARES
1098
1099  Sequential sums of squares are based on a chosen sequence for
1100  computing the sums of squares for the various effects.  The se-
1101  quence begins with the main effects and works upward through
1102  the interactions.  The sequence starts with only the constant
1103  term mu in the model, and terms are added to the model, one at
1104  a time, as in stepwise regression.  Once a term is added to the
1105  model, it stays.
1106
1107  (A problem with the sequential method is that it is not clear
1108  which particular sequence is best.  For example, SAS and SPSS
1109  use different logical sequences for their sequential sums of
1110  squares.)
1111
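1112  The sums of squares computed in this section demonstrate the
1113  sequence SAS uses for the Type I sums of squares when the model
      is specified as A | B | C, namely A, B, A x B, C, A x C, B x C,
      A x B x C.  (Thus the A x B interaction enters the model before
      the C main effect.)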
1114  For each sum of squares I show (in abbreviated form) the two
1115  model equations whose residual sums of squares are being dif-
1116  ferenced.  You may wish to confirm that the definition of XE
1117  and XR in each case is consistent with the two model equations.
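The model pairs below follow a simple pattern. For each term, XR holds the constant and every term that precedes it in the sequence. The small Python sketch below lists the pairs for the order used in this section. The helper name is hypothetical, and effects are written as strings of factor letters, with "m" for the constant.

```python
# Order of terms used in this section (the expansion of a | b | c).
ORDER = ["a", "b", "ab", "c", "ac", "bc", "abc"]


def sequential_models(order):
    """Return (effect, reduced model terms, full model terms) for each step."""
    pairs = []
    for k, effect in enumerate(order):
        xr = ["m"] + order[:k]          # terms already in the model
        pairs.append((effect, xr, xr + [effect]))
    return pairs


for effect, reduced, full in sequential_models(ORDER):
    print(f"{effect:>3}:  y = {' + '.join(reduced)}")
    print(f"      y = {' + '.join(full)}")
```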
1118
1119      Sequential A
1120          y = m
1121          y = m + a   */
1122  XE = aD;
1123  XR = J(48,1);
1124  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      173.272283272283000


NDIGITSS   NDIGITSR   NDIGITSI
    15.8       15.2       15.5

1125   /* Note how the value of RESULT given above is identical in
1126  all available digits to the Type I sum of squares for A in
1127  the GLM output generated later in this program.  (Similarly,
1128  all the following sums of squares computed in this IML program
1129  are identical to the corresponding sums of squares [when avail-
1130  able] from SAS GLM.)
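For the first pair of models (y = m versus y = m + a), the difference of residual sums of squares reduces to the one-way between-groups sum of squares. That sum is the sum over levels of (level total)^2 / n_i, minus (grand total)^2 / N. The following minimal Python check uses exact fractions and takes the y values for each level of A from the PROC PRINT listing at the end of this page. It reproduces the RESULT above as an exact fraction.

```python
from fractions import Fraction

# y values for each level of A, copied from the PROC PRINT listing
# (observations 1-14, 15-23, 24-36, and 37-48).
groups = {
    1: [10, 4, 6, 3, 5, 2, 3, 7, 1, 2, 3, 4, 5, 9],
    2: [5, 9, 8, 5, 6, 8, 6, 10, 2],
    3: [2, 3, 3, 4, 3, 4, 8, 3, 4, 8, 4, 5, 6],
    4: [4, 5, 7, 9, 1, 1, 3, 7, 19, 8, 20, 24],
}

N = sum(len(g) for g in groups.values())        # 48 observations
grand = sum(sum(g) for g in groups.values())    # grand total 288
# Between-groups SS: sum of total_i^2 / n_i, minus grand^2 / N.
ss_a = (sum(Fraction(sum(g) ** 2, len(g)) for g in groups.values())
        - Fraction(grand ** 2, N))
print(ss_a)                    # 141910/819
print(round(float(ss_a), 6))   # 173.272283
```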
1131
1132  The values NDIGITSS, NDIGITSR, and NDIGITSI are rough indica-
1133  tors of the number of digits of accuracy of the values in the
1134  "projection matrix", which I discuss in the other program
1135  (1998).
1136
1137
1138       Sequential B
1139          y = m + a
1140          y = m + a + b   */
1141  XE = bD;
1142  XR = J(48,1) || aD;
1143  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       54.465569159211000


NDIGITSS   NDIGITSR   NDIGITSI
    15.7       14.5       15.3

1144
1145   /* Sequential A x B
1146          y = m + a + b
1147          y = m + a + b + ab   */
1148  XE = abD;
1149  XR = J(48,1) || aD || bD;
1150  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      247.938338044696000


NDIGITSS   NDIGITSR   NDIGITSI
    15.0       14.6       14.9

1151
1152   /* Sequential C
1153          y = m + a + b     + ab
1154          y = m + a + b + c + ab   */
1155  XE = cD;
1156  XR = J(48,1) || aD || bD || abD;
1157  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       75.548068365043500


NDIGITSS   NDIGITSR   NDIGITSI
    15.5       15.0       15.7

1158
1159   /* Sequential A x C
1160          y = m + a + b + c + ab
1161          y = m + a + b + c + ab + ac   */
1162  XE = acD;
1163  XR = J(48,1) || aD || bD || cD || abD;
1164  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      200.189157199165000


NDIGITSS   NDIGITSR   NDIGITSI
    15.2       14.6       15.4

1165
1166   /* Sequential B x C
1167          y = m + a + b + c + ab + ac
1168          y = m + a + b + c + ab + ac + bc   */
1169  XE = bcD;
1170  XR = J(48,1) || aD || bD || cD || abD || acD;
1171  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       21.914797539212600


NDIGITSS   NDIGITSR   NDIGITSI
    13.9       14.8       14.4

1172
1173   /* A x B x C
1174  Note that the sum of squares for the three-way A x B x C inter-
1175  action is computed only once in the IML portion of this program
1176  because it is the same for all the types of sums of squares.
1177          y = m + a + b + c + ab + ac + bc
1178          y = m + a + b + c + ab + ac + bc + abc   */
1179  XE = abcD;
1180  XR = J(48,1) || aD || bD || cD || abD || acD || bcD;
1181  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      120.671786420388000


NDIGITSS   NDIGITSR   NDIGITSI
    14.8       14.7       14.8

1182
1183
1184   /*
1185                   SAS TYPE II SUMS OF SQUARES
1186
1187  Consider the following definition:
1188
1189      An analysis of variance effect is *contained* in an-
1190      other effect if the name of the former effect can be
1191      obtained from the name of the latter by deleting terms.
1192
1193  For example, the A effect is contained in the A x B effect be-
1194  cause the name of the A effect can be obtained by deleting the
1195  term B from the name of the A x B effect.
1196
1197  The XR matrix for a SAS Type II sum of squares is the horizon-
1198  tal concatenation of
1199
1200  - all the submatrices of the design matrix for terms in the
1201    model equation at the same level as the effect being tested
1202    plus
1203
1204  - all the submatrices (if any) of the design matrix for terms
1205    at lower levels than the effect being tested plus
1206
1207  - all the remaining submatrices of the design matrix whose ef-
1208    fects do NOT contain the effect being tested.
1209
1210  For example, the XR matrix for the Type II A effect is the
1211  horizontal concatenation of
1212
1213  - the submatrices for the two effects at the same level as A --
1214    i.e., B and C.
1215
1216  - (no submatrices for effects at lower levels than A because a
1217    main effect is the lowest-level effect; the constant term m,
        whose column of ones J(48,1) appears in every XR, is the only
        term below it)
1218
1219  - the submatrix for the B x C interaction (since this interac-
1220    tion does not contain the A effect).
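A term at the same level as the tested effect, or at a lower level, can never contain that effect. The three-part rule above therefore reduces to a single test: XR holds the constant term and every other term that does not contain the effect being tested. Below is a minimal Python sketch of the containment definition and of this rule. It treats each effect name as the set of its factor letters, which is a convention of the sketch and not SAS notation. The function names are hypothetical.

```python
TERMS = ["a", "b", "c", "ab", "ac", "bc", "abc"]   # the saturated three-way model


def contains(outer, inner):
    """True if effect `inner` is contained in effect `outer`, i.e. inner's
    name is obtained by deleting factor letters from outer's name."""
    return set(inner) <= set(outer)


def type2_xr(effect, terms=TERMS):
    """Terms whose submatrices form XR for a SAS Type II sum of squares."""
    return ["m"] + [t for t in terms if t != effect and not contains(t, effect)]


for e in TERMS:
    print(e, "->", " + ".join(type2_xr(e)))
```

For "a" this gives m + b + c + bc, matching the XR used in the Type II A call. For "ab" it gives m + a + b + c + ac + bc, matching the Type II A x B call.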
1221
1222  Thus the sum of squares for the Type II A effect is computed as
1223  follows:
1224
1225      Type II A
1226          y = m     + b + c + bc
1227          y = m + a + b + c + bc   */
1228  XE = aD;
1229  XR = J(48,1) || bD || cD || bcD;
1230  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      193.561598949382000


NDIGITSS   NDIGITSR   NDIGITSI
    14.0       14.7       13.9

1231
1232   /* Type II B
1233          y = m + a     + c + ac
1234          y = m + a + b + c + ac   */
1235  XE = bD;
1236  XR = J(48,1) || aD || cD || acD;
1237  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       41.177222208828500


NDIGITSS   NDIGITSR   NDIGITSI
    13.8       14.7       13.9

1238
1239   /* Type II A x B
1240          y = m + a + b + c      + ac + bc
1241          y = m + a + b + c + ab + ac + bc   */
1242  XE = abD;
1243  XR = J(48,1) || aD || bD || cD || acD || bcD;
1244  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      205.989513334515000


NDIGITSS   NDIGITSR   NDIGITSI
    13.8       14.4       13.9

1245
1246   /* Type II C
1247          y = m + a + b     + ab
1248          y = m + a + b + c + ab   */
1249  XE = cD;
1250  XR = J(48,1) || aD || bD || abD;
1251  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       75.548068365043500


NDIGITSS   NDIGITSR   NDIGITSI
    15.5       15.0       15.7

1252
1253   /* Type II A x C
1254          y = m + a + b + c + ab      + bc
1255          y = m + a + b + c + ab + ac + bc   */
1256  XE = acD;
1257  XR = J(48,1) || aD || bD || cD || abD || bcD;
1258  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      200.831441782583000


NDIGITSS   NDIGITSR   NDIGITSI
    14.9       14.9       14.8

1259
1260   /* Type II B x C
1261          y = m + a + b + c + ab + ac
1262          y = m + a + b + c + ab + ac + bc   */
1263  XE = bcD;
1264  XR = J(48,1) || aD || bD || cD || abD || acD;
1265  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       21.914797539212600


NDIGITSS   NDIGITSR   NDIGITSI
    13.9       14.8       14.4

1266
1267
1268   /*
1269                HTI = SAS Type III SUMS OF SQUARES
1270
1271  The XR matrix for an HTI (Higher-level Terms Included) sum of
1272  squares always consists of the horizontal concatenation of sub-
1273  matrices for ALL the terms on the right side of the "saturated"
1274  version of the overparameterized model equation (except the er-
1275  ror term and except the term specified in the specification of
1276  XE).  (The saturated version of a model equation is simply the
1277  model that contains all the possible terms.)  Thus for the
1278  Searle data the XR matrix for an HTI sum of squares is a hori-
1279  zontal concatenation of seven matrices in each of the six calls
1280  to SS that follow.
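The HTI computations below can be reproduced outside SAS. The following self-contained Python sketch is an illustration of the method, not the SS subroutine itself. It rests on these assumptions:

- Arithmetic is exact and rational, so the results can be compared with Searle's exact answers quoted with the HTI C and HTI B x C calls.
- `designf` and `hdir` are hypothetical stand-ins that assume IML's deviation coding, in which the last level is coded -1.
- Matrices are stored as lists of columns.
- The projection uses Gram-Schmidt orthogonalization and skips redundant columns.

The data are the 48 observations in the PROC PRINT listing, grouped by cell.

```python
from fractions import Fraction

# Searle (1987, 392) data from the PROC PRINT listing, grouped by cell (a, b, c).
CELLS = {
    (1, 1, 1): [10], (1, 1, 2): [4, 6], (1, 2, 1): [3, 5],
    (1, 2, 2): [2, 3, 7], (1, 3, 1): [1, 2, 3], (1, 3, 2): [4, 5, 9],
    (2, 1, 1): [5, 9], (2, 1, 2): [8], (2, 2, 1): [5],
    (2, 2, 2): [6, 8], (2, 3, 1): [6, 10], (2, 3, 2): [2],
    (3, 1, 1): [2, 3, 3, 4], (3, 1, 2): [3, 4, 8], (3, 2, 1): [3, 4, 8],
    (3, 2, 2): [4], (3, 3, 1): [5], (3, 3, 2): [6],
    (4, 1, 1): [4], (4, 1, 2): [5, 7, 9], (4, 2, 1): [1, 1, 3, 7],
    (4, 2, 2): [19], (4, 3, 1): [8], (4, 3, 2): [20, 24],
}
rows = [(key, v) for key, ys in CELLS.items() for v in ys]
a = [key[0] for key, _ in rows]
b = [key[1] for key, _ in rows]
c = [key[2] for key, _ in rows]
y = [v for _, v in rows]


def designf(x):
    """Assumed DESIGNF coding: one column per level but the last, coded -1."""
    levels = sorted(set(x))
    return [[1 if v == lev else (-1 if v == levels[-1] else 0) for v in x]
            for lev in levels[:-1]]


def hdir(A, B):
    """Row-wise direct product of two column lists (assumed HDIR behavior)."""
    return [[p * q for p, q in zip(ca, cb)] for ca in A for cb in B]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def residual_ss(cols):
    """Exact residual SS of y on span(cols); dependent columns are skipped."""
    basis = []
    for col in cols:
        w = [Fraction(x) for x in col]
        for q, qq in basis:
            coef = dot(w, q) / qq
            w = [wi - coef * qi for wi, qi in zip(w, q)]
        ww = dot(w, w)
        if ww != 0:
            basis.append((w, ww))
    return dot(y, y) - sum((dot(y, q) ** 2 / qq for q, qq in basis), Fraction(0))


def numerator_ss(XE, XR):
    return residual_ss(XR) - residual_ss(XR + XE)


J = [[1] * len(y)]                      # J(48,1)
aD, bD, cD = designf(a), designf(b), designf(c)
abD, acD, bcD = hdir(aD, bD), hdir(aD, cD), hdir(bD, cD)
abcD = hdir(abD, cD)

within = residual_ss(J + aD + bD + cD + abD + acD + bcD + abcD)
hti_c = numerator_ss(cD, J + aD + bD + abD + acD + bcD + abcD)
hti_bc = numerator_ss(bcD, J + aD + bD + cD + abD + acD + abcD)
print(within)                                 # 122 (pooled within-cell SS)
print(hti_c)                                  # compare Searle: 62
print(hti_bc, Fraction(192 * 1678, 11505))    # compare Searle: 192(1678)/11505
```

With effect coding, the saturated model has 24 linearly independent columns, one per cell. Dropping a term's columns therefore imposes exactly that term's sum-to-zero hypothesis, which is what makes the HTI reduced model well defined.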
1281
1282      HTI A
1283          y = m     + b + c + ab + ac + bc + abc
1284          y = m + a + b + c + ab + ac + bc + abc         */
1285  XE = aD;
1286  XR = J(48,1) || bD || cD || abD || acD || bcD || abcD;
1287  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      194.175387286935000


NDIGITSS   NDIGITSR   NDIGITSI
    15.3       14.5       15.3

1288
1289   /* HTI B
1290          y = m + a     + c + ab + ac + bc + abc
1291          y = m + a + b + c + ab + ac + bc + abc         */
1292  XE = bD;
1293  XR = J(48,1) || aD || cD || abD || acD || bcD || abcD;
1294  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       10.376010430247600


NDIGITSS   NDIGITSR   NDIGITSI
    15.6       14.7       15.7

1295
1296   /* HTI A x B
1297          y = m + a + b + c      + ac + bc + abc
1298          y = m + a + b + c + ab + ac + bc + abc         */
1299  XE = abD;
1300  XR = J(48,1) || aD || bD || cD || acD || bcD || abcD;
1301  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      160.480840013270000


NDIGITSS   NDIGITSR   NDIGITSI
    14.7       14.7       14.9

1302
1303   /* HTI C
1304      Searle's exact answer (1987, 394) is 62
1305          y = m + a + b     + ab + ac + bc + abc
1306          y = m + a + b + c + ab + ac + bc + abc         */
1307  XE = cD;
1308  XR = J(48,1) || aD || bD || abD || acD || bcD || abcD;
1309  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       61.999999999999900


NDIGITSS   NDIGITSR   NDIGITSI
    15.7       15.2       15.7

1310
1311   /* HTI A x C
1312          y = m + a + b + c + ab      + bc + abc
1313          y = m + a + b + c + ab + ac + bc + abc         */
1314  XE = acD;
1315  XR = J(48,1) || aD || bD || cD || abD || bcD || abcD;
1316  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      215.450958887416000


NDIGITSS   NDIGITSR   NDIGITSI
    15.4       14.8       15.5

1317
1318   /* HTI B x C
1319      Searle's exact answer (1987, 395) is 192(1678)/11505
1320      = 28.00312 90743 15514 99
1321          y = m + a + b + c + ab + ac      + abc
1322          y = m + a + b + c + ab + ac + bc + abc         */
1323  XE = bcD;
1324  XR = J(48,1) || aD || bD || cD || abD || acD || abcD;
1325  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       28.003129074315500


NDIGITSS   NDIGITSR   NDIGITSI
    15.3       14.8       15.5

1326
1327
1328   /*
1329                       HTO SUMS OF SQUARES
1330
1331  The XR matrix for an HTO sum of squares consists of the hori-
1332  zontal concatenation of the submatrices of the design matrix
1333  for all the effects at the same level as and at lower levels
1334  than the effect being tested.  That is, Higher-level Terms are
1335  Omitted (HTO).
1336
1337  Note the similarity and difference between the HTO and the SAS
1338  Type II methods:  In a two-way experiment the HTO and SAS Type
1339  II sums of squares are identical.  But in the three-way case
1340  and higher there are differences.  In the three-way case the
1341  differences are only in the main effects.  That is, for an HTO
1342  main effect the submatrices for all the higher-level effects
1343  are omitted from XR.  On the other hand, for a SAS Type II main
1344  effect the submatrices for higher-level effects that do not
1345  contain the effect being tested are included in XR.
1346
1347  For example, the XR matrix for the SAS Type II A main effect
1348  (as discussed above in the Type II section) is
1349
1350                 XR = J(48,1) || bD || cD || bcD.
1351
1352  But the XR matrix for the HTO A main effect is
1353
1354                 XR = J(48,1) || bD || cD.
1355
1356  In cases in which there is no B x C interaction extant in the
1357  population, the HTO sum of squares provides a slightly more
1358  powerful statistical test of the A effect than the SAS Type II
1359  sum of squares.  (In cases in which there *is* an extant B x C
1360  interaction, the HTO approach should not be used to test for
1361  the A effect because, as I shall demonstrate in later material,
1362  an extant B x C interaction can "contaminate" the HTO A statis-
1363  tical test.)
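The claim that HTO and SAS Type II differ only for the main effects in the three-way case can be checked mechanically. Below is a minimal Python sketch. It again treats an effect name as a set of factor letters, writes the constant as "m", and takes a term's level to be its number of letters. The helper names are hypothetical.

```python
TERMS = ["a", "b", "c", "ab", "ac", "bc", "abc"]


def contains(outer, inner):
    return set(inner) <= set(outer)


def type2_xr(effect):
    # constant plus every other term that does not contain the effect
    return ["m"] + [t for t in TERMS if t != effect and not contains(t, effect)]


def hto_xr(effect):
    # constant plus every other term at the same or a lower level
    return ["m"] + [t for t in TERMS if t != effect and len(t) <= len(effect)]


for e in TERMS:
    extra = sorted(set(type2_xr(e)) - set(hto_xr(e)))
    print(f"{e:>3}: Type II adds {extra or 'nothing'} to the HTO XR")
```

Only a, b, and c show a difference. For each of them, the single extra term is the two-way interaction of the other two factors.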
1364
1365  Following are the three cases where the HTO sums of squares and
1366  the SAS Type II sums of squares differ.  (The HTO sums of
1367  squares that appear below do not appear in the output from PROC
1368  GLM because GLM cannot directly compute HTO sums of squares.)
1369
1370      HTO A
1371          y = m     + b + c
1372          y = m + a + b + c   */
1373  XE = aD;
1374  XR = J(48,1) || bD || cD;
1375  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
      183.011289644412000


NDIGITSS   NDIGITSR   NDIGITSI
    13.8       14.8       13.6

1376
1377   /* HTO B
1378          y = m + a     + c
1379          y = m + a + b + c   */
1380  XE = bD;
1381  XR = J(48,1) || aD || cD;
1382  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       52.761053453843100


NDIGITSS   NDIGITSR   NDIGITSI
    14.6       14.7       14.2

1383
1384   /* HTO C
1385          y = m + a + b
1386          y = m + a + b + c   */
1387  XE = cD;
1388  XR = J(48,1) || aD || bD;
1389  call SS(result,  y, XE, XR,  level,printss,printh);

                   RESULT
       98.514715858371800


NDIGITSS   NDIGITSR   NDIGITSI
    15.6       14.7       15.5

1390
1391
1392   /*
1393                       HTOS SUMS OF SQUARES
1394
1395  I discuss HTOS sums of squares in a paper (1997, appendix D).
1396
1397  As with all the other types of sums of squares, the XE matrix
1398  for an HTOS sum of squares consists of the submatrix of the de-
1399  sign matrix for the effect being tested.
1400
1401  Recall the definition of one effect containing another effect
1402  given in the discussion above of the SAS Type II sums of
1403  squares.  The XR matrix for an HTOS sum of squares consists of
1404  the horizontal concatenation of
1405
1406  - the submatrices for all the effects at the same level as the
1407    effect being tested (just like the HTO, Type II, and HTI sums
1408    of squares)
1409
1410  - the submatrices for all the effects (if any) at lower levels
1411    than the effect being tested (just like the HTO, Type II,
1412    HTI, and sequential sums of squares).
1413
1414  - the submatrices for non-containing higher-level interactions
1415    (just like the SAS Type II sums of squares) *if evidence of
1416    these interactions is found in the data*.
1417
1418  Note that the HTOS approach is a conditional approach, with the
1419  formula for computing a sum of squares depending on whether
1420  evidence of higher-level non-containing interactions is found
1421  in the data.
1422
1423  In the three-way case there are only three sums of squares in
1424  which the HTOS sums of squares are conceptually (but not compu-
1425  tationally) different from other sums of squares computed
1426  above, namely the three main-effect sums of squares.  If evi-
1427  dence of a non-containing higher-level interaction is found,
1428  the HTOS sum of squares for a main effect is identical to the
1429  corresponding SAS Type II sum of squares, as discussed and
1430  computed above.  But if no evidence of a non-containing inter-
1431  action is found, the HTOS sum of squares is identical to the
1432  corresponding HTO sum of squares, as also discussed and com-
1433  puted above.
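The conditional rule can be written as a small function. This Python sketch is illustrative only. `htos_xr` is a hypothetical name, and `significant` is an input listing the interactions for which evidence was found. This section does not specify how that evidence is judged (for example, which test or threshold to use), so the sketch leaves it to the caller.

```python
TERMS = ["a", "b", "c", "ab", "ac", "bc", "abc"]


def contains(outer, inner):
    return set(inner) <= set(outer)


def htos_xr(effect, significant):
    """XR terms for an HTOS sum of squares.

    `significant` is the set of interactions for which evidence was
    found in the data (how that is judged is left to the analyst).
    """
    same_or_lower = [t for t in TERMS if t != effect and len(t) <= len(effect)]
    noncontaining_higher = [t for t in TERMS
                            if len(t) > len(effect) and not contains(t, effect)
                            and t in significant]
    return ["m"] + same_or_lower + noncontaining_higher


print(htos_xr("a", significant=set()))    # ['m', 'b', 'c']        (= HTO A)
print(htos_xr("a", significant={"bc"}))   # ['m', 'b', 'c', 'bc']  (= Type II A)
```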
1434
1435  The HTOS approach has two useful features:
1436
1437  1. If we are testing an effect and there is evidence that a
1438     non-containing higher-level interaction exists in the data,
1439     the HTOS approach correctly takes account of the interaction
1440     in the computation, thereby ensuring that the statistical
1441     test is valid.
1442
1443  2. On the other hand, if there is no evidence that the non-con-
1444     taining interaction exists, the HTOS approach provides a
1445     valid statistical test for the existence of a relationship
1446     between the response variable and the relevant predictor
1447     variable(s).  This test is slightly more powerful than the
1448     HTI and SAS Type II tests.  I shall discuss the validity and
1449     power of this approach in later material.
1450
1451
1452                        QUIT FROM PROC IML
1453
1454  This ends the computation of sums of squares in PROC IML.
1455  Quit from IML.
1456  */
1457
1458  quit;
Exiting IML.
NOTE: 87 workspace compresses.
NOTE: The PROCEDURE IML used 18.07 seconds.


1459
1460
1461   /*
1462                           GLM ANALYSIS
1463
1464  Analyze the data with PROC GLM for comparison with the above
1465  output from IML.  (The output from GLM comes in a separate out-
1466  put file.)
1467
1468  Examination of the GLM output reveals that all the GLM sums of
1469  squares are identical in all available digits to the sums of
1470  squares computed above in IML.
1471  */
1472
1473  proc glm data=Searle_2;
1474     class a b c;
1475     model y = a | b | c / ss1 ss2 ss3 ss4;
1476     quit;

NOTE: At least one W.D format was too small for the number to be printed. The
      decimal may be shifted by the "BEST" format.
NOTE: The PROCEDURE GLM used 3.85 seconds.


1477
1478  options date linesize=80 probsig=2;
1479  title ' ';
1480
1481
1482   /*
1483              APPENDIX:  STEPS TO RUN THIS PROGRAM
1484
1485  1. Ensure that the STAT and IML components of the SAS system
1486     are available on your computer.  Information about the SAS
1487     system is available at http://www.sas.com
1488
1489  2. Ensure that you have the source version of this program,
1490     which is called PR0165.SAS (not the HTML version, which is
1491     called PR0165.HTM).  You can obtain a copy of the source
1492     version in the "Computer Programs" section of the page at
1493     http://www.matstat.com/ss/
1494
1495  3. Install a copy of the SS subroutine on your computer.  This
1496     subroutine does the actual computations of sums of squares
1497     and is available at the above MatStat web site.
1498
1499  4. Edit the %INCLUDE statement above to correctly point to the
1500     location of the SS.SAS subroutine file on your computer.
1501     That is, change the
1502                          D:\PROGS\SS.SAS
1503     in the statement to the location where SS.SAS is stored on
1504     your computer.
1505
1506  5. (Optional.)  Modify the two OPTIONS statements in the pro-
1507     gram that set the DATE, LINESIZE, and PROBSIG options.
1508
1509  6. Submit the program to SAS.
1510
1511
1512                            REFERENCES
1513
1514  Macnaughton, D. B. 1997. Which sums of squares are best in un-
1515     balanced analysis of variance?  Available at
1516     http://www.matstat.com/ss/
1517
1518  Macnaughton, D. B. 1998. PR0139.HTM: Computing numerator sums
1519     of squares in unbalanced analysis of variance:  Two-way
1520     case. Available in the "Computer Programs" section at
1521     http://www.matstat.com/ss/
1522
1523  Searle, S. R. 1987. _Linear Models for Unbalanced Data._ New
1524     York: Wiley.
1525
1526  Yates, F. 1934. The analysis of multiple classifications with
1527     unequal numbers in the different classes.  _Journal of the
1528     American Statistical Association_ 29, 51-66.
1529
1530                     version of June 20, 1998
1531                    (end of program pr0165.sas) */

This is the end of the program log for the run of the program.

Output from PROC PRINT

Following is the output from PROC PRINT showing the data values that were analyzed, as given by Searle (1987, 392).

OBS    A    B    C     Y

  1    1    1    1    10
  2    1    1    2     4
  3    1    1    2     6
  4    1    2    1     3
  5    1    2    1     5
  6    1    2    2     2
  7    1    2    2     3
  8    1    2    2     7
  9    1    3    1     1
 10    1    3    1     2
 11    1    3    1     3
 12    1    3    2     4
 13    1    3    2     5
 14    1    3    2     9
 15    2    1    1     5
 16    2    1    1     9
 17    2    1    2     8
 18    2    2    1     5
 19    2    2    2     6
 20    2    2    2     8
 21    2    3    1     6
 22    2    3    1    10
 23    2    3    2     2
 24    3    1    1     2
 25    3    1    1     3
 26    3    1    1     3
 27    3    1    1     4
 28    3    1    2     3
 29    3    1    2     4
 30    3    1    2     8
 31    3    2    1     3
 32    3    2    1     4
 33    3    2    1     8
 34    3    2    2     4
 35    3    3    1     5
 36    3    3    2     6
 37    4    1    1     4
 38    4    1    2     5
 39    4    1    2     7
 40    4    1    2     9
 41    4    2    1     1
 42    4    2    1     1
 43    4    2    1     3
 44    4    2    1     7
 45    4    2    2    19
 46    4    3    1     8
 47    4    3    2    20
 48    4    3    2    24

Output from PROC GLM

Following are the four analysis of variance tables generated for the above data by SAS PROC GLM:

Source   DF      Type I SS   Mean Square   F Value   Pr > F

A         3   173.27228327   57.75742776     11.36   8.E-05
B         2    54.46556916   27.23278458      5.36   .01192
A*B       6   247.93833804   41.32305634      8.13   7.E-05
C         1    75.54806837   75.54806837     14.86   .00076
A*C       3   200.18915720   66.72971907     13.13   3.E-05
B*C       2    21.91479754   10.95739877      2.16   .13774
A*B*C     6   120.67178642   20.11196440      3.96   .00684

Source   DF     Type II SS   Mean Square   F Value   Pr > F

A         3   193.56159895   64.52053298     12.69   4.E-05
B         2    41.17722221   20.58861110      4.05   .03051
A*B       6   205.98951333   34.33158556      6.75   .00028
C         1    75.54806837   75.54806837     14.86   .00076
A*C       3   200.83144178   66.94381393     13.17   3.E-05
B*C       2    21.91479754   10.95739877      2.16   .13774
A*B*C     6   120.67178642   20.11196440      3.96   .00684

Source   DF    Type III SS   Mean Square   F Value   Pr > F

A         3   194.17538729   64.72512910     12.73   4.E-05
B         2    10.37601043    5.18800522      1.02   .37550
A*B       6   160.48084001   26.74680667      5.26   .00139
C         1    62.00000000   62.00000000     12.20   .00188
A*C       3   215.45095889   71.81698630     14.13   2.E-05
B*C       2    28.00312907   14.00156454      2.75   .08377
A*B*C     6   120.67178642   20.11196440      3.96   .00684

Source   DF     Type IV SS   Mean Square   F Value   Pr > F

A         3   194.17538729   64.72512910     12.73   4.E-05
B         2    10.37601043    5.18800522      1.02   .37550
A*B       6   160.48084001   26.74680667      5.26   .00139
C         1    62.00000000   62.00000000     12.20   .00188
A*C       3   215.45095889   71.81698630     14.13   2.E-05
B*C       2    28.00312907   14.00156454      2.75   .08377
A*B*C     6   120.67178642   20.11196440      3.96   .00684

The low p-value for the A × B × C interaction (i.e., .0068 in all four tables) provides good evidence that (1) there is a relationship between the response variable and all three predictor variables, and (2) the relationship is a three-way interaction.
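The GLM tables above do not show the error line. Each F value is the effect's mean square divided by the error mean square. For this saturated model, the error sum of squares is the pooled within-cell sum of squares. It has 48 - 24 = 24 degrees of freedom because all 24 cells of the 4 x 3 x 2 layout contain data, which is consistent with the Type III and Type IV tables being identical. The Python sketch below is not part of the original SAS program. It recomputes these quantities from the PROC PRINT data and checks that the seven Type I sums of squares add up to the model sum of squares.

```python
from fractions import Fraction

# Searle (1987, 392) data from the PROC PRINT listing, grouped by cell (a, b, c).
CELLS = {
    (1, 1, 1): [10], (1, 1, 2): [4, 6], (1, 2, 1): [3, 5],
    (1, 2, 2): [2, 3, 7], (1, 3, 1): [1, 2, 3], (1, 3, 2): [4, 5, 9],
    (2, 1, 1): [5, 9], (2, 1, 2): [8], (2, 2, 1): [5],
    (2, 2, 2): [6, 8], (2, 3, 1): [6, 10], (2, 3, 2): [2],
    (3, 1, 1): [2, 3, 3, 4], (3, 1, 2): [3, 4, 8], (3, 2, 1): [3, 4, 8],
    (3, 2, 2): [4], (3, 3, 1): [5], (3, 3, 2): [6],
    (4, 1, 1): [4], (4, 1, 2): [5, 7, 9], (4, 2, 1): [1, 1, 3, 7],
    (4, 2, 2): [19], (4, 3, 1): [8], (4, 3, 2): [20, 24],
}

n = sum(len(ys) for ys in CELLS.values())
grand_mean = Fraction(sum(sum(ys) for ys in CELLS.values()), n)
sst = sum((v - grand_mean) ** 2 for ys in CELLS.values() for v in ys)
sse = sum((v - Fraction(sum(ys), len(ys))) ** 2
          for ys in CELLS.values() for v in ys)
df_error = n - len(CELLS)       # 48 observations minus 24 nonempty cells
mse = sse / df_error

# Type I SS copied from the first GLM table; they should add up to SST - SSE.
type1 = [173.27228327, 54.46556916, 247.93833804, 75.54806837,
         200.18915720, 21.91479754, 120.67178642]

print(sst, sse, df_error, mse)        # 1016 122 24 61/12
print(round(sum(type1), 6))           # 894.0
# F = effect mean square / error mean square
print(round(57.75742776 / float(mse), 2))   # 11.36  (Type I A)
print(round(62 / float(mse), 2))            # 12.2   (Type III C)
```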

This is the end of the output from PR0165.SAS.
