The following is self-documenting output from a SAS IML computer program:
NOTE: SAS (r) Proprietary Software Release 6.12 TS020
Licensed to DONALD B. MACNAUGHTON, Site 0025250001.
1 /*
2 PR0165.SAS
3
4 COMPUTING NUMERATOR SUMS OF SQUARES
5 IN UNBALANCED ANALYSIS OF VARIANCE:
6 Three-Way Case
7
8 Donald B. Macnaughton
9 donmac@matstat.com
10
11
12 TABLE OF CONTENTS
13
14 Introductory Comments
15 - abstract
16 - introduction
17
18 Preliminary Steps
19 - load the Searle data into a SAS dataset and print the data
20 - start PROC IML and read the data from the SAS dataset into
21 IML
22 - generate the three main effect submatrices of the design
23 matrix
24 - generate the four interaction submatrices of the design
25 matrix
26 - make the SS subroutine available to the program
27 - set the values of the three secondary arguments of the SS
28 subroutine
29
30 Compute the Twenty-Two Sums of Squares Using the SS Subroutine
31 - compute the six sequential (SAS Type I) sums of squares
32 - compute the sum of squares for the highest-level interac-
33 tion
34 - compute the six SAS Type II sums of squares
35 - compute the six HTI = SAS Type III sums of squares
36 - compute three HTO sums of squares
37 - discuss the HTOS sums of squares
38
39 Quit from IML
40
41 Run PROC GLM to Compute Nineteen of the Sums of Squares (for
42 comparison with the values generated above)
43
44 Appendix: Steps to Run the Program
45
46 References
47
48 Output from PROC PRINT
49
50 Output from PROC GLM
51
52
53 ABSTRACT
54
55 This SAS program illustrates a conceptual point of view and the
56 matrix arithmetic for computing the following types of analysis
57 of variance numerator sums of squares:
58
59 - HTO (Higher-level Terms are Omitted)
60 = SAS Type II in the two-way case
61 = SPSS ANOVA Experimental
62
63 - SAS Type II
64
65 - HTOS (Higher-level Terms are Omitted unless Significant)
66 = a superset of SAS Type II and HTO
67
68 - HTI (Higher-Level Terms are Included)
69 = SAS Type III
70 = SPSS ANOVA UNIQUE
71 = the default method in many analysis of variance programs
72
73 - sequential
74 = SAS Type I
75 = SPSS ANOVA Hierarchical in the two-way case.
76
77 The conceptual point of view is one of computing an analysis of
78 variance sum of squares by computing the difference between the
79 residual sums of squares of two model equations (Yates 1934).
80 The matrix arithmetic is simple, and is specified directly in
81 terms of the conceptual point of view.
82
83 Computations are illustrated using data from a 4 x 3 x 2 unbal-
84 anced experiment discussed by Shayle Searle (1987, 392).
85
86
87 INTRODUCTION
88
89 Discussion in this program is an extension of discussion in an-
90 other program, which computes sums of squares for an unbalanced
91 2 x 3 experiment (Macnaughton 1998). In that program I note
92 that most researchers use analysis of variance to obtain the
93 resulting p-values, which are used to help in detecting rela-
94 tionships between variables. In order to compute analysis of
95 variance p-values, it is mathematically necessary to first com-
96 pute certain "numerator sums of squares".
97
98 The other program illustrates the three best-known conceptual
99 methods (HTO, HTI, and sequential) for computing numerator sums
100 of squares when there are two discrete-valued predictor vari-
101 ables in the experiment. When there are three or more dis-
102 crete-valued predictor variables, other methods of computing
103 numerator sums of squares become available. The present pro-
104 gram illustrates some of these methods.
105
106
107 BEGINNING OF THE PROGRAM STATEMENTS
108
109 (Note: If you wish to run this program on your computer, see
110 the checklist in the appendix.)
111
112 Define the title for the output and set SAS system options.
113 */
114
115 title 'IML/GLM 4 x 3 x 2 Unbalanced ANOVA, Searle (1987, 392)';
116 options linesize=80 nodate probsig=2;
117
118 /*
119 Load the Searle data into a SAS dataset.
120 */
121
122 data Searle_2 (keep = a b c y);
123 input a b c n @;
124 do i = 1 to n;
125 input y @;
126 output;
127 end;
128
129 cards;
NOTE: The data set WORK.SEARLE_2 has 48 observations and 4 variables.
NOTE: The DATA statement used 3.18 seconds.
154 ;
155
156 /*
157 Print the data.
158 */
159
160 proc print data=Searle_2;
161 run;
NOTE: The PROCEDURE PRINT used 1.05 seconds.
162
163
164 /*
165 BEGINNING OF THE IML COMPUTATIONS
166
167 Start PROC IML and reset options that control the destination
168 and appearance of the IML output.
169 */
170
171 proc iml;
IML Ready
172 reset log spaces=3;
173
174 /*
175 Read the data from the SAS dataset into IML. Each of the four
176 variables in the dataset (i.e., a, b, c, and y) becomes a col-
177 umn vector in IML. The column vectors inherit the names of the
178 respective dataset variables.
179 */
180
181 use Searle_2;
182 read all var _all_;
183
184 /*
185 Compute the main-effect submatrices of the overall design ma-
186 trix using the IML DESIGNF function.
187 */
188
189 aD = designf(a);
190 bD = designf(b);
191 cD = designf(c);
192
193 /*
194 Compute the interaction submatrices of the overall design ma-
195 trix using the IML HDIR (horizontal direct product) function.
196 */
197
198 abD = hdir(aD,bD);
199 acD = hdir(aD,cD);
200 bcD = hdir(bD,cD);
201
202 abcD = hdir(abD,cD);
203
204
205 /*
206 Include the subroutine that will be used to do the computa-
207 tions, but don't print it.
208 */
209
210 %include 'D:\PROGS\SS.SAS' / NOsource2;
NOTE: Module SS defined.
1053
1054 /*
1055 Set values of the three secondary arguments for the SS subrou-
1056 tine. The settings instruct the subroutine to
1057 - use the first method of computing sums of squares
1058 - print the values of the computed sums of squares, but
1059 - omit printing the intermediate results.
1060 */
1061
1062 level = 1;
1063 printss = 1;
1064 printh = 0;
1065
1066
1067 /*
1068 BEGINNING OF CALLS TO THE SS SUBROUTINE
1069
1070 To facilitate comparisons, sums of squares are computed in this
1071 program in the order in which they appear in the output from
1072 SAS PROC GLM. This order does not reflect the order of impor-
1073 tance of the sums of squares since (as I shall discuss further
1074 in later material) the SAS Type I sums of squares have a seri-
1075 ous problem.
1076
1077 (The SAS Type IV sums of squares are omitted because [as illus-
1078 trated in later GLM output from this program] they are identi-
1079 cal to the HTI (= SAS Type III) sums of squares for the Searle
1080 data. Type IV sums of squares differ from HTI sums of squares
1081 only in the infrequent case in which there are empty cells.)
1082
1083 As I discuss in the program for the two-way case (1998), it is
1084 useful to view the computation of analysis of variance sums of
1085 squares in terms of the operation of computing the difference
1086 between the residual sums of squares of two overparameterized
1087 model equations. The two equations are specified in terms of
1088 two matrices, XE and XR.
1089
1090 XE is the submatrix of the (full-column-rank) design matrix for
1091 the effect being tested. XR is the horizontal concatenation of
1092 the submatrices of the design matrix for all the other terms
1093 (excluding the error term) on the right side of the two model
1094 equations whose residual sums of squares we wish to difference.
1095
1096
1097 SEQUENTIAL (SAS TYPE I) SUMS OF SQUARES
1098
1099 Sequential sums of squares are based on a chosen sequence for
1100 computing the sums of squares for the various effects. The se-
1101 quence begins with the main effects and works upward through
1102 the interactions. The sequence starts with only the constant
1103 term mu in the model, and terms are added to the model, one at
1104 a time, as in stepwise regression. Once a term is added to the
1105 model, it stays.
1106
1107 (A problem with the sequential method is that it is not clear
1108 which particular sequence is best. For example, SAS and SPSS
1109 use different logical sequences for their sequential sums of
1110 squares.)
1111
1112 The sums of squares computed in this section demonstrate the
1113 sequence used by SAS in computing the Type I sums of squares.
1114 For each sum of squares I show (in abbreviated form) the two
1115 model equations whose residual sums of squares are being dif-
1116 ferenced. You may wish to confirm that the definition of XE
1117 and XR in each case is consistent with the two model equations.
1118
1119 Sequential A
1120 y = m
1121 y = m + a */
1122 XE = aD;
1123 XR = J(48,1);
1124 call SS(result, y, XE, XR, level,printss,printh);
RESULT
173.272283272283000
NDIGITSS NDIGITSR NDIGITSI
15.8 15.2 15.5
1125 /* Note how the value of RESULT given above is identical in
1126 all available digits with the Type I sum of squares for A in
1127 the GLM output generated later in this program. (Similarly,
1128 all the following sums of squares computed in this IML program
1129 are identical to the corresponding sums of squares [when avail-
1130 able] from SAS GLM.)
1131
1132 The values NDIGITSS, NDIGITSR, and NDIGITSI are rough indica-
1133 tors of the number of digits of accuracy of the values in the
1134 "projection matrix", which I discuss in the other program
1135 (1998).
1136
1137
1138 Sequential B
1139 y = m + a
1140 y = m + a + b */
1141 XE = bD;
1142 XR = J(48,1) || aD;
1143 call SS(result, y, XE, XR, level,printss,printh);
RESULT
54.465569159211000
NDIGITSS NDIGITSR NDIGITSI
15.7 14.5 15.3
1144
1145 /* Sequential A x B
1146 y = m + a + b
1147 y = m + a + b + ab */
1148 XE = abD;
1149 XR = J(48,1) || aD || bD;
1150 call SS(result, y, XE, XR, level,printss,printh);
RESULT
247.938338044696000
NDIGITSS NDIGITSR NDIGITSI
15.0 14.6 14.9
1151
1152 /* Sequential C
1153 y = m + a + b + ab
1154 y = m + a + b + c + ab */
1155 XE = cD;
1156 XR = J(48,1) || aD || bD || abD;
1157 call SS(result, y, XE, XR, level,printss,printh);
RESULT
75.548068365043500
NDIGITSS NDIGITSR NDIGITSI
15.5 15.0 15.7
1158
1159 /* Sequential A x C
1160 y = m + a + b + c + ab
1161 y = m + a + b + c + ab + ac */
1162 XE = acD;
1163 XR = J(48,1) || aD || bD || cD || abD;
1164 call SS(result, y, XE, XR, level,printss,printh);
RESULT
200.189157199165000
NDIGITSS NDIGITSR NDIGITSI
15.2 14.6 15.4
1165
1166 /* Sequential B x C
1167 y = m + a + b + c + ab + ac
1168 y = m + a + b + c + ab + ac + bc */
1169 XE = bcD;
1170 XR = J(48,1) || aD || bD || cD || abD || acD;
1171 call SS(result, y, XE, XR, level,printss,printh);
RESULT
21.914797539212600
NDIGITSS NDIGITSR NDIGITSI
13.9 14.8 14.4
1172
1173 /* A x B x C
1174 Note that the sum of squares for the three-way A x B x C inter-
1175 action is computed only once in the IML portion of this program
1176 because it is the same for all the types of sums of squares.
1177 y = m + a + b + c + ab + ac + bc
1178 y = m + a + b + c + ab + ac + bc + abc */
1179 XE = abcD;
1180 XR = J(48,1) || aD || bD || cD || abD || acD || bcD;
1181 call SS(result, y, XE, XR, level,printss,printh);
RESULT
120.671786420388000
NDIGITSS NDIGITSR NDIGITSI
14.8 14.7 14.8
1182
1183
1184 /*
1185 SAS TYPE II SUMS OF SQUARES
1186
1187 Consider a definition:
1188
1189 An analysis of variance effect is *contained* in an-
1190 other effect if the name of the former effect can be
1191 obtained from the name of the latter by deleting terms.
1192
1193 For example, the A effect is contained in the A x B effect be-
1194 cause the name of the A effect can be obtained by deleting the
1195 term B from the name of the A x B effect.
1196
1197 The XR matrix for a SAS Type II sum of squares is the horizon-
1198 tal concatenation of
1199
1200 - all the submatrices of the design matrix for terms in the
1201 model equation at the same level as the effect being tested
1202 plus
1203
1204 - all the submatrices (if any) of the design matrix for terms
1205 at lower levels than the effect being tested plus
1206
1207 - all the remaining submatrices of the design matrix whose ef-
1208 fects do NOT contain the effect being tested.
1209
1210 For example, the XR matrix for the Type II A effect is the
1211 horizontal concatenation of
1212
1213 - the submatrices for the two effects at the same level as A --
1214 i.e., B and C.
1215
1216 - (no submatrices for effects at lower levels than A because a
1217 main effect is the lowest level possible)
1218
1219 - the submatrix for the B x C interaction (since this interac-
1220 tion does not contain the A effect).
1221
1222 Thus the sum of squares for the Type II A effect is computed as
1223 follows:
1224
1225 Type II A
1226 y = m + b + c + bc
1227 y = m + a + b + c + bc */
1228 XE = aD;
1229 XR = J(48,1) || bD || cD || bcD;
1230 call SS(result, y, XE, XR, level,printss,printh);
RESULT
193.561598949382000
NDIGITSS NDIGITSR NDIGITSI
14.0 14.7 13.9
1231
1232 /* Type II B
1233 y = m + a + c + ac
1234 y = m + a + b + c + ac */
1235 XE = bD;
1236 XR = J(48,1) || aD || cD || acD;
1237 call SS(result, y, XE, XR, level,printss,printh);
RESULT
41.177222208828500
NDIGITSS NDIGITSR NDIGITSI
13.8 14.7 13.9
1238
1239 /* Type II A x B
1240 y = m + a + b + c + ac + bc
1241 y = m + a + b + c + ab + ac + bc */
1242 XE = abD;
1243 XR = J(48,1) || aD || bD || cD || acD || bcD;
1244 call SS(result, y, XE, XR, level,printss,printh);
RESULT
205.989513334515000
NDIGITSS NDIGITSR NDIGITSI
13.8 14.4 13.9
1245
1246 /* Type II C
1247 y = m + a + b + ab
1248 y = m + a + b + c + ab */
1249 XE = cD;
1250 XR = J(48,1) || aD || bD || abD;
1251 call SS(result, y, XE, XR, level,printss,printh);
RESULT
75.548068365043500
NDIGITSS NDIGITSR NDIGITSI
15.5 15.0 15.7
1252
1253 /* Type II A x C
1254 y = m + a + b + c + ab + bc
1255 y = m + a + b + c + ab + ac + bc */
1256 XE = acD;
1257 XR = J(48,1) || aD || bD || cD || abD || bcD;
1258 call SS(result, y, XE, XR, level,printss,printh);
RESULT
200.831441782583000
NDIGITSS NDIGITSR NDIGITSI
14.9 14.9 14.8
1259
1260 /* Type II B x C
1261 y = m + a + b + c + ab + ac
1262 y = m + a + b + c + ab + ac + bc */
1263 XE = bcD;
1264 XR = J(48,1) || aD || bD || cD || abD || acD;
1265 call SS(result, y, XE, XR, level,printss,printh);
RESULT
21.914797539212600
NDIGITSS NDIGITSR NDIGITSI
13.9 14.8 14.4
1266
1267
1268 /*
1269 HTI = SAS Type III SUMS OF SQUARES
1270
1271 The XR matrix for an HTI (Higher-level Terms Included) sum of
1272 squares always consists of the horizontal concatenation of sub-
1273 matrices for ALL the terms on the right side of the "saturated"
1274 version of the overparameterized model equation (except the er-
1275 ror term and except the term specified in the specification of
1276 XE). (The saturated version of a model equation is simply the
1277 model that contains all the possible terms.) Thus for the
1278 Searle data the XR matrix for an HTI sum of squares is a hori-
1279 zontal concatenation of seven matrices in each of the six calls
1280 to SS that follow.
1281
1282 HTI A
1283 y = m + b + c + ab + ac + bc + abc
1284 y = m + a + b + c + ab + ac + bc + abc */
1285 XE = aD;
1286 XR = J(48,1) || bD || cD || abD || acD || bcD || abcD;
1287 call SS(result, y, XE, XR, level,printss,printh);
RESULT
194.175387286935000
NDIGITSS NDIGITSR NDIGITSI
15.3 14.5 15.3
1288
1289 /* HTI B
1290 y = m + a + c + ab + ac + bc + abc
1291 y = m + a + b + c + ab + ac + bc + abc */
1292 XE = bD;
1293 XR = J(48,1) || aD || cD || abD || acD || bcD || abcD;
1294 call SS(result, y, XE, XR, level,printss,printh);
RESULT
10.376010430247600
NDIGITSS NDIGITSR NDIGITSI
15.6 14.7 15.7
1295
1296 /* HTI A x B
1297 y = m + a + b + c + ac + bc + abc
1298 y = m + a + b + c + ab + ac + bc + abc */
1299 XE = abD;
1300 XR = J(48,1) || aD || bD || cD || acD || bcD || abcD;
1301 call SS(result, y, XE, XR, level,printss,printh);
RESULT
160.480840013270000
NDIGITSS NDIGITSR NDIGITSI
14.7 14.7 14.9
1302
1303 /* HTI C
1304 Searle's exact answer (1987, 394) is 62
1305 y = m + a + b + ab + ac + bc + abc
1306 y = m + a + b + c + ab + ac + bc + abc */
1307 XE = cD;
1308 XR = J(48,1) || aD || bD || abD || acD || bcD || abcD;
1309 call SS(result, y, XE, XR, level,printss,printh);
RESULT
61.999999999999900
NDIGITSS NDIGITSR NDIGITSI
15.7 15.2 15.7
1310
1311 /* HTI A x C
1312 y = m + a + b + c + ab + bc + abc
1313 y = m + a + b + c + ab + ac + bc + abc */
1314 XE = acD;
1315 XR = J(48,1) || aD || bD || cD || abD || bcD || abcD;
1316 call SS(result, y, XE, XR, level,printss,printh);
RESULT
215.450958887416000
NDIGITSS NDIGITSR NDIGITSI
15.4 14.8 15.5
1317
1318 /* HTI B x C
1319 Searle's exact answer (1987, 395) is 192(1678)/11505
1320 = 28.00312 90743 15514 99
1321 y = m + a + b + c + ab + ac + abc
1322 y = m + a + b + c + ab + ac + bc + abc */
1323 XE = bcD;
1324 XR = J(48,1) || aD || bD || cD || abD || acD || abcD;
1325 call SS(result, y, XE, XR, level,printss,printh);
RESULT
28.003129074315500
NDIGITSS NDIGITSR NDIGITSI
15.3 14.8 15.5
1326
1327
1328 /*
1329 HTO SUMS OF SQUARES
1330
1331 The XR matrix for an HTO sum of squares consists of the hori-
1332 zontal concatenation of the submatrices of the design matrix
1333 for all the effects at the same level as and at lower levels
1334 than the effect being tested. That is, Higher-level Terms are
1335 Omitted (HTO).
1336
1337 Note the similarity and difference between the HTO and the SAS
1338 Type II methods: In a two-way experiment the HTO and SAS Type
1339 II sums of squares are identical. But in the three-way case
1340 and higher there are differences. In the three-way case the
1341 differences are only in the main effects. That is, for an HTO
1342 main effect the submatrices for all the higher-level effects
1343 are omitted from XR. On the other hand, for a SAS Type II main
1344 effect the submatrices for higher-level effects that do not
1345 contain the effect being tested are included in XR.
1346
1347 For example, the XR matrix for the SAS Type II A main effect
1348 (as discussed above in the Type II section) is
1349
1350 XR = J(48,1) || bD || cD || bcD.
1351
1352 But the XR matrix for the HTO A main effect is
1353
1354 XR = J(48,1) || bD || cD.
1355
1356 In cases in which there is no B x C interaction extant in the
1357 population, the HTO sum of squares provides a slightly more
1358 powerful statistical test of the A effect than the SAS Type II
1359 sum of squares. (In cases in which there *is* an extant B x C
1360 interaction, the HTO approach should not be used to test for
1361 the A effect because, as I shall demonstrate in later material,
1362 an extant B x C interaction can "contaminate" the HTO A statis-
1363 tical test.)
1364
1365 Following are the three cases where the HTO sums of squares and
1366 the SAS Type II sums of squares differ. (The HTO sums of
1367 squares that appear below do not appear in the output from PROC
1368 GLM because GLM cannot directly compute HTO sums of squares.)
1369
1370 HTO A
1371 y = m + b + c
1372 y = m + a + b + c */
1373 XE = aD;
1374 XR = J(48,1) || bD || cD;
1375 call SS(result, y, XE, XR, level,printss,printh);
RESULT
183.011289644412000
NDIGITSS NDIGITSR NDIGITSI
13.8 14.8 13.6
1376
1377 /* HTO B
1378 y = m + a + c
1379 y = m + a + b + c */
1380 XE = bD;
1381 XR = J(48,1) || aD || cD;
1382 call SS(result, y, XE, XR, level,printss,printh);
RESULT
52.761053453843100
NDIGITSS NDIGITSR NDIGITSI
14.6 14.7 14.2
1383
1384 /* HTO C
1385 y = m + a + b
1386 y = m + a + b + c */
1387 XE = cD;
1388 XR = J(48,1) || aD || bD;
1389 call SS(result, y, XE, XR, level,printss,printh);
RESULT
98.514715858371800
NDIGITSS NDIGITSR NDIGITSI
15.6 14.7 15.5
1390
1391
1392 /*
1393 HTOS SUMS OF SQUARES
1394
1395 I discuss HTOS sums of squares in a paper (1997, appendix D).
1396
1397 As with all the other types of sums of squares, the XE matrix
1398 for an HTOS sum of squares consists of the submatrix of the de-
1399 sign matrix for the effect being tested.
1400
1401 Recall the definition of one effect containing another effect
1402 given in the discussion above of the SAS Type II sums of
1403 squares. The XR matrix for an HTOS sum of squares consists of
1404 the horizontal concatenation of
1405
1406 - the submatrices for all the effects at the same level as the
1407 effect being tested (just like the HTO, Type II, and HTI sums
1408 of squares)
1409
1410 - the submatrices for all the effects (if any) at lower levels
1411 than the effect being tested (just like the HTO, Type II,
1412 HTI, and sequential sums of squares)
1413
1414 - the submatrices for non-containing higher-level interactions
1415 (just like the SAS Type II sums of squares) *if evidence of
1416 these interactions is found in the data*.
1417
1418 Note that the HTOS approach is a conditional approach, with the
1419 formula for computing a sum of squares depending on whether
1420 evidence of higher-level non-containing interactions is found
1421 in the data.
1422
1423 In the three-way case there are only three sums of squares in
1424 which the HTOS sums of squares are conceptually (but not compu-
1425 tationally) different from other sums of squares computed
1426 above, namely the three main-effect sums of squares. If evi-
1427 dence of a non-containing higher-level interaction is found,
1428 the HTOS sum of squares for a main effect is identical to the
1429 corresponding SAS Type II sum of squares, as discussed and
1430 computed above. But if no evidence of a non-containing inter-
1431 action is found, the HTOS sum of squares is identical to the
1432 corresponding HTO sum of squares, as also discussed and com-
1433 puted above.
1434
1435 The HTOS approach has two useful features:
1436
1437 1. If we are testing an effect and there is evidence that a
1438 non-containing higher-level interaction exists in the data,
1439 the HTOS approach correctly takes account of the interaction
1440 in the computation, thereby ensuring that the statistical
1441 test is valid.
1442
1443 2. On the other hand, if there is no evidence that the non-con-
1444 taining interaction exists, the HTOS approach provides a
1445 valid statistical test for the existence of a relationship
1446 between the response variable and the relevant predictor
1447 variable(s). This test is slightly more powerful than the
1448 HTI and SAS Type II tests. I shall discuss the validity and
1449 power of this approach in later material.
1450
1451
1452 QUIT FROM PROC IML
1453
1454 This ends the computation of sums of squares in PROC IML.
1455 Quit from IML.
1456 */
1457
1458 quit;
Exiting IML.
NOTE: 87 workspace compresses.
NOTE: The PROCEDURE IML used 18.07 seconds.
1459
1460
1461 /*
1462 GLM ANALYSIS
1463
1464 Analyze the data with PROC GLM for comparison with the above
1465 output from IML. (The output from GLM comes in a separate out-
1466 put file.)
1467
1468 Examination of the GLM output reveals that all the GLM sums of
1469 squares are identical in all available digits to the sums of
1470 squares computed above in IML.
1471 */
1472
1473 proc glm data=Searle_2;
1474 class a b c;
1475 model y = a | b | c / ss1 ss2 ss3 ss4;
1476 quit;
NOTE: At least one W.D format was too small for the number to be printed. The
decimal may be shifted by the "BEST" format.
NOTE: The PROCEDURE GLM used 3.85 seconds.
1477
1478 options date linesize=80 probsig=2;
1479 title ' ';
1480
1481
1482 /*
1483 APPENDIX: STEPS TO RUN THIS PROGRAM
1484
1485 1. Ensure that the STAT and IML components of the SAS system
1486 are available on your computer. Information about the SAS
1487 system is available at http://www.sas.com
1488
1489 2. Ensure that you have the source version of this program,
1490 which is called PR0165.SAS (not the HTML version, which is
1491 called PR0165.HTM). You can obtain a copy of the source
1492 version in the "Computer Programs" section of the page at
1493 http://www.matstat.com/ss/
1494
1495 3. Install a copy of the SS subroutine on your computer. This
1496 subroutine does the actual computations of sums of squares
1497 and is available at the above MatStat web site.
1498
1499 4. Edit the %INCLUDE statement above to correctly point to the
1500 location of the SS.SAS subroutine file on your computer.
1501 That is, change the
1502 D:\PROGS\SS.SAS
1503 in the statement to the location where SS.SAS is stored on
1504 your computer.
1505
1506 5. (Optional.) Modify the two OPTIONS statements in the pro-
1507 gram that set the DATE, LINESIZE, and PROBSIG options.
1508
1509 6. Submit the program to SAS.
1510
1511
1512 REFERENCES
1513
1514 Macnaughton, D. B. 1997. Which sums of squares are best in un-
1515 balanced analysis of variance? Available at
1516 http://www.matstat.com/ss/
1517
1518 Macnaughton, D. B. 1998. PR0139.HTM: Computing numerator sums
1519 of squares in unbalanced analysis of variance: Two-way
1520 case. Available in the "Computer Programs" section at
1521 http://www.matstat.com/ss/
1522
1523 Searle, S. R. 1987. _Linear Models for Unbalanced Data._ New
1524 York: Wiley.
1525
1526 Yates, F. 1934. The analysis of multiple classifications with
1527 unequal numbers in the different classes. _Journal of the
1528 American Statistical Association_ 29, 51-66.
1529
1530 version of June 20, 1998
1531 (end of program pr0165.sas) */
This is the end of the program log for the run of the program.
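The rules given in the program comments for assembling XR under the HTI, SAS Type II, and HTO methods can be sketched in a few lines. The following Python sketch is illustrative only and is not part of PR0165.SAS or the SS subroutine. The function names (all_terms, contains, xr_terms) and the single-letter term names are assumptions made for this example. The constant term m is always part of XR and is left implicit here.

```python
from itertools import combinations

def all_terms(factors):
    """All main effects and interactions, e.g. 'a', 'ab', 'abc'."""
    return ["".join(c) for r in range(1, len(factors) + 1)
            for c in combinations(factors, r)]

def contains(outer, inner):
    """True if the name `inner` can be obtained from `outer` by deleting letters."""
    return set(inner) < set(outer)

def xr_terms(effect, method, factors="abc"):
    """Terms (besides the constant) whose submatrices make up XR.

    Hypothetical helper: 'HTI', 'TypeII', and 'HTO' are labels chosen
    for this sketch, following the definitions in the program comments.
    """
    others = [t for t in all_terms(factors) if t != effect]
    if method == "HTI":      # every other term of the saturated model
        return others
    if method == "TypeII":   # every other term that does not contain the effect
        return [t for t in others if not contains(t, effect)]
    if method == "HTO":      # only terms at the same or a lower level
        return [t for t in others if len(t) <= len(effect)]
    raise ValueError(f"unknown method: {method}")

for method in ("HTI", "TypeII", "HTO"):
    print(method, "A:", xr_terms("a", method))
```

For the A effect this gives b, c, bc under Type II and b, c under HTO, which agrees with the XR definitions at log lines 1229 and 1374. For a two-way interaction such as A x B the Type II and HTO lists coincide, as the HTO discussion in the log notes.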
Following is the output from PROC PRINT showing the data values that were analyzed, as given by Searle (1987, 392).
OBS    A    B    C     Y

  1    1    1    1    10
  2    1    1    2     4
  3    1    1    2     6
  4    1    2    1     3
  5    1    2    1     5
  6    1    2    2     2
  7    1    2    2     3
  8    1    2    2     7
  9    1    3    1     1
 10    1    3    1     2
 11    1    3    1     3
 12    1    3    2     4
 13    1    3    2     5
 14    1    3    2     9
 15    2    1    1     5
 16    2    1    1     9
 17    2    1    2     8
 18    2    2    1     5
 19    2    2    2     6
 20    2    2    2     8
 21    2    3    1     6
 22    2    3    1    10
 23    2    3    2     2
 24    3    1    1     2
 25    3    1    1     3
 26    3    1    1     3
 27    3    1    1     4
 28    3    1    2     3
 29    3    1    2     4
 30    3    1    2     8
 31    3    2    1     3
 32    3    2    1     4
 33    3    2    1     8
 34    3    2    2     4
 35    3    3    1     5
 36    3    3    2     6
 37    4    1    1     4
 38    4    1    2     5
 39    4    1    2     7
 40    4    1    2     9
 41    4    2    1     1
 42    4    2    1     1
 43    4    2    1     3
 44    4    2    1     7
 45    4    2    2    19
 46    4    3    1     8
 47    4    3    2    20
 48    4    3    2    24
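The idea used throughout the log (a numerator sum of squares is the difference between the residual sums of squares of two model equations, specified by XE and XR) can be tried on the data above without SAS. The following Python sketch is an illustration under stated assumptions, not the algorithm of the SS subroutine, whose source is not shown in the log. The functions designf and hdir are written to mimic the IML functions of the same names (full-rank coding with the last level coded -1, and the row-wise direct product). The helpers hcat, rss, and ss are hypothetical names. Residuals are obtained by Gram-Schmidt orthogonalization in exact rational arithmetic rather than through a projection matrix.

```python
from fractions import Fraction

# Searle (1987, 392) data keyed by cell (a, b, c), in PROC PRINT order.
CELLS = {
    (1, 1, 1): [10], (1, 1, 2): [4, 6], (1, 2, 1): [3, 5],
    (1, 2, 2): [2, 3, 7], (1, 3, 1): [1, 2, 3], (1, 3, 2): [4, 5, 9],
    (2, 1, 1): [5, 9], (2, 1, 2): [8], (2, 2, 1): [5],
    (2, 2, 2): [6, 8], (2, 3, 1): [6, 10], (2, 3, 2): [2],
    (3, 1, 1): [2, 3, 3, 4], (3, 1, 2): [3, 4, 8], (3, 2, 1): [3, 4, 8],
    (3, 2, 2): [4], (3, 3, 1): [5], (3, 3, 2): [6],
    (4, 1, 1): [4], (4, 1, 2): [5, 7, 9], (4, 2, 1): [1, 1, 3, 7],
    (4, 2, 2): [19], (4, 3, 1): [8], (4, 3, 2): [20, 24],
}
ROWS = [(a, b, c, y) for (a, b, c), ys in CELLS.items() for y in ys]
Y = [Fraction(r[3]) for r in ROWS]

def designf(levels, k):
    """Full-rank design submatrix: k-1 columns, last level coded -1."""
    return [[-1] * (k - 1) if v == k else [int(v == j) for j in range(1, k)]
            for v in levels]

def hdir(A, B):
    """Row-wise (horizontal) direct product of two matrices."""
    return [[x * z for x in ra for z in rb] for ra, rb in zip(A, B)]

def hcat(*mats):
    """Horizontal concatenation, like the IML || operator."""
    return [sum((list(m[i]) for m in mats), []) for i in range(len(mats[0]))]

def rss(y, X):
    """Residual sum of squares of y regressed on the columns of X."""
    basis, total = [], sum(v * v for v in y)
    for j in range(len(X[0])):
        q = [Fraction(row[j]) for row in X]
        for p in basis:  # remove the components along earlier columns
            coef = sum(a * b for a, b in zip(q, p)) / sum(b * b for b in p)
            q = [a - coef * b for a, b in zip(q, p)]
        if any(q):       # skip columns that are linearly dependent
            basis.append(q)
            qy = sum(a * b for a, b in zip(q, y))
            total -= qy * qy / sum(a * a for a in q)
    return total

def ss(y, XE, XR):
    """Numerator SS: RSS of the model without XE minus RSS of the model with XE."""
    return rss(y, XR) - rss(y, hcat(XR, XE))

ONE = [[1]] * len(ROWS)
aD = designf([r[0] for r in ROWS], 4)
bD = designf([r[1] for r in ROWS], 3)
cD = designf([r[2] for r in ROWS], 2)
abD, acD, bcD = hdir(aD, bD), hdir(aD, cD), hdir(bD, cD)
abcD = hdir(abD, cD)

seq_a = ss(Y, aD, ONE)                                      # Sequential A
hti_c = ss(Y, cD, hcat(ONE, aD, bD, abD, acD, bcD, abcD))   # HTI C
print(float(seq_a), float(hti_c))  # compare with RESULT at log lines 1124 and 1309
```

Exact fractions are used only so that rounding does not obscure the comparison with Searle's exact answer for HTI C. A floating-point least-squares routine would serve equally well for ordinary use.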
Following are the four analysis of variance tables generated for the above data by SAS PROC GLM:
Source    DF      Type I SS    Mean Square   F Value   Pr > F

A          3   173.27228327    57.75742776     11.36   8.E-05
B          2    54.46556916    27.23278458      5.36   .01192
A*B        6   247.93833804    41.32305634      8.13   7.E-05
C          1    75.54806837    75.54806837     14.86   .00076
A*C        3   200.18915720    66.72971907     13.13   3.E-05
B*C        2    21.91479754    10.95739877      2.16   .13774
A*B*C      6   120.67178642    20.11196440      3.96   .00684

Source    DF     Type II SS    Mean Square   F Value   Pr > F

A          3   193.56159895    64.52053298     12.69   4.E-05
B          2    41.17722221    20.58861110      4.05   .03051
A*B        6   205.98951333    34.33158556      6.75   .00028
C          1    75.54806837    75.54806837     14.86   .00076
A*C        3   200.83144178    66.94381393     13.17   3.E-05
B*C        2    21.91479754    10.95739877      2.16   .13774
A*B*C      6   120.67178642    20.11196440      3.96   .00684

Source    DF    Type III SS    Mean Square   F Value   Pr > F

A          3   194.17538729    64.72512910     12.73   4.E-05
B          2    10.37601043     5.18800522      1.02   .37550
A*B        6   160.48084001    26.74680667      5.26   .00139
C          1    62.00000000    62.00000000     12.20   .00188
A*C        3   215.45095889    71.81698630     14.13   2.E-05
B*C        2    28.00312907    14.00156454      2.75   .08377
A*B*C      6   120.67178642    20.11196440      3.96   .00684

Source    DF     Type IV SS    Mean Square   F Value   Pr > F

A          3   194.17538729    64.72512910     12.73   4.E-05
B          2    10.37601043     5.18800522      1.02   .37550
A*B        6   160.48084001    26.74680667      5.26   .00139
C          1    62.00000000    62.00000000     12.20   .00188
A*C        3   215.45095889    71.81698630     14.13   2.E-05
B*C        2    28.00312907    14.00156454      2.75   .08377
A*B*C      6   120.67178642    20.11196440      3.96   .00684
The low p-value for the A × B × C interaction (i.e., .0068 in all four tables) provides good evidence that (1) there is a relationship between the response variable and all three predictor variables, and (2) the relationship is a three-way interaction.
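Each F value in the tables is the effect's mean square (its sum of squares divided by its degrees of freedom) divided by the error mean square. The error line is not reproduced in the tables above, so the following Python sketch recomputes it, assuming the usual within-cell (pure error) sum of squares that applies when the saturated model is fitted. The names within_cell_ss and f_value are hypothetical, and the sketch stops at the F value because the Python standard library has no F distribution for the p-value.

```python
# y-values for each of the 24 cells of the Searle data, in PROC PRINT order.
CELL_VALUES = [
    [10], [4, 6], [3, 5], [2, 3, 7], [1, 2, 3], [4, 5, 9],
    [5, 9], [8], [5], [6, 8], [6, 10], [2],
    [2, 3, 3, 4], [3, 4, 8], [3, 4, 8], [4], [5], [6],
    [4], [5, 7, 9], [1, 1, 3, 7], [19], [8], [20, 24],
]

def within_cell_ss(cells):
    """Sum over cells of squared deviations from the cell mean."""
    total = 0.0
    for ys in cells:
        mean = sum(ys) / len(ys)
        total += sum((y - mean) ** 2 for y in ys)
    return total

n_obs = sum(len(ys) for ys in CELL_VALUES)
error_df = n_obs - len(CELL_VALUES)      # observations minus cells
error_ss = within_cell_ss(CELL_VALUES)
error_ms = error_ss / error_df

def f_value(effect_ss, effect_df):
    """F ratio: effect mean square over error mean square."""
    return (effect_ss / effect_df) / error_ms

# A*B*C line, which is the same in all four GLM tables.
f_abc = f_value(120.67178642, 6)
print(error_df, error_ss, round(f_abc, 2))
```

The same function applied to any other line of the tables should reproduce that line's F Value column to the two decimals shown.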
This is the end of the output from PR0165.SAS.