The following is self-documenting output from a SAS IML computer program:

NOTE: SAS (r) Proprietary Software Release 6.12 TS020 Licensed to DONALD B. MACNAUGHTON, Site 0025250001. 1 /* 2 PR0165.SAS 3 4 COMPUTING NUMERATOR SUMS OF SQUARES 5 IN UNBALANCED ANALYSIS OF VARIANCE: 6 Three-Way Case 7 8 Donald B. Macnaughton 9 donmac@matstat.com 10 11 12 TABLE OF CONTENTS 13 14 Introductory Comments > 15 - abstract > 16 - introduction > 17 18 Preliminary Steps > 19 - load the Searle data into a SAS dataset and print the data > 20 - start PROC IML and read the data from the SAS dataset into 21 IML > 22 - generate the three main effect submatrices of the design 23 matrix > 24 - generate the four interaction submatrices of the design 25 matrix > 26 - make the SS subroutine available to the program > 27 - set the values of the three secondary arguments of the SS 28 subroutine > 29 30 Compute the Twenty-Two Sums of Squares Using the SS Subroutine > 31 - compute the six sequential (SAS Type I) sums of squares > 32 - compute the sum of squares for the highest-level interac- 33 tion > 34 - compute the six SAS Type II sums of squares > 35 - compute the six HTI = SAS Type III sums of squares > 36 - compute three HTO sums of squares > 37 - discuss the HTOS sums of squares > 38 39 Quit from IML > 40 41 Run PROC GLM to Compute Nineteen of the Sums of Squares (for 42 comparison with the values generated above) > 43 44 Appendix: Steps to Run the Program > 45 46 References > 47 48 Output from PROC PRINT > 49 50 Output from PROC GLM > 51 52 53 ABSTRACT 54 55 This SAS program illustrates a conceptual point of view and the 56 matrix arithmetic for computing the following types of analysis 57 of variance numerator sums of squares: 58 59 - HTO (Higher-level Terms are Omitted) 60 = SAS Type II in the two-way case 61 = SPSS ANOVA Experimental 62 63 - SAS Type II 64 65 - HTOS (Higher-level Terms are Omitted unless Significant) 66 = a superset of SAS Type II and HTO 67 68 - HTI (Higher-Level Terms are Included) 69 = SAS Type III 70 = SPSS ANOVA UNIQUE 71 = the default method in many analysis of variance programs 72 73 - sequential 74 = SAS Type I 75 = SPSS ANOVA Hierarchical in the two-way case. 76 77 The conceptual point of view is one of computing an analysis of 78 variance sum of squares by computing the difference between the 79 residual sums of squares of two model equations (Yates 1934). 80 The matrix arithmetic is simple, and is specified directly in 81 terms of the conceptual point of view. 82 83 Computations are illustrated using data from a 4 x 3 x 2 unbal- 84 anced experiment discussed by Shayle Searle (1987, 392). 85 86 87 INTRODUCTION 88 89 Discussion in this program is an extension of discussion in an- 90 other program, which computes sums of squares for an unbalanced 91 2 x 3 experiment (Macnaughton 1998). In that program I note 92 that most researchers use analysis of variance to obtain the 93 resulting p-values, which are used to help in detecting rela- 94 tionships between variables. In order to compute analysis of 95 variance p-values, it is mathematically necessary to first com- 96 pute certain "numerator sums of squares". 97 98 The other program illustrates the three best-known conceptual 99 methods (HTO, HTI, and sequential) for computing numerator sums 100 of squares when there are two discrete-valued predictor vari- 101 ables in the experiment. When there are three or more dis- 102 crete-valued predictor variables, other methods of computing 103 numerator sums of squares become available. The present pro- 104 gram illustrates some of these methods. 105 106 107 BEGINNING OF THE PROGRAM STATEMENTS 108 109 (Note: If you wish to run this program on your computer, see 110 the checklist in the appendix.) 111 112 Define the title for the output and set SAS system options. 113 */ 114 115 title 'IML/GLM 4 x 3 x 2 Unbalanced ANOVA, Searle (1987, 392)'; 116 options linesize=80 nodate probsig=2; 117 118 /* 119 Load the Searle data into a SAS dataset. 120 */ 121 122 data Searle_2 (keep = a b c y); 123 input a b c n @; 124 do i = 1 to n; 125 input y @; 126 output; 127 end; 128 129 cards; NOTE: The data set WORK.SEARLE_2 has 48 observations and 4 variables. NOTE: The DATA statement used 3.18 seconds. 154 ; 155 156 /* 157 Print the data. 158 */ 159 160 proc print data=Searle_2; 161 run; NOTE: The PROCEDURE PRINT used 1.05 seconds. 162 163 164 /* 165 BEGINNING OF THE IML COMPUTATIONS 166 167 Start PROC IML and reset options that control the destination 168 and appearance of the IML output. 169 */ 170 171 proc iml; IML Ready 172 reset log spaces=3; 173 174 /* 175 Read the data from the SAS dataset into IML. Each of the four 176 variables in the dataset (i.e., a, b, c, and y) becomes a col- 177 umn vector in IML. The column vectors inherit the names of the 178 respective dataset variables. 179 */ 180 181 use Searle_2; 182 read all var _all_; 183 184 /* 185 Compute the main-effect submatrices of the overall design ma- 186 trix using the IML DESIGNF function. 187 */ 188 189 aD = designf(a); 190 bD = designf(b); 191 cD = designf(c); 192 193 /* 194 Compute the interaction submatrices of the overall design ma- 195 trix using the IML HDIR (horizontal direct product) function. 196 */ 197 198 abD = hdir(aD,bD); 199 acD = hdir(aD,cD); 200 bcD = hdir(bD,cD); 201 202 abcD = hdir(abD,cD); 203 204 205 /* 206 Include the subroutine that will be used to do the computa- 207 tions, but don't print it. 208 */ 209 210 %include 'D:\PROGS\SS.SAS' / NOsource2; NOTE: Module SS defined. 1053 1054 /* 1055 Set values of the three secondary arguments for the SS subrou- 1056 tine. The settings instruct the subroutine to 1057 - use the first method of computing sums of squares 1058 - print the values of the computed sums of squares, but 1059 - omit printing the intermediate results. 1060 */ 1061 1062 level = 1; 1063 printss = 1; 1064 printh = 0; 1065 1066 1067 /* 1068 BEGINNING OF CALLS TO THE SS SUBROUTINE 1069 1070 To facilitate comparisons, sums of squares are computed in this 1071 program in the order in which they appear in the output from 1072 SAS PROC GLM. This order does not reflect the order of impor- 1073 tance of the sums of squares since (as I shall discuss further 1074 in later material) the SAS Type I sums of squares have a seri- 1075 ous problem. 1076 1077 (The SAS Type IV sums of squares are omitted because [as illus- 1078 trated in later GLM output from this program] they are identi- 1079 cal to the HTI (= SAS Type III) sums of squares for the Searle 1080 data. Type IV sums of squares differ from HTI sums of squares 1081 only in the infrequent case in which there are empty cells.) 1082 1083 As I discuss in the program for the two-way case (1998), it is 1084 useful to view the computation of analysis of variance sums of 1085 squares in terms of the operation of computing the difference 1086 between the residual sums of squares of two overparameterized 1087 model equations. The two equations are specified in terms of 1088 two matrices, XE and XR. 1089 1090 XE is the submatrix of the (full-column-rank) design matrix for 1091 the effect being tested. XR is the horizontal concatenation of 1092 the submatrices of the design matrix for all the other terms 1093 (excluding the error term) on the right side of the two model 1094 equations whose residual sums of squares we wish to difference. 1095 1096 1097 SEQUENTIAL (SAS TYPE I) SUMS OF SQUARES 1098 1099 Sequential sums of squares are based on a chosen sequence for 1100 computing the sums of squares for the various effects. The se- 1101 quence begins with the main effects and works upward through 1102 the interactions. The sequence starts with only the constant 1103 term mu in the model, and terms are added to the model, one at 1104 a time, as in stepwise regression. Once a term is added to the 1105 model, it stays. 1106 1107 (A problem with the sequential method is that it is not clear 1108 which particular sequence is best. For example, SAS and SPSS 1109 use different logical sequences for their sequential sum of 1110 squares.) 1111 1112 The sums of squares computed in this section demonstrate the 1113 sequence used by SAS in computing the Type I sum of squares. 1114 For each sum of squares I show (in abbreviated form) the two 1115 model equations whose residual sums of squares are being dif- 1116 ferenced. You may wish to confirm that the definition of XE 1117 and XR in each case is consistent with the two model equations. 1118 1119 Sequential A 1120 y = m 1121 y = m + a */ 1122 XE = aD; 1123 XR = J(48,1); 1124 call SS(result, y, XE, XR, level,printss,printh); RESULT 173.272283272283000 NDIGITSS NDIGITSR NDIGITSI 15.8 15.2 15.5 1125 /* Note how the value of RESULT given above is identical in 1126 all available digits with the Type I sum of squares for A in 1127 the GLM output generated later in this program. (Similarly, 1128 all the following sums of squares computed in this IML program 1129 are identical to the corresponding sums of squares [when avail- 1130 able] from SAS GLM.) 1131 1132 The values NDIGITSS, NDIGITSR, and NDIGITSI are rough indica- 1133 tors of the number of digits of accuracy of the values in the 1134 "projection matrix", which I discuss in the other program 1135 (1998). 1136 1137 1138 Sequential B 1139 y = m + a 1140 y = m + a + b */ 1141 XE = bD; 1142 XR = J(48,1) || aD; 1143 call SS(result, y, XE, XR, level,printss,printh); RESULT 54.465569159211000 NDIGITSS NDIGITSR NDIGITSI 15.7 14.5 15.3 1144 1145 /* Sequential A x B 1146 y = m + a + b 1147 y = m + a + b + ab */ 1148 XE = abD; 1149 XR = J(48,1) || aD || bD; 1150 call SS(result, y, XE, XR, level,printss,printh); RESULT 247.938338044696000 NDIGITSS NDIGITSR NDIGITSI 15.0 14.6 14.9 1151 1152 /* Sequential C 1153 y = m + a + b + ab 1154 y = m + a + b + c + ab */ 1155 XE = cD; 1156 XR = J(48,1) || aD || bD || abD; 1157 call SS(result, y, XE, XR, level,printss,printh); RESULT 75.548068365043500 NDIGITSS NDIGITSR NDIGITSI 15.5 15.0 15.7 1158 1159 /* Sequential A x C 1160 y = m + a + b + c + ab 1161 y = m + a + b + c + ab + ac */ 1162 XE = acD; 1163 XR = J(48,1) || aD || bD || cD || abD; 1164 call SS(result, y, XE, XR, level,printss,printh); RESULT 200.189157199165000 NDIGITSS NDIGITSR NDIGITSI 15.2 14.6 15.4 1165 1166 /* Sequential B x C 1167 y = m + a + b + c + ab + ac 1168 y = m + a + b + c + ab + ac + bc */ 1169 XE = bcD; 1170 XR = J(48,1) || aD || bD || cD || abD || acD; 1171 call SS(result, y, XE, XR, level,printss,printh); RESULT 21.914797539212600 NDIGITSS NDIGITSR NDIGITSI 13.9 14.8 14.4 1172 1173 /* A x B x C 1174 Note that the sum of squares for the three-way A x B x C inter- 1175 action is computed only once in the IML portion of this program 1176 because it is the same for all the types of sums of squares. 1177 y = m + a + b + c + ab + ac + bc 1178 y = m + a + b + c + ab + ac + bc + abc */ 1179 XE = abcD; 1180 XR = J(48,1) || aD || bD || cD || abD || acD || bcD; 1181 call SS(result, y, XE, XR, level,printss,printh); RESULT 120.671786420388000 NDIGITSS NDIGITSR NDIGITSI 14.8 14.7 14.8 1182 1183 1184 /* 1185 SAS TYPE II SUMS OF SQUARES 1186 1187 Consider a definition 1188 1189 An analysis of variance effect is *contained* in an- 1190 other effect if the name of the former effect can be 1191 obtained from the name of the latter by deleting terms. 1192 1193 For example, the A effect is contained in the A x B effect be- 1194 cause the name of the A effect can be obtained by deleting the 1195 term B from the name of the A x B effect. 1196 1197 The XR matrix for a SAS Type II sum of squares is the horizon- 1198 tal concatenation of 1199 1200 - all the submatrices of the design matrix for terms in the 1201 model equation at the same level as the effect being tested 1202 plus 1203 1204 - all the submatrices (if any) of the design matrix for terms 1205 at lower levels than the effect being tested plus 1206 1207 - all the remaining submatrices of the design matrix whose ef- 1208 fects do NOT contain the effect being tested. 1209 1210 For example, the XR matrix for Type II A effect is the horizon- 1211 tal concatenation of 1212 1213 - the submatrices for the two effects at the same level as A -- 1214 i.e., B and C. 1215 1216 - (no submatrices for effects at lower levels than A because a 1217 main effect is the lowest level possible) 1218 1219 - the submatrix for the B x C interaction (since this interac- 1220 tion does not contain the A effect). 1221 1222 Thus the sum of squares for the Type II A effect is computed as 1223 follows: 1224 1225 Type II A 1226 y = m + b + c + bc 1227 y = m + a + b + c + bc */ 1228 XE = aD; 1229 XR = J(48,1) || bD || cD || bcD; 1230 call SS(result, y, XE, XR, level,printss,printh); RESULT 193.561598949382000 NDIGITSS NDIGITSR NDIGITSI 14.0 14.7 13.9 1231 1232 /* Type II B 1233 y = m + a + c + ac 1234 y = m + a + b + c + ac */ 1235 XE = bD; 1236 XR = J(48,1) || aD || cD || acD; 1237 call SS(result, y, XE, XR, level,printss,printh); RESULT 41.177222208828500 NDIGITSS NDIGITSR NDIGITSI 13.8 14.7 13.9 1238 1239 /* Type II A x B 1240 y = m + a + b + c + ac + bc 1241 y = m + a + b + c + ab + ac + bc */ 1242 XE = abD; 1243 XR = J(48,1) || aD || bD || cD || acD || bcD; 1244 call SS(result, y, XE, XR, level,printss,printh); RESULT 205.989513334515000 NDIGITSS NDIGITSR NDIGITSI 13.8 14.4 13.9 1245 1246 /* Type II C 1247 y = m + a + b + ab 1248 y = m + a + b + c + ab */ 1249 XE = cD; 1250 XR = J(48,1) || aD || bD || abD; 1251 call SS(result, y, XE, XR, level,printss,printh); RESULT 75.548068365043500 NDIGITSS NDIGITSR NDIGITSI 15.5 15.0 15.7 1252 1253 /* Type II A x C 1254 y = m + a + b + c + ab + bc 1255 y = m + a + b + c + ab + ac + bc */ 1256 XE = acD; 1257 XR = J(48,1) || aD || bD || cD || abD || bcD; 1258 call SS(result, y, XE, XR, level,printss,printh); RESULT 200.831441782583000 NDIGITSS NDIGITSR NDIGITSI 14.9 14.9 14.8 1259 1260 /* Type II B x C 1261 y = m + a + b + c + ab + ac 1262 y = m + a + b + c + ab + ac + bc */ 1263 XE = bcD; 1264 XR = J(48,1) || aD || bD || cD || abD || acD; 1265 call SS(result, y, XE, XR, level,printss,printh); RESULT 21.914797539212600 NDIGITSS NDIGITSR NDIGITSI 13.9 14.8 14.4 1266 1267 1268 /* 1269 HTI = SAS Type III SUMS OF SQUARES 1270 1271 The XR matrix for an HTI (Higher-level Terms Included) sum of 1272 squares always consists of the horizontal concatenation of sub- 1273 matrices for ALL the terms on the right side of the "saturated" 1274 version of the overparameterized model equation (except the er- 1275 ror term and except the term specified in the specification of 1276 XE). (The saturated version of a model equation is simply the 1277 model that contains all the possible terms.) Thus for the 1278 Searle data the XR matrix for an HTI sum of squares is a hori- 1279 zontal concatenation of seven matrices in each of the six calls 1280 to SS that follow. 1281 1282 HTI A 1283 y = m + b + c + ab + ac + bc + abc 1284 y = m + a + b + c + ab + ac + bc + abc */ 1285 XE = aD; 1286 XR = J(48,1) || bD || cD || abD || acD || bcD || abcD; 1287 call SS(result, y, XE, XR, level,printss,printh); RESULT 194.175387286935000 NDIGITSS NDIGITSR NDIGITSI 15.3 14.5 15.3 1288 1289 /* HTI B 1290 y = m + a + c + ab + ac + bc + abc 1291 y = m + a + b + c + ab + ac + bc + abc */ 1292 XE = bD; 1293 XR = J(48,1) || aD || cD || abD || acD || bcD || abcD; 1294 call SS(result, y, XE, XR, level,printss,printh); RESULT 10.376010430247600 NDIGITSS NDIGITSR NDIGITSI 15.6 14.7 15.7 1295 1296 /* HTI A x B 1297 y = m + a + b + c + ac + bc + abc 1298 y = m + a + b + c + ab + ac + bc + abc */ 1299 XE = abD; 1300 XR = J(48,1) || aD || bD || cD || acD || bcD || abcD; 1301 call SS(result, y, XE, XR, level,printss,printh); RESULT 160.480840013270000 NDIGITSS NDIGITSR NDIGITSI 14.7 14.7 14.9 1302 1303 /* HTI C 1304 Searle's exact answer (1987, 394) is 62 1305 y = m + a + b + ab + ac + bc + abc 1306 y = m + a + b + c + ab + ac + bc + abc */ 1307 XE = cD; 1308 XR = J(48,1) || aD || bD || abD || acD || bcD || abcD; 1309 call SS(result, y, XE, XR, level,printss,printh); RESULT 61.999999999999900 NDIGITSS NDIGITSR NDIGITSI 15.7 15.2 15.7 1310 1311 /* HTI A x C 1312 y = m + a + b + c + ab + bc + abc 1313 y = m + a + b + c + ab + ac + bc + abc */ 1314 XE = acD; 1315 XR = J(48,1) || aD || bD || cD || abD || bcD || abcD; 1316 call SS(result, y, XE, XR, level,printss,printh); RESULT 215.450958887416000 NDIGITSS NDIGITSR NDIGITSI 15.4 14.8 15.5 1317 1318 /* HTI B x C 1319 Searle's exact answer (1987, 395) is 192(1678)/11505 1320 = 28.00312 90743 15514 99 1321 y = m + a + b + c + ab + ac + abc 1322 y = m + a + b + c + ab + ac + bc + abc */ 1323 XE = bcD; 1324 XR = J(48,1) || aD || bD || cD || abD || acD || abcD; 1325 call SS(result, y, XE, XR, level,printss,printh); RESULT 28.003129074315500 NDIGITSS NDIGITSR NDIGITSI 15.3 14.8 15.5 1326 1327 1328 /* 1329 HTO SUMS OF SQUARES 1330 1331 The XR matrix for an HTO sum of squares consists of the hori- 1332 zontal concatenation of the submatrices of the design matrix 1333 for all the effects at the same level as and at lower levels 1334 than the effect being tested. That is, Higher-level Terms are 1335 Omitted (HTO). 1336 1337 Note the similarity and difference between the HTO and the SAS 1338 Type II methods: In a two-way experiment the HTO and SAS Type 1339 II sums of squares are identical. But in the three-way case 1340 and higher there are differences. In the three-way case the 1341 differences are only in the main effects. That is, for an HTO 1342 main effect the submatrices for all the higher-level effects 1343 are omitted from XR. On the other hand, for a SAS Type II main 1344 effect the submatrices for higher-level effects that do not 1345 contain the effect being tested are included in XR. 1346 1347 For example, the XR matrix for the SAS Type II A main effect 1348 (as discussed above in the Type II section) is 1349 1350 XR = J(48,1) || bD || cD || bcD. 1351 1352 But the XR matrix for the HTO A main effect is 1353 1354 XR = J(48,1) || bD || cD. 1355 1356 In cases in which there is no B x C interaction extant in the 1357 population, the HTO sum of squares provides a slightly more 1358 powerful statistical test of the A effect than the SAS Type II 1359 sum of squares. (In cases in which there *is* an extant B x C 1360 interaction, the HTO approach should be not used to test for 1361 the A effect because, as I shall demonstrate in later material, 1362 an extant B x C interaction can "contaminate" the HTO A statis- 1363 tical test.) 1364 1365 Following are the three cases where the HTO sums of squares and 1366 the SAS Type II sums of squares differ. (The HTO sums of 1367 squares that appear below do not appear in the output from PROC 1368 GLM because GLM cannot directly compute HTO sums of squares.) 1369 1370 HTO A 1371 y = m + b + c 1372 y = m + a + b + c */ 1373 XE = aD; 1374 XR = J(48,1) || bD || cD; 1375 call SS(result, y, XE, XR, level,printss,printh); RESULT 183.011289644412000 NDIGITSS NDIGITSR NDIGITSI 13.8 14.8 13.6 1376 1377 /* HTO B 1378 y = m + a + c 1379 y = m + a + b + c */ 1380 XE = bD; 1381 XR = J(48,1) || aD || cD; 1382 call SS(result, y, XE, XR, level,printss,printh); RESULT 52.761053453843100 NDIGITSS NDIGITSR NDIGITSI 14.6 14.7 14.2 1383 1384 /* HTO C 1385 y = m + a + b 1386 y = m + a + b + c */ 1387 XE = cD; 1388 XR = J(48,1) || aD || bD; 1389 call SS(result, y, XE, XR, level,printss,printh); RESULT 98.514715858371800 NDIGITSS NDIGITSR NDIGITSI 15.6 14.7 15.5 1390 1391 1392 /* 1393 HTOS SUMS OF SQUARES 1394 1395 I discuss HTOS sums of squares in a paper (1997, appendix D). 1396 1397 As with all the other types of sums of squares, the XE matrix 1398 for an HTOS sum of squares consists of the submatrix of the de- 1399 sign matrix for the effect being tested. 1400 1401 Recall the definition of one effect containing another effect 1402 given in the discussion above of the SAS Type II sums of 1403 squares. The XR matrix for an HTOS sum of squares consists of 1404 the horizontal concatenation of 1405 1406 - the submatrices for all the effects at the same level as the 1407 effect being tested (just like the HTO, Type II, and HTI sums 1408 of squares) 1409 1410 - the submatrices for all the effects (if any) at lower levels 1411 than the effect being tested (just like the HTO, Type II, 1412 HTI, and sequential sum of squares). 1413 1414 - the submatrices for non-containing higher-level interactions 1415 (just like the SAS Type II sums of squares) *if evidence of 1416 these interactions is found in the data*. 1417 1418 Note that the HTOS approach is a conditional approach, with the 1419 formula for computing a sum of squares depending on whether 1420 evidence of higher-level non-containing interactions is found 1421 in the data. 1422 1423 In the three-way case there are only three sums of squares in 1424 which the HTOS sums of squares are conceptually (but not compu- 1425 tationally) different from other sums of squares computed 1426 above, namely the three main-effect sums of squares. If evi- 1427 dence of a non-containing higher-level interaction is found, 1428 the HTOS sum of squares for a main effect is identical to the 1429 corresponding SAS Type II sums of squares, as discussed and 1430 computed above. But if no evidence of a non-containing inter- 1431 action is found, the HTOS sum of squares is identical to the 1432 corresponding HTO sum of squares, as also discussed and com- 1433 puted above. 1434 1435 The HTOS approach has two useful features: 1436 1437 1. If we are testing an effect and there is evidence that a 1438 non-containing higher-level interaction exists in the data, 1439 the HTOS approach correctly takes account of the interaction 1440 in the computation, thereby ensuring that the statistical 1441 test is valid. 1442 1443 2. On the other hand, if there is no evidence that the non-con- 1444 taining interaction exists, the HTOS approach provides a 1445 valid statistical test for the existence of a relationship 1446 between the response variable and the relevant predictor 1447 variable(s). This test is slightly more powerful than the 1448 HTI and SAS Type II tests. I shall discuss the validity and 1449 power of this approach in later material. 1450 1451 1452 QUIT FROM PROC IML 1453 1454 This ends the computation of sums of squares in PROC IML. 1455 Quit from IML. 1456 */ 1457 1458 quit; Exiting IML. NOTE: 87 workspace compresses. NOTE: The PROCEDURE IML used 18.07 seconds. 1459 1460 1461 /* 1462 GLM ANALYSIS 1463 1464 Analyze the data with PROC GLM for comparison with the above 1465 output from IML. (The output from GLM comes in a separate out- 1466 put file.) 1467 1468 Examination of the GLM output reveals that all the GLM sums of 1469 squares are identical in all available digits to the sums of 1470 squares computed above in IML. 1471 */ 1472 1473 proc glm data=Searle_2; 1474 class a b c; 1475 model y = a | b | c / ss1 ss2 ss3 ss4; 1476 quit; NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format. NOTE: The PROCEDURE GLM used 3.85 seconds. 1477 1478 options date linesize=80 probsig=2; 1479 title ' '; 1480 1481 1482 /* 1483 APPENDIX: STEPS TO RUN THIS PROGRAM 1484 1485 1. Ensure that the STAT and IML components of the SAS system 1486 are available on your computer. Information about the SAS 1487 system is available at http://www.sas.com 1488 1489 2. Ensure that you have the source version of this program, 1490 which is called PR0165.SAS (not the HTML version, which is 1491 called PR0165.HTM). You can obtain a copy of the source 1492 version in the "Computer Programs" section of the page at 1493 http://www.matstat.com/ss/ 1494 1495 3. Install a copy of the SS subroutine on your computer. This 1496 subroutine does the actual computations of sums of squares 1497 and is available at the above MatStat web site. 1498 1499 4. Edit the %INCLUDE statement above to correctly point to the 1500 location of the SS.SAS subroutine file on your computer. 1501 That is, change the 1502 D:\PROGS\SS.SAS 1503 in the statement to the location where SS.SAS is stored on 1504 your computer. 1505 1506 5. (Optional.) Modify the two OPTIONS statements in the pro- 1507 gram that set the DATE, LINESIZE, and PROBSIG options. 1508 1509 6. Submit the program to SAS. 1510 1511 1512 REFERENCES 1513 1514 Macnaughton, D. B. 1997. Which sums of squares are best in un- 1515 balanced analysis of variance? Available at 1516 http://www.matstat.com/ss/ 1517 1518 Macnaughton, D. B. 1998. PR0139.HTM: Computing numerator sums 1519 of squares in unbalanced analysis of variance: Two-way 1520 case). Available in the "Computer Programs" section at 1521 http://www.matstat.com/ss/ 1522 1523 Searle, S. R. 1987. _Linear Models for Unbalanced Data._ New 1524 York: Wiley. 1525 1526 Yates, F. 1934. The analysis of multiple classifications with 1527 unequal numbers in the different classes. _Journal of the 1528 American Statistical Association_ 29, 51-66. 1529 1530 version of June 20, 1998 1531 (end of program pr0165.sas) */

This is the end of the program log for the run of the program.

Following is the output from PROC PRINT showing the data values that were analyzed, as given by Searle (1987, 392).

OBS A B C Y 1 1 1 1 10 2 1 1 2 4 3 1 1 2 6 4 1 2 1 3 5 1 2 1 5 6 1 2 2 2 7 1 2 2 3 8 1 2 2 7 9 1 3 1 1 10 1 3 1 2 11 1 3 1 3 12 1 3 2 4 13 1 3 2 5 14 1 3 2 9 15 2 1 1 5 16 2 1 1 9 17 2 1 2 8 18 2 2 1 5 19 2 2 2 6 20 2 2 2 8 21 2 3 1 6 22 2 3 1 10 23 2 3 2 2 24 3 1 1 2 25 3 1 1 3 26 3 1 1 3 27 3 1 1 4 28 3 1 2 3 29 3 1 2 4 30 3 1 2 8 31 3 2 1 3 32 3 2 1 4 33 3 2 1 8 34 3 2 2 4 35 3 3 1 5 36 3 3 2 6 37 4 1 1 4 38 4 1 2 5 39 4 1 2 7 40 4 1 2 9 41 4 2 1 1 42 4 2 1 1 43 4 2 1 3 44 4 2 1 7 45 4 2 2 19 46 4 3 1 8 47 4 3 2 20 48 4 3 2 24

Following are the four analysis of variance tables generated for the above data by SAS PROC GLM:

Source DF Type I SS Mean Square F Value Pr > F A 3 173.27228327 57.75742776 11.36 8.E-05 B 2 54.46556916 27.23278458 5.36 .01192 A*B 6 247.93833804 41.32305634 8.13 7.E-05 C 1 75.54806837 75.54806837 14.86 .00076 A*C 3 200.18915720 66.72971907 13.13 3.E-05 B*C 2 21.91479754 10.95739877 2.16 .13774 A*B*C 6 120.67178642 20.11196440 3.96 .00684 Source DF Type II SS Mean Square F Value Pr > F A 3 193.56159895 64.52053298 12.69 4.E-05 B 2 41.17722221 20.58861110 4.05 .03051 A*B 6 205.98951333 34.33158556 6.75 .00028 C 1 75.54806837 75.54806837 14.86 .00076 A*C 3 200.83144178 66.94381393 13.17 3.E-05 B*C 2 21.91479754 10.95739877 2.16 .13774 A*B*C 6 120.67178642 20.11196440 3.96 .00684 Source DF Type III SS Mean Square F Value Pr > F A 3 194.17538729 64.72512910 12.73 4.E-05 B 2 10.37601043 5.18800522 1.02 .37550 A*B 6 160.48084001 26.74680667 5.26 .00139 C 1 62.00000000 62.00000000 12.20 .00188 A*C 3 215.45095889 71.81698630 14.13 2.E-05 B*C 2 28.00312907 14.00156454 2.75 .08377 A*B*C 6 120.67178642 20.11196440 3.96 .00684 Source DF Type IV SS Mean Square F Value Pr > F A 3 194.17538729 64.72512910 12.73 4.E-05 B 2 10.37601043 5.18800522 1.02 .37550 A*B 6 160.48084001 26.74680667 5.26 .00139 C 1 62.00000000 62.00000000 12.20 .00188 A*C 3 215.45095889 71.81698630 14.13 2.E-05 B*C 2 28.00312907 14.00156454 2.75 .08377 A*B*C 6 120.67178642 20.11196440 3.96 .00684

The low p-value for the A × B × C interaction (i.e., .0068 in all four tables) provides good evidence that (1) there is a relationship between the response variable and all three predictor variables, and (2) the relationship is a three-way interaction.

This is the end of the output from PR0165.SAS.

Donald Macnaughton's page on unbalanced analysis of variance