RII Program for LOGIT analysis
/*Use Courier 10 point font to maintain document set up.

RII Program for LOGIT analysis

This program was written by Yoonkyung Yuh and Catherine P. Montalto.
The program is written in SAS/IML matrix language and uses the
repeated-imputation inference (RII) technique to derive Logit
regression coefficients, estimates of the standard errors of the
coefficients, and chi-square test statistics for individual
coefficients. The procedure for calculating the model test statistic
from the m (where m=5 in the Survey of Consumer Finances)
complete-data chi-square statistics is explained at the end of the
SAS program.

The RII technique incorporates the variability in the data due to
missing data into the estimates of the standard errors of the
coefficients. Equation numbers correspond to Appendix A in Montalto &
Sung (1996), Multiple imputation in the 1992 Survey of Consumer
Finances, Financial Counseling and Planning, Vol. 7, pp. 133-146.

The computer code is provided as a service. Users should familiarize
themselves with the RII technique and SAS/IML to ensure appropriate
and accurate use and interpretation of the program and program output.

Users will need to insert the appropriate job control language and
data creation commands at the beginning of the program. DEP and X1
through X10 are used in the code to refer to variables. Users should
edit the number of Xi variables as appropriate and define these
variables in the program.
*/

IMPLIC=Y1-(10*YY1);
DEP={variable_name};
X1={variable_name};
X2={variable_name};
X3={variable_name};
X4={variable_name};
X5={variable_name};
X6={variable_name};
X7={variable_name};
X8={variable_name};
X9={variable_name};
X10={variable_name};
KEEP IMPLIC DEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;

*RUN REGRESSION SEPARATELY ON EACH IMPLICATE AND SAVE COEFFICIENTS
 AND VARIANCE-COVARIANCE MATRIX;
DATA A; SET FINAL; IF IMPLIC=1;
PROC LOGISTIC DESCENDING OUTEST=BETAA COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA B; SET FINAL; IF IMPLIC=2;
PROC LOGISTIC DESCENDING OUTEST=BETAB COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA C; SET FINAL; IF IMPLIC=3;
PROC LOGISTIC DESCENDING OUTEST=BETAC COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA D; SET FINAL; IF IMPLIC=4;
PROC LOGISTIC DESCENDING OUTEST=BETAD COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA E; SET FINAL; IF IMPLIC=5;
PROC LOGISTIC DESCENDING OUTEST=BETAE COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;

*GENERATE COEFFICIENT VECTOR FROM EACH IMPLICATE;
DATA BETA1; SET BETAA; J+1; IF J=1;
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA BETA2; SET BETAB; J+1; IF J=1;
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA BETA3; SET BETAC; J+1; IF J=1;
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA BETA4; SET BETAD; J+1; IF J=1;
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA BETA5; SET BETAE; J+1; IF J=1;
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;

*GENERATE VARIANCE-COVARIANCE MATRIX FROM EACH IMPLICATE;
DATA COV1; SET BETAA; J+1; IF NOT (J=1);
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA COV2; SET BETAB; J+1; IF NOT (J=1);
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA COV3; SET BETAC; J+1; IF NOT (J=1);
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA COV4; SET BETAD; J+1; IF NOT (J=1);
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA COV5; SET BETAE; J+1; IF NOT (J=1);
  KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;

***RII PROCEDURE FOR LOGIT REGRESSION***;
PROC IML;

*READ COEFFICIENTS AND VARIANCE-COVARIANCE MATRICES FROM BETA# AND
 COV# FILES AND GENERATE VECTORS AND MATRICES;
USE BETA1 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO Q1;
USE BETA2 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO Q2;
USE BETA3 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO Q3;
USE BETA4 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO Q4;
USE BETA5 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO Q5;
USE COV1 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO U1;
USE COV2 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO U2;
USE COV3 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO U3;
USE COV4 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO U4;
USE COV5 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
READ ALL INTO U5;

*DEFINE M EQUAL TO THE NUMBER OF IMPUTATIONS;
M=5;

*AVERAGE OF THE FIVE POINT ESTIMATES OF THE COEFFICIENTS (Eq. 1);
QBAR=(Q1+Q2+Q3+Q4+Q5)/M;

*AVERAGE WITHIN IMPUTATION VARIANCE (Eq. 2);
UBAR=(U1+U2+U3+U4+U5)/M;

*BETWEEN IMPUTATION VARIANCE (Eq. 3);
BM=((Q1-QBAR)`*(Q1-QBAR)+(Q2-QBAR)`*(Q2-QBAR)+(Q3-QBAR)`*(Q3-QBAR)
   +(Q4-QBAR)`*(Q4-QBAR)+(Q5-QBAR)`*(Q5-QBAR))/(M-1);

*RII TOTAL VARIANCE OF THE COEFFICIENTS (Eq. 4);
TM=UBAR+(1+1/M)*BM;
TMDIAG=DIAG(TM);

*RII STANDARD ERROR OF THE COEFFICIENTS (Eq. 5);
TSTD=SQRT(TMDIAG);

*RELATIVE INCREASE IN VARIANCE DUE TO NONRESPONSE (Eq. 8);
RM=((1+1/M)*BM)/UBAR;

*DEGREES OF FREEDOM (Eq. 7);
V=(M-1)*(1+INV(DIAG(RM)))##2;
DFV=(VECDIAG(V));

*FRACTION OF INFORMATION ABOUT COEFFICIENTS WHICH IS MISSING (Eq. 9);
GAMMA=(RM+(2/(V+3)))/(RM+1);

*CHI-SQUARE TEST STATISTIC FOR THE COEFFICIENTS;
CHISQ=((QBAR)##2)*INV(TMDIAG);
PVALUE=1-PROBCHI(CHISQ`, 1);

***STEPS TO DERIVE INFORMATION NEEDED FOR THE MODEL TEST STATISTIC***;

*DEFINE K EQUAL TO THE NUMBER OF INDEPENDENT VARIABLES EXCLUDING THE
 INTERCEPT;
K=10;

*BETWEEN IMPUTATION VARIANCE MATRIX EXCLUDING INTERCEPT;
BMINDP=BM(|2:K+1, 2:K+1|);

*WITHIN IMPUTATION VARIANCE MATRIX EXCLUDING INTERCEPT;
UBARINDP=UBAR(|2:K+1, 2:K+1|);

*RELATIVE INCREASE IN VARIANCE USED IN CALCULATING MODEL TEST
 STATISTIC (Eq. 14);
RM14=(1+1/M)*TRACE(BMINDP*INV(UBARINDP))/K;

*DENOMINATOR DEGREES OF FREEDOM FOR THE MODEL TEST STATISTIC;
VDM=(M-1)*(1+INV(RM14))##2;
DMDF=(K+1)*VDM/2;

***COMMANDS TO CREATE AND PRINT TABLES OF RESULTS;
TABLE1=QBAR`||VECDIAG(UBAR)||VECDIAG(BM)||VECDIAG(TM)||
       VECDIAG(TSTD)||VECDIAG(RM)||VECDIAG(V)||VECDIAG(GAMMA);
TABLE2=QBAR`||VECDIAG(TSTD)||CHISQ`||PVALUE;
TABLE3=RM14||K||DMDF;
NAMES1={INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
NAMES2={QBAR UBAR BM TM TSTD RM V GAMMA};
NAMES3={QBAR TSTD CHISQ PVALUE};
NAMES4={RM14 DF1 DF2};
TITLE='RII LOGISTIC REGRESSION RESULTS';
PRINT TITLE;
PRINT TABLE1(|ROWNAME=NAMES1 COLNAME=NAMES2|);
PRINT TABLE2(|ROWNAME=NAMES1 COLNAME=NAMES3|);
PRINT TABLE3(|COLNAME=NAMES4|);
/*
// EXEC WNOTIFY
/*The model test statistic can be calculated from the m complete-data
chi-square statistics (in the SCF, m=5) and the information printed in
Table 3 using the following formula:

   model test statistic =
      [(average of the five chi-square statistics/k) - (4/6)*RM14]
      / (1 + RM14)

This formula corresponds to equation 15 in Montalto and Sung (1996)
and to equation 3.1.14 on page 78 of Rubin (1987). The parameter k is
the number of independent variables in the logit model excluding the
intercept. RM14 is printed in Table 3 in the SAS output.
The chi-square statistics associated with the -2 Log Likelihood
Criterion from the Logit regressions on each of the separate
implicates are added together and the sum is divided by five to get
the "average of the five chi-square statistics".

The model test statistic has an F-distribution with DF1 and DF2
degrees of freedom (DF1 and DF2 are the second and third elements in
Table 3). The p-value for the model test statistic can be determined
from an F-table or by using the following SAS command:

   PROB=1-PROBF( {model_test_statistic}, DF1, DF2);
*/
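
/*Illustrative sketch only, not part of the original program: one
possible way to carry out the model test calculation described above
in a DATA step. The data set name MODELTEST and the variable names
CHISQ1-CHISQ5, AVGCHI and FSTAT are hypothetical. Every numeric value
assigned below is a made-up placeholder; replace each one with the
corresponding number from your own output. The factor (M-1)/(M+1)
equals the 4/6 in the formula above when M=5.
*/
DATA MODELTEST;
  M=5;            /* NUMBER OF IMPLICATES                           */
  K=10;           /* DF1: INDEPENDENT VARIABLES EXCLUDING INTERCEPT */
  RM14=0.05;      /* PLACEHOLDER: RM14 FROM TABLE 3                 */
  DMDF=2000;      /* PLACEHOLDER: DF2 FROM TABLE 3                  */
  /* PLACEHOLDERS: MODEL CHI-SQUARE (-2 LOG L) FROM EACH IMPLICATE  */
  CHISQ1=100; CHISQ2=102; CHISQ3=98; CHISQ4=101; CHISQ5=99;
  /* AVERAGE OF THE FIVE CHI-SQUARE STATISTICS */
  AVGCHI=MEAN(OF CHISQ1-CHISQ5);
  /* MODEL TEST STATISTIC (Eq. 15) */
  FSTAT=((AVGCHI/K)-((M-1)/(M+1))*RM14)/(1+RM14);
  /* P-VALUE FROM THE F-DISTRIBUTION WITH DF1=K AND DF2=DMDF */
  PROB=1-PROBF(FSTAT, K, DMDF);
PROC PRINT DATA=MODELTEST;
  VAR AVGCHI FSTAT PROB;
RUN;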