RII Program for LOGIT analysis — supplementary material for Montalto and Yuh, v.9(1), Financial Counseling and Planning

/*Use Courier 10 point font to maintain document set up.
RII Program for LOGIT analysis
This program was written by Yoonkyung Yuh and Catherine P. Montalto.  The program is written in
SAS/IML matrix language and uses the repeated-imputation inference (RII) technique
to derive Logit regression coefficients, estimates of the standard error of the
coefficients, and chi-square test statistics for individual coefficients.  The
procedure for calculating the model test statistic from the m (where m=5 in the
Survey of Consumer Finances) complete-data chi-square statistics is explained at the
end of the SAS program.  The RII technique incorporates the additional variability
due to missing data into the estimates of the standard errors of the coefficients.
Equation numbers in the program comments correspond to Appendix A in Montalto & Sung
(1996), Multiple imputation in the 1992 Survey of Consumer Finances, Financial
Counseling and Planning, Vol. 7, pp. 133-146.

The computer code is provided as a service.  Users should familiarize themselves
with the RII technique and SAS/IML to ensure appropriate and accurate use and
interpretation of the program and program output.  Users will need to insert the
appropriate job control language and data creation commands at the beginning of the
program.  DEP and X1 through X10 are used in the code to refer to variables.  Users
should edit the number of Xi variables as appropriate and define these variables in
the program.
*/

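*THESE STATEMENTS BELONG IN THE USER-SUPPLIED DATA STEP THAT CREATES THE DATA SET
FINAL, WHICH IS READ BY THE SET FINAL STATEMENTS BELOW;
*IMPLIC IS THE IMPLICATE NUMBER (1 TO 5) DERIVED FROM THE SCF IDENTIFIERS Y1 AND YY1;
IMPLIC=Y1-(10*YY1);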
DEP={variable_name}; X1={variable_name}; X2={variable_name};
X3= {variable_name}; X4={variable_name}; X5={variable_name};
X6= {variable_name}; X7={variable_name}; X8={variable_name}; 
X9= {variable_name}; X10={variable_name};   
KEEP IMPLIC DEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 ;                                     
                                                                               
*RUN REGRESSION SEPARATELY ON EACH IMPLICATE AND SAVE COEFFICIENTS AND          
VARIANCE-COVARIANCE MATRIX;                                                     
DATA A; SET FINAL;                                                              
IF IMPLIC=1;                                                                    
  PROC LOGISTIC DESCENDING OUTEST=BETAA COVOUT ;                                
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA B; SET FINAL;                                                              
IF IMPLIC=2;                                                                    
  PROC LOGISTIC DESCENDING OUTEST=BETAB COVOUT ;                                
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA C; SET FINAL;                                                              
IF IMPLIC=3;                                                                    
  PROC LOGISTIC DESCENDING OUTEST=BETAC COVOUT ;                                
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA D; SET FINAL;                                                              
IF IMPLIC=4;                                                                    
  PROC LOGISTIC DESCENDING OUTEST=BETAD COVOUT ;                                
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA E; SET FINAL;                                                              
IF IMPLIC=5;                                                                    
  PROC LOGISTIC DESCENDING OUTEST=BETAE COVOUT ;                                
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
                                                                                
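*GENERATE COEFFICIENT VECTOR FROM EACH IMPLICATE;
*NOTE: THE FIRST OBSERVATION IN EACH OUTEST= DATA SET HOLDS THE COEFFICIENTS AND THE
REMAINING OBSERVATIONS HOLD THE COVARIANCE MATRIX.  INTERCEP IS THE INTERCEPT NAME
UNDER SAS VERSION 6.  LATER SAS RELEASES NAME IT INTERCEPT, SO THE KEEP AND VAR LISTS
MAY NEED EDITING;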
DATA BETA1; SET BETAA; J+1;                                                     
IF J=1;                                                                         
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA BETA2; SET BETAB; J+1;                                                     
IF J=1;                                                                         
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA BETA3; SET BETAC; J+1;                                                     
IF J=1;                                                                         
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA BETA4; SET BETAD; J+1;                                                     
IF J=1;                                                                         
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA BETA5; SET BETAE; J+1;                                                     
IF J=1;                                                                         
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
                                                                                
*GENERATE VARIANCE-COVARIANCE MATRIX FROM EACH IMPLICATE;                       
DATA COV1; SET BETAA; J+1;                                                      
IF NOT (J=1);                                                                   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA COV2; SET BETAB; J+1;                                                      
IF NOT (J=1);                                                                   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA COV3; SET BETAC; J+1;                                                      
IF NOT (J=1);                                                                   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA COV4; SET BETAD; J+1;                                                      
IF NOT (J=1);                                                                   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA COV5; SET BETAE; J+1;                                                      
IF NOT (J=1);                                                                   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
                                                                                
***RII PROCEDURE FOR LOGIT REGRESSION***;
                                                                                
PROC IML;                                                                       
*READ COEFFICIENTS AND VARIANCE-COVARIANCE MATRICES FROM BETA# AND COV#         
FILES AND GENERATE VECTORS AND MATRICES;                                        
USE BETA1 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO Q1;                                                           
USE BETA2 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO Q2;                                                           
USE BETA3 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO Q3;                                                           
USE BETA4 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO Q4;                                                           
USE BETA5 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO Q5;                                                           
USE COV1 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO U1;                                                           
USE COV2 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO U2;                                                           
USE COV3 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO U3;                                                           
USE COV4 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO U4;                                                           
USE COV5 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};
    READ ALL INTO U5;                                                           
                                                                                
*DEFINE M EQUAL TO THE NUMBER OF IMPUTATIONS;                                   
  M=5;                                                                          
*AVERAGE OF THE FIVE POINT ESTIMATES OF THE COEFFICIENTS (Eq. 1);               
  QBAR=(Q1+Q2+Q3+Q4+Q5)/M;                                                      
*AVERAGE WITHIN IMPUTATION VARIANCE (Eq. 2);                                    
  UBAR=(U1+U2+U3+U4+U5)/M;                                                      
*BETWEEN IMPUTATION VARIANCE (Eq. 3);                                           
  BM=((Q1-QBAR)`*(Q1-QBAR)+(Q2-QBAR)`*(Q2-QBAR)+(Q3-QBAR)`*(Q3-QBAR)
  +(Q4-QBAR)`*(Q4-QBAR)+(Q5-QBAR)`*(Q5-QBAR))/(M-1);                              
*RII TOTAL VARIANCE OF THE COEFFICIENTS (Eq. 4);                                
  TM=UBAR+(1+1/M)*BM;                                                           
  TMDIAG=DIAG(TM);                                                              
*RII STANDARD ERROR OF THE COEFFICIENTS (Eq. 5);                                
  TSTD=SQRT(TMDIAG);                                                            
*RELATIVE INCREASE IN VARIANCE DUE TO NONRESPONSE (Eq. 8);                      
  RM=((1+1/M)*BM)/UBAR;                                                         
*DEGREES OF FREEDOM (Eq. 7);                                                    
  V=(M-1)*(1+INV(DIAG(RM)))##2;                                                 
  DFV=(VECDIAG(V));                                                             
*FRACTION OF INFORMATION ABOUT COEFFICIENTS WHICH IS MISSING (Eq. 9);           
  GAMMA=(RM+(2/(V+3)))/(RM+1);                                                  
*CHI-SQUARE TEST STATISTIC FOR THE COEFFICIENTS;
  CHISQ=((QBAR)##2)*INV(TMDIAG);                                                
  PVALUE=1-PROBCHI(CHISQ`, 1);                                                  
                                                                                
***STEPS TO DERIVE INFORMATION NEEDED FOR THE MODEL TEST STATISTIC***;               
                                                                                
*DEFINE K EQUAL TO THE NUMBER OF INDEPENDENT VARIABLES                          
 EXCLUDING THE INTERCEPT;                                                       
  K=10;                                                                         
*BETWEEN IMPUTATION VARIANCE MATRIX EXCLUDING INTERCEPT;                        
  BMINDP=BM(|2:K+1, 2:K+1|);                                                    
*WITHIN IMPUTATION VARIANCE MATRIX EXCLUDING INTERCEPT;                         
  UBARINDP=UBAR(|2:K+1, 2:K+1|);
*RELATIVE INCREASE IN VARIANCE USED IN CALCULATING MODEL TEST                   
STATISTIC (Eq. 14);                                                             
  RM14=(1+1/M)*TRACE(BMINDP*INV(UBARINDP))/K;
*DENOMINATOR DEGREES OF FREEDOM FOR THE MODEL TEST STATISTIC; 
  VDM=(M-1)*(1+INV(RM14))##2;                                                        
  DMDF=(K+1)*VDM/2;
                                                                                
***COMMANDS TO CREATE AND PRINT TABLES OF RESULTS***;
                                               
TABLE1=QBAR`||VECDIAG(UBAR)||VECDIAG(BM)||VECDIAG(TM)||                         
 VECDIAG(TSTD)||VECDIAG(RM)||VECDIAG(V)||VECDIAG(GAMMA);                        
TABLE2=QBAR`||VECDIAG(TSTD)||CHISQ`||PVALUE;
TABLE3=RM14||K||DMDF;                                    
NAMES1={INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10};                                    
NAMES2={QBAR UBAR BM TM TSTD RM V GAMMA};                                 
NAMES3={QBAR TSTD CHISQ PVALUE};                                                
NAMES4={RM14 DF1 DF2};                                                      
TITLE='RII LOGISTIC REGRESSION RESULTS';                                        
PRINT TITLE;
PRINT TABLE1(|ROWNAME=NAMES1 COLNAME=NAMES2|);                                  
PRINT TABLE2(|ROWNAME=NAMES1 COLNAME=NAMES3|); 
PRINT TABLE3(|COLNAME=NAMES4|);                                 
                                                                                

/*The model test statistic can be calculated from the m complete-data chi-square
statistics (in the SCF, m=5) and the information printed in Table 3 using the
following formula:

     model test statistic =

     [(average of the five chi-square statistics/k) - (4/6)*RM14 ] / (1 + RM14)

This formula corresponds to equation 15 in Montalto and Sung (1996) and to equation
3.1.14 on page 78 of Rubin (1987).  The parameter k is the number of independent
variables in the logit model excluding the intercept.  RM14 is printed in Table 3 of
the SAS output.  The "average of the five chi-square statistics" is obtained by
adding together the chi-square statistics associated with the -2 Log Likelihood
Criterion from the Logit regressions on each of the five implicates and dividing the
sum by five.

This test statistic has an F-distribution with DF1 and DF2 degrees of freedom (DF1
and DF2 are the second and third elements in Table 3).

The p-value for the model test statistic can be determined from an F-table or by
using the following SAS command:
PROB=1-PROBF( {model_test_statistic}, DF1, DF2); 
*/
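
/*Illustrative sketch, not part of the original program: one way to evaluate the
formula above in a DATA step.  The names AVGCHI and F and the data-step layout are
assumptions made for this example.  Replace each item in braces with a number from
your own output: {chisq_1} through {chisq_5} are the -2 Log Likelihood chi-square
statistics from the five Logit regressions, and {RM14} and {DF2} are the first and
third elements of Table 3.  K=10 and the constant 4/6 (which applies to m=5) follow
the program and formula above; edit K if the number of Xi variables was changed.
*/
DATA _NULL_;
  K=10;
  RM14={RM14};
  DF2={DF2};
  AVGCHI=({chisq_1}+{chisq_2}+{chisq_3}+{chisq_4}+{chisq_5})/5;
  F=((AVGCHI/K)-(4/6)*RM14)/(1+RM14);
  PROB=1-PROBF(F, K, DF2);
  PUT F= PROB=;
RUN;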



Catherine Montalto

