SAS/IML program using the RII technique for linear regression

/*Use Courier 10 point font to maintain document set up.
This program was written by Seonglim Lee and Catherine P. Montalto,
Consumer Sciences Department, The Ohio State University.  The
program is written in SAS/IML matrix language and uses the repeated-imputation 
inference (RII) technique to derive OLS regression coefficients,
estimates of the standard error of the coefficients, test statistics for
individual coefficients and a model test statistic.  RII is based on
Bayesian theory, and is applicable to linear and nonlinear models, and to
models estimated by both least squares and maximum likelihood.  This
program can be edited for nonlinear applications (e.g., PROC LIFEREG, PROC
LOGISTIC, PROC PROBIT).  The RII technique incorporates the variability in
the data due to missing data into the estimates of the standard errors of
the coefficients.  Equation numbers correspond to Appendix A in Montalto &
Sung (1996), Multiple imputation in the 1992 Survey of Consumer Finances,
Financial Counseling and Planning, Vol. 7, pp. 133-146.

The computer code is provided as a service.  Users should familiarize
themselves with the RII technique and SAS/IML to ensure appropriate and
accurate use and interpretation of the program and program output.  Users
will need to insert the appropriate job control language and data creation
commands at the beginning of the program.  DEP and X1 through X11 are used
in the code to refer to variables.  Users should edit the number of Xi
variables as appropriate and define these variables in the program.
*/

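*THE STATEMENTS BELOW BELONG IN THE USER-SUPPLIED DATA STEP THAT CREATES
 THE DATA SET FINAL.  IMP IS THE IMPLICATE NUMBER (1-5), TAKEN AS THE LAST
 DIGIT OF Y1;
IMP=Y1-INT(Y1/10)*10;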
DEP={variable_name};  X1={variable_name};  X2={variable_name};
X3= {variable_name};  X4={variable_name};  X5={variable_name};
X6= {variable_name};  X7={variable_name};  X8={variable_name};
X9= {variable_name}; X10={variable_name}; X11={variable_name};
KEEP IMP DEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11;

*RUN REGRESSION SEPARATELY ON EACH IMPLICATE AND SAVE COEFFICIENTS AND
VARIANCE-COVARIANCE MATRIX;
DATA A; SET FINAL;
  IF IMP=1;
PROC REG DATA=A OUTEST=BETAA COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11;
RUN;
DATA B; SET FINAL;
  IF IMP=2;
PROC REG DATA=B OUTEST=BETAB COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11;
RUN;
DATA C; SET FINAL;
  IF IMP=3;
PROC REG DATA=C OUTEST=BETAC COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11;
RUN;
DATA D; SET FINAL;
  IF IMP=4;
PROC REG DATA=D OUTEST=BETAD COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11;
RUN;
DATA E; SET FINAL;
  IF IMP=5;
PROC REG DATA=E OUTEST=BETAE COVOUT;
  MODEL DEP=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11;
RUN;

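*GENERATE COEFFICIENT VECTOR FROM EACH IMPLICATE.  THE FIRST OBSERVATION OF
 EACH OUTEST= DATA SET HOLDS THE COEFFICIENTS AND THE REMAINING OBSERVATIONS
 HOLD THE VARIANCE-COVARIANCE MATRIX.  NOTE THAT SAS RELEASES AFTER VERSION 6
 MAY NAME THE INTERCEPT COLUMN INTERCEPT RATHER THAN INTERCEP, IN WHICH CASE
 THE KEEP AND VAR LISTS BELOW NEED THE SAME EDIT;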
DATA BETA1; SET BETAA; J+1;
IF J=1;
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 
DATA BETA2; SET BETAB; J+1;
IF J=1;
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 
DATA BETA3; SET BETAC; J+1;
IF J=1;
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 
DATA BETA4; SET BETAD; J+1;
IF J=1;
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 
DATA BETA5; SET BETAE; J+1;
IF J=1;
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 

*GENERATE VARIANCE-COVARIANCE MATRIX FROM EACH IMPLICATE;
DATA COV1; SET BETAA; J+1;
IF NOT (J=1);   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 
DATA COV2; SET BETAB; J+1;
IF NOT (J=1);   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 
DATA COV3; SET BETAC; J+1;
IF NOT (J=1);   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 
DATA COV4; SET BETAD; J+1;
IF NOT (J=1);   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 
DATA COV5; SET BETAE; J+1;
IF NOT (J=1);   
KEEP INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11; 

***RII PROCEDURE FOR OLS REGRESSION***;

PROC IML;
*READ COEFFICIENTS AND VARIANCE-COVARIANCE MATRICES FROM BETA# AND COV#
FILES AND GENERATE VECTORS AND MATRICES;
USE BETA1 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO Q1;
USE BETA2 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO Q2;
USE BETA3 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO Q3;
USE BETA4 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO Q4;
USE BETA5 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO Q5;
USE COV1 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO U1;
USE COV2 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO U2;
USE COV3 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO U3;
USE COV4 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO U4;
USE COV5 VAR{INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
    READ ALL INTO U5;

*DEFINE M EQUAL TO THE NUMBER OF IMPUTATIONS;
  M=5;
*AVERAGE OF THE FIVE POINT ESTIMATES OF THE COEFFICIENTS (Eq. 1);
  QBAR=(Q1+Q2+Q3+Q4+Q5)/M;
*AVERAGE WITHIN IMPUTATION VARIANCE (Eq. 2);
  UBAR=(U1+U2+U3+U4+U5)/M;
*BETWEEN IMPUTATION VARIANCE (Eq. 3);
  BM=((Q1-QBAR)`*(Q1-QBAR)+(Q2-QBAR)`*(Q2-QBAR)+(Q3-QBAR)`*(Q3-QBAR)
+(Q4-QBAR)`*(Q4-QBAR)+(Q5-QBAR)`*(Q5-QBAR))/(M-1);
*RII TOTAL VARIANCE OF THE COEFFICIENTS (Eq. 4);
  TM=UBAR+(1+1/M)*BM;
  TMDIAG=DIAG(TM);
*RII STANDARD ERROR OF THE COEFFICIENTS (Eq. 5);
  TSTD=SQRT(TMDIAG);
*RELATIVE INCREASE IN VARIANCE DUE TO NONRESPONSE (Eq. 8);
  RM=((1+1/M)*BM)/UBAR;
*DEGREES OF FREEDOM (Eq. 7);
  V=(M-1)*(1+INV(DIAG(RM)))##2;
  DFV=(VECDIAG(V));
*FRACTION OF INFORMATION ABOUT COEFFICIENTS WHICH IS MISSING (Eq. 9);
  GAMMA=(RM+(2/(V+3)))/(RM+1);
*TEST STATISTIC FOR THE COEFFICIENTS (Eq. 11);
  TVALUE=QBAR*INV(TSTD);
  PVALUE=(1-PROBT(ABS(TVALUE)`,VECDIAG(V)))*2;

***STEPS TO DERIVE THE MODEL TEST STATISTIC***;

*DEFINE K EQUAL TO THE NUMBER OF INDEPENDENT VARIABLES 
 EXCLUDING THE INTERCEPT;
  K=11;
*BETWEEN IMPUTATION VARIANCE MATRIX EXCLUDING INTERCEPT;
  BMINDP=BM(|2:K+1, 2:K+1|);
*WITHIN IMPUTATION VARIANCE MATRIX EXCLUDING INTERCEPT;
  UBARINDP=UBAR(|2:K+1, 2:K+1|);
*COEFFICIENT MATRIX EXCLUDING THE INTERCEPT;
  QBARINDP=QBAR(|,2:K+1|);
*RELATIVE INCREASE IN VARIANCE USED IN CALCULATING MODEL TEST
STATISTIC (Eq. 14);
  RM14=(1+1/M)*TRACE(BMINDP*INV(UBARINDP))/K;
*MODEL TEST STATISTIC (Eq. 13);
  DM=(1/(1+RM14))*(-1*QBARINDP)*INV(UBARINDP)*(-1*QBARINDP)`*(1/K);
*QUANTITY USED TO DERIVE THE DENOMINATOR DEGREES OF FREEDOM (THE NUMERATOR
 DEGREES OF FREEDOM EQUAL K);
  VDM=(M-1)*(1+INV(RM14))##2;
*DENOMINATOR DEGREES OF FREEDOM FOR THE MODEL TEST STATISTIC;
  DMDF=(K+1)*VDM/2;
*P-VALUE FOR THE MODEL TEST STATISTIC;
  PROBF=1-PROBF(DM,K,DMDF);

***COMMANDS TO CREATE AND PRINT TABLES OF RESULTS;

TABLE1=DM||K||DMDF||PROBF||RM14;
TABLE2=QBAR`||VECDIAG(UBAR)||VECDIAG(BM)||VECDIAG(TM)||
 VECDIAG(TSTD)||VECDIAG(RM)||VECDIAG(V)||VECDIAG(GAMMA);
TABLE3=QBAR`||VECDIAG(TSTD)||TVALUE`||PVALUE;
NAMES1={INTERCEP X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11};
NAMES2={QBAR UBAR BM TM TSTD RM V GAMMA};
NAMES3={QBAR TSTD TVALUE PVALUE};
NAMES4={F DF1 DF2 P RM14};
TITLE='RII REGRESSION RESULTS';
PRINT TITLE;
PRINT TABLE1(|COLNAME=NAMES4|);
PRINT TABLE2(|ROWNAME=NAMES1 COLNAME=NAMES2|);
PRINT TABLE3(|ROWNAME=NAMES1 COLNAME=NAMES3|);
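
*ILLUSTRATIVE SKETCH ONLY, NOT PART OF THE ORIGINAL PROGRAM.  THE LINES BELOW
 REPEAT Eq. 1-5, 7, 8 AND 11 FOR A SINGLE COEFFICIENT SO THE ARITHMETIC CAN BE
 CHECKED BY HAND.  THE FIVE ESTIMATES AND VARIANCES ARE MADE-UP NUMBERS AND
 THE EX_ NAMES ARE HYPOTHETICAL.  THE SKETCH ASSUMES IT RUNS INSIDE THE PROC
 IML SESSION ABOVE AND CAN BE DELETED WITHOUT AFFECTING THE RESULTS;
  EX_Q={1.0 1.2 0.9 1.1 0.8};
  EX_U={0.04 0.05 0.04 0.06 0.06};
  EX_M=NCOL(EX_Q);
*MEAN OF THE ESTIMATES (Eq. 1) IS 1.0 AND MEAN WITHIN VARIANCE (Eq. 2) IS 0.05;
  EX_QBAR=SUM(EX_Q)/EX_M;
  EX_UBAR=SUM(EX_U)/EX_M;
*BETWEEN VARIANCE (Eq. 3) IS 0.10/4 = 0.025;
  EX_BM=SSQ(EX_Q-EX_QBAR)/(EX_M-1);
*TOTAL VARIANCE (Eq. 4) IS 0.05 + 1.2*0.025 = 0.08;
  EX_TM=EX_UBAR+(1+1/EX_M)*EX_BM;
*STANDARD ERROR (Eq. 5) IS SQRT(0.08), ABOUT 0.283;
  EX_SE=SQRT(EX_TM);
*RELATIVE INCREASE IN VARIANCE (Eq. 8) IS 0.03/0.05 = 0.6;
  EX_RM=(1+1/EX_M)*EX_BM/EX_UBAR;
*DEGREES OF FREEDOM (Eq. 7) ARE 4*(1+1/0.6)**2, ABOUT 28.4;
  EX_V=(EX_M-1)*(1+1/EX_RM)##2;
*T STATISTIC (Eq. 11) AND TWO-SIDED P-VALUE;
  EX_T=EX_QBAR/EX_SE;
  EX_P=(1-PROBT(ABS(EX_T),EX_V))*2;
  PRINT EX_QBAR EX_UBAR EX_BM EX_TM EX_SE EX_RM EX_V EX_T EX_P;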
QUIT;

/*
Updated Nov. 22, 1996 by Catherine Montalto
*/