One common mistake with logits is the interpretation of the odds ratios





You can calculate the actual odds ratio from your proc freq (not controlling
for the effects of other variables)

e.g., if 40% of households with no savings rules spend less than income,


the odds = 0.4/(1-0.4) = 0.667

If 74% of households with savings rules spend less than income,

the odds = 0.74/(1-0.74) = 2.846

The odds ratio = 2.846/0.667 = 4.269 (from the unrounded odds)

However, the ratio of probabilities = 0.74/0.40 = 1.85

So, the probability of spending less than income for those with savings rules
is 85% higher than the probability of spending less than income for those
without savings rules, even though the odds ratio is quite high.

Allison* has a pretty good discussion of the odds ratios on pp. 11-12, though his interpretation of the ratios in logits is not quite right in terms of the probabilities.


In the logit, the odds ratio for having saving rules was 2.608

However, the ratio of the predicted probabilities was only 1.51 (for how
I calculated that, see below)


I have found it very useful to calculate predicted probabilities for key

independent variables, e.g., comparing Blacks to Whites to Hispanics to

Asian/Others


The actual rates, e.g., of stock ownership, obviously reflect differences
in many other factors, e.g., Black and Hispanic households have less than
half the mean income of White households.


If your logistic regression controls for income and other factors, then the

coefficients for a race dummy variable will be related to the “all other

things equal” effects.


In an ordinary least squares regression analysis, interpretation of an

effect is easy.


e.g., if the dependent variable in an OLS regression is annual savings
and the coefficient for Black is -2000,
we can directly interpret the result as meaning that if Black households
had the same income, etc. as the overall sample, then their predicted
savings per year would be 2000 less than that of the reference category,
e.g., White.


However, for a logistic regression, the interpretation is much harder


SAS will give you a coefficient and an odds ratio, but most people do not

know how to correctly interpret the odds ratio


As an example of a correct interpretation, consider an article in
Financial Counseling and Planning:

The Effect of Self-Control Mechanisms on Household Saving Behavior
Jong-Youn Rha, Catherine P. Montalto, and Sherman D. Hanna
http://www.afcpe.org/doc/Vol%201722%20Self-Control%20Mechanisms.pdf

This Excel file contains some calculations for that article


In column C, rows 7 and 8 show the actual percentages of households that
save, for those with no savings rules and for those with savings rules.

40% of households with no savings rules spend less than income (“Save”)

74% of households with savings rules spend less than income (“Save”)


This is a difference of 34 percentage points, but you can also calculate
the odds as p/(1-p), so

the odds of saving for those with no savings rules = 0.40/(1-0.40) = 0.667

the odds of saving for those with savings rules = 0.74/(1-0.74) = 2.846

The odds ratio = 2.846/0.667 = 4.269 (from the unrounded odds)

So, we could say that those with savings rules have odds of saving 4.269
times as high as the odds of saving for those with no savings rules.

But we should not say that the probabilities are that much higher, because
the ratio of actual probabilities = 0.74/0.40 = 1.85
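The same arithmetic can be written as a short Python sketch. Python is not part of the original spreadsheet, and the function and variable names here are only illustrative. It uses the two percentages quoted above.

```python
def odds(p):
    """Convert a probability p to odds, p / (1 - p)."""
    return p / (1.0 - p)

p_no_rules = 0.40  # share of households with no savings rules that save
p_rules = 0.74     # share of households with savings rules that save

odds_no_rules = odds(p_no_rules)
odds_rules = odds(p_rules)

# Odds ratio: group with rules relative to group without rules
odds_ratio = odds_rules / odds_no_rules

# Ratio of the probabilities themselves, which is much smaller
probability_ratio = p_rules / p_no_rules

print(f"odds, no rules:       {odds_no_rules:.3f}")
print(f"odds, with rules:     {odds_rules:.3f}")
print(f"odds ratio:           {odds_ratio:.3f}")
print(f"ratio of probabilities: {probability_ratio:.2f}")
```

The odds ratio comes out near 4.27 while the ratio of probabilities is 1.85, which is the gap the text warns about.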


Column B, row 16 shows the coefficient from the logistic regression (which

also included other variables)

SAS will show the odds ratio for each coefficient, which = exp(coefficient)

In this case (cell G16) that = 2.607782 or approximately 2.608


How do we calculate the predicted probability of saving for those with

rules and for those with no rules?


For an OLS regression, you would simply calculate SUM(AiXi)

where each Ai is the estimated coefficient and Xi would be typically either

mean or typical values of each independent variable, except for the one or

two variables you are discussing, e.g., race, or whether or not households

have saving rules.
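As a rough Python sketch of the OLS calculation SUM(AiXi): every number below is hypothetical and chosen only for illustration, except the -2000 coefficient for Black taken from the earlier example. The variable names are also made up.

```python
# Hypothetical OLS coefficients (Ai); only the -2000 comes from the text.
coefficients = {
    "intercept": 1500.0,
    "income_thousands": 60.0,
    "black": -2000.0,
}

# Hypothetical typical values (Xi) for the variables held constant.
typical_values = {
    "intercept": 1.0,
    "income_thousands": 50.0,
}

def predicted_savings(black):
    """Sum of coefficient * value, varying only the Black dummy."""
    values = dict(typical_values, black=black)
    return sum(coefficients[name] * values[name] for name in coefficients)

reference = predicted_savings(0)   # reference category, e.g., White
black = predicted_savings(1)
gap = black - reference

print(reference, black, gap)
```

Whatever values are held constant, the gap between the two predictions equals the coefficient itself, which is why the OLS interpretation is easy.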


The predicted probability for a logit is much more complicated.

Cell C15 has the formula for calculating the predicted probability, though
with a shortcut I invented over 10 years ago:


=1/(1+EXP(-(B$13+B16*A16)))


B16 is the coefficient for the variable of interest, in this case, whether

a household has saving rules.

A16 is the mean value of the independent variable of interest, in this
example, 0.458, meaning that 45.8% of households have saving rules


B$13 is what I call the quasi-intercept. It is meant to represent the
effects of all of the other independent variables plus the constant or
intercept term shown in the logit output. It is useful not only to save
the time of entering all of the other coefficients and mean values, but
also because the predicted probability at the mean values of all other
variables is not the mean probability*, so we need to adjust the results so
they are comparable to the differences in the actual probabilities between
groups.
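A rough Python illustration of why the probability at the mean log odds is not the mean probability. It is a simplification, not the article's calculation: it uses only the raw group rates quoted earlier (40% and 74%) and the 45.8% share with rules, and ignores every other variable in the model. The names are illustrative.

```python
import math

def log_odds(p):
    """Log odds (logit) of a probability."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

share_rules = 0.458               # share of households with saving rules
p_no_rules, p_rules = 0.40, 0.74  # raw saving rates for the two groups

# Weighted mean of the two probabilities
mean_probability = (1 - share_rules) * p_no_rules + share_rules * p_rules

# Weighted mean of the two log odds, converted back to a probability
mean_log_odds = ((1 - share_rules) * log_odds(p_no_rules)
                 + share_rules * log_odds(p_rules))
prob_at_mean_log_odds = inverse_logit(mean_log_odds)

print(f"mean probability:              {mean_probability:.4f}")
print(f"probability at mean log odds:  {prob_at_mean_log_odds:.4f}")
```

The two values differ (about 0.556 versus about 0.564 here), so a prediction made at the means has to be adjusted before it can be compared with the actual rates.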


The mean value of whether households save is 0.559, in other words, 55.9%

of the households spent less than income.


When I started, I just had a value of 1 in cell B13


I placed the cursor on cell C15, clicked on Tools, then Goal Seek, then
entered 0.559 as the target value.

For “By changing cell” I entered B13,
then clicked OK, then OK again when there was a result.

Excel uses a trial and error process to find a value of B13 such that the
predicted probability = mean probability, or pretty close.
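The Goal Seek step can be imitated in Python with a simple bisection search. This is only a sketch of one way to reproduce the result, not a description of what Excel does internally, and the names are illustrative. It assumes the coefficient is the natural log of the odds ratio 2.607782 given above, with the mean of the saving-rules variable at 0.458 and the target probability at 0.559.

```python
import math

def predicted_probability(quasi_intercept, coefficient, x):
    """Python version of =1/(1+EXP(-(B$13+B16*A16)))."""
    return 1.0 / (1.0 + math.exp(-(quasi_intercept + coefficient * x)))

coefficient = math.log(2.607782)  # coefficient implied by the odds ratio
mean_x = 0.458                    # mean of the saving-rules variable
target = 0.559                    # mean probability of saving

# Bisection: the predicted probability rises with the quasi-intercept,
# so keep halving the interval that contains the target.
low, high = -10.0, 10.0
for _ in range(60):
    mid = (low + high) / 2.0
    if predicted_probability(mid, coefficient, mean_x) < target:
        low = mid
    else:
        high = mid
quasi_intercept = (low + high) / 2.0

# With a single unknown the answer can also be written directly:
# quasi-intercept = log odds of the target - coefficient * mean
closed_form = math.log(target / (1.0 - target)) - coefficient * mean_x

print(f"bisection:   {quasi_intercept:.6f}")
print(f"closed form: {closed_form:.6f}")
```

Both approaches give a quasi-intercept of roughly -0.20. The closed form avoids the rounding that Goal Seek can leave behind.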


{I usually use the Quattro Pro spreadsheet because it allows for greater

accuracy, but Excel is close enough}


Now that I have the quasi-intercept representing the effects of all of the
other variables, I copy the formula in cell C15 down to the next 2 cells,
and have the predicted probabilities for my 2 groups.

The odds ratios etc. are similar in meaning to the actual ones discussed
above.


In this example, we can conclude that there is a 34 percentage point

difference in the probability of savings between those with rules and those

without rules, but after we control for the effects of other variables the

difference narrows to 23 percentage points — still very substantial
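These figures can be checked with a short Python sketch. It assumes the quasi-intercept is set so that the predicted probability at the mean of the saving-rules variable (0.458) equals 0.559, and that the coefficient is the natural log of the odds ratio 2.607782. The names are illustrative and not taken from the spreadsheet.

```python
import math

def inverse_logit(z):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

coefficient = math.log(2.607782)  # ln of the odds ratio from the logit
mean_rules = 0.458                # share of households with saving rules
mean_save = 0.559                 # share of households that save

# Quasi-intercept chosen so the prediction at the mean equals 0.559
quasi_intercept = (math.log(mean_save / (1.0 - mean_save))
                   - coefficient * mean_rules)

p_no_rules = inverse_logit(quasi_intercept)             # rules = 0
p_rules = inverse_logit(quasi_intercept + coefficient)  # rules = 1

difference = p_rules - p_no_rules
ratio = p_rules / p_no_rules
adjusted_odds_ratio = ((p_rules / (1 - p_rules))
                       / (p_no_rules / (1 - p_no_rules)))

print(f"predicted probability, no rules: {p_no_rules:.3f}")
print(f"predicted probability, rules:    {p_rules:.3f}")
print(f"difference:                      {difference:.3f}")
print(f"ratio of probabilities:          {ratio:.2f}")
print(f"odds ratio:                      {adjusted_odds_ratio:.3f}")
```

Under these assumptions the predicted probabilities are about 0.45 and 0.68, a difference of about 23 percentage points and a ratio of about 1.51, while the odds ratio stays at about 2.608.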


See the conclusions of the article linked above for my discussion of that
result (that was one of my contributions to this article).


*In a logit, at the mean value of all independent variables, the predicted
value of the log odds = the sample mean value of the log odds, but that is
not equivalent to the mean probability.





*Allison, Paul D. 1999. Logistic Regression Using SAS: Theory and Application. Cary, NC: SAS Institute, Inc.