Odds Ratios
One common mistake with logits is
the interpretation of the odds ratios
Allison* has a pretty good discussion of the odds ratios on pp. 11-12, though his interpretation of the ratios in logits is not quite right in terms of the probabilities.
You can calculate the actual odds ratio from your proc freq output (not
controlling for the effects of other variables)
e.g., if 40% of households with no savings rules spend less than income,
the odds = 0.4/(1-0.4) = 0.667
If 74% of households with savings rules spend less than income,
the odds = 0.74/(1-0.74) = 2.846
The odds ratio = 4.269
However, the ratio of probabilities = 1.85
So, the probability of spending less than income for those with savings rules
is 85% higher than the probability of spending less than income for those
without savings rules
even though the odds ratio is quite high
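The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name odds and the variable names are my own, and the two percentages are the ones quoted in the text.

```python
def odds(p):
    """Convert a probability p into odds, p / (1 - p)."""
    return p / (1.0 - p)

p_no_rules = 0.40  # share of households without savings rules that save
p_rules = 0.74     # share of households with savings rules that save

odds_no_rules = odds(p_no_rules)
odds_rules = odds(p_rules)

# The odds ratio compares odds; the probability ratio compares probabilities.
odds_ratio = odds_rules / odds_no_rules
prob_ratio = p_rules / p_no_rules

print(f"odds, no rules:    {odds_no_rules:.3f}")
print(f"odds, rules:       {odds_rules:.3f}")
print(f"odds ratio:        {odds_ratio:.3f}")
print(f"probability ratio: {prob_ratio:.2f}")
```

The two ratios answer different questions, which is why the odds ratio should not be read as "times as likely".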
In the logit, the odds ratio for having saving rules
was 2.139
However, the ratio of the predicted probabilities was only 1.51 (for how
I calculated that, see below)
I have found it very useful to calculate predicted probabilities for key
independent variables, e.g., comparing Blacks to Whites to Hispanics to
Asian/Others
The actual rates, e.g., of stock ownership, obviously reflect differences
in many other factors, e.g.,
Black and Hispanic households have less than half the mean income of White
households.
If your logistic regression controls for income and other factors, then the
coefficients for a race dummy variable will be related to the “all other
things equal” effects.
In an ordinary least squares regression analysis, interpretation of an
effect is easy.
e.g., if the dependent variable in an OLS regression is annual savings
and the coefficient for Black is -2000
we can directly interpret the result as meaning that if Black households
had the same income, etc. as the overall sample, then their predicted
annual savings would be 2000 less than those of the reference category, e.g., White.
However, for a logistic regression, the interpretation is much harder
SAS will give you a coefficient and an odds ratio, but most people do not
know how to correctly interpret the odds ratio
As an example of a correct interpretation, consider an
article in Financial Counseling and Planning
The Effect of Self-Control Mechanisms on Household Saving Behavior
Jong-Youn Rha, Catherine P. Montalto, and Sherman D. Hanna
http://www.afcpe.org/doc/Vol%201722%20Self-Control%20Mechanisms.pdf
This Excel file contains some calculations for that article
In column C, Rows 7 and 8 show the actual percentages of households that
save for those with no savings rules and for those with savings rules.
40% of households with no savings rules spend less than income (“Save”)
74% of households with savings rules spend less than income (“Save”)
This is a difference of 34 percentage points, but you can also calculate
the odds as p/(1-p), so
the odds of savings for those with no savings rules = 0.667
the odds of savings for those with savings rules = 2.846
The odds ratio = 2.846/0.667 = 4.269 (using the unrounded odds)
So, we could say that those with savings rules have odds of savings = 4.269
times as high as the odds of savings for those with no saving rules
But we should not say that the probabilities are that much higher, because
the ratio of actual probabilities = 1.85
Column B, row 16 shows the coefficient from the logistic regression (which
also included other variables)
SAS will show the odds ratio for each coefficient, which = exp(coefficient)
In this case (cell G16) that = 2.607782 or approximately 2.608
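Since the odds ratio = exp(coefficient), the coefficient can be recovered as the natural log of the odds ratio. The sketch below backs the coefficient out of the odds ratio quoted for cell G16; the coefficient value itself is not quoted in these notes, so treat the result as derived rather than read from the output.

```python
import math

# Odds ratio for having savings rules, as quoted for cell G16.
odds_ratio_rules = 2.607782

# Assumption: the coefficient is recovered from the odds ratio,
# because odds ratio = exp(coefficient).
coefficient = math.log(odds_ratio_rules)

print(f"coefficient:      {coefficient:.4f}")
print(f"exp(coefficient): {math.exp(coefficient):.3f}")
```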
How do we calculate the predicted probability of saving for those with
rules and for those with no rules?
For an OLS regression, you would simply calculate SUM(AiXi)
where each Ai is the estimated coefficient and Xi would be typically either
mean or typical values of each independent variable, except for the one or
two variables you are discussing, e.g., race, or whether or not households
have saving rules.
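As a rough illustration of SUM(AiXi), the sketch below uses made-up numbers. Only the -2000 coefficient for Black comes from the example earlier in these notes; the intercept, the income coefficient, and the mean income are hypothetical values chosen for the illustration.

```python
# Hypothetical OLS coefficients (Ai). Only "black" = -2000 comes from the
# example in the text; the other values are invented for illustration.
coefficients = {"intercept": 1500.0, "income_thousands": 50.0, "black": -2000.0}

# Hypothetical mean values (Xi) for the variables held constant.
means = {"intercept": 1.0, "income_thousands": 60.0}

def predict_savings(black):
    """Return SUM(Ai * Xi) with other variables at their means."""
    values = dict(means, black=black)
    return sum(coefficients[name] * values[name] for name in coefficients)

savings_reference = predict_savings(black=0)
savings_black = predict_savings(black=1)
gap = savings_black - savings_reference

print(savings_reference, savings_black, gap)
```

Whatever values are used for the other variables, the gap between the two predictions equals the coefficient, which is what makes the OLS interpretation easy.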
The predicted probability for a logit is much more complicated
Cell B15 has the formula for calculating the predicted probability, though
with a shortcut I invented over 10 years ago
=1/(1+EXP(-(B$13+B16*A16)))
B16 is the coefficient for the variable of interest, in this case, whether
a household has saving rules.
A16 is the mean value of the independent variable of interest, in this
example, 0.458, meaning that 45.8% of households have saving rules
B$13 is what I call the quasi-intercept. It is meant to represent the
effects of all of the other independent variables plus the constant or
intercept term shown in the logit output. It is useful not only to save
the time of entering all of the other coefficients and mean values, but
also because the predicted probability at the mean values of all other
variables is not the mean probability*, so we need to adjust the results so
they are comparable to the differences in the actual probabilities between
groups.
The mean value of whether households save is 0.559, in other words, 55.9%
of the households spent less than income.
When I started, I just had a value of 1 in cell B13
I placed the cursor on cell C15, clicked on Tools, then Goal Seek, then
entered 0.559 for the target value
For “By changing cell” I entered B13
then clicked OK, then OK again when there was a result
Excel uses a trial and error process to find a value of B13 such that the
predicted probability = the mean probability, or pretty close
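The same search can be sketched outside Excel. The code below uses a simple bisection as a stand-in for Goal Seek; it is not Excel's own algorithm, and the function names are mine. The coefficient is an assumption, backed out of the reported odds ratio of 2.607782. Because the formula has only one unknown, the quasi-intercept can also be solved directly, which gives a check on the search.

```python
import math

def predicted_probability(quasi_intercept, coefficient, x):
    """Same form as the spreadsheet formula: 1/(1+EXP(-(q + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(quasi_intercept + coefficient * x)))

def goal_seek(target, coefficient, x, low=-20.0, high=20.0, tol=1e-10):
    """Bisection search for the quasi-intercept that reproduces target."""
    while high - low > tol:
        mid = (low + high) / 2.0
        # The prediction rises with the quasi-intercept, so move the
        # lower bound up whenever the prediction is still too small.
        if predicted_probability(mid, coefficient, x) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

coefficient = math.log(2.607782)  # assumption: derived from the odds ratio
mean_rules = 0.458                # share of households with saving rules
mean_saving = 0.559               # share of households that save

quasi_intercept = goal_seek(mean_saving, coefficient, mean_rules)

# Direct solution: logit(mean probability) minus coefficient * mean of x.
closed_form = math.log(mean_saving / (1.0 - mean_saving)) - coefficient * mean_rules

print(f"search result: {quasi_intercept:.4f}")
print(f"closed form:   {closed_form:.4f}")
```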
Now that I have the quasi-intercept representing the effects of all of the
other variables,
I copy the formula in cell C15 down to the next 2 cells, and have the
predicted probabilities for my 2 groups
The odds ratios etc. are similar in meaning to the actual ones discussed
above.
In this example, we can conclude that there is a 34 percentage point
difference in the probability of savings between those with rules and those
without rules, but after we control for the effects of other variables the
difference narrows to 23 percentage points — still very substantial
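The predicted probabilities for the two groups can be reproduced approximately as follows. This sketch assumes the coefficient is the natural log of the reported odds ratio (2.607782) and solves for the quasi-intercept directly instead of using Goal Seek; the variable names are mine.

```python
import math

def inv_logit(z):
    """Convert log odds z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

coefficient = math.log(2.607782)  # assumption: derived from the odds ratio
mean_rules = 0.458                # share of households with saving rules
mean_saving = 0.559               # share of households that save

# Quasi-intercept chosen so the prediction at the mean of the rules
# variable equals the mean probability of saving.
quasi_intercept = math.log(mean_saving / (1.0 - mean_saving)) - coefficient * mean_rules

p_no_rules = inv_logit(quasi_intercept)            # rules = 0
p_rules = inv_logit(quasi_intercept + coefficient) # rules = 1

difference = p_rules - p_no_rules
prob_ratio = p_rules / p_no_rules
odds_ratio = (p_rules / (1.0 - p_rules)) / (p_no_rules / (1.0 - p_no_rules))

print(f"predicted, no rules: {p_no_rules:.3f}")
print(f"predicted, rules:    {p_rules:.3f}")
print(f"difference:          {difference:.3f}")
print(f"probability ratio:   {prob_ratio:.2f}")
print(f"odds ratio:          {odds_ratio:.3f}")
```

Under these assumptions the difference comes out at about 23 percentage points and the ratio of predicted probabilities at about 1.51, while the odds ratio of the predictions is simply exp(coefficient) again.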
See the conclusions of the article linked above for my discussion of that
result (that was one of my contributions to this article.)
*In a logit, at the mean value of all independent variables, the predicted
value of the log odds = the sample mean value of the log odds, but that is
not equivalent to the mean probability.
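A small numerical illustration of the footnote, using three hypothetical households whose log odds are invented for the example: converting the mean log odds to a probability does not give the mean of the individual probabilities.

```python
import math

def inv_logit(z):
    """Convert log odds z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical log odds for three households (illustration only).
log_odds = [-2.0, 0.0, 3.0]

mean_log_odds = sum(log_odds) / len(log_odds)
prob_at_mean_log_odds = inv_logit(mean_log_odds)

probabilities = [inv_logit(z) for z in log_odds]
mean_probability = sum(probabilities) / len(probabilities)

print(f"probability at mean log odds: {prob_at_mean_log_odds:.3f}")
print(f"mean of the probabilities:    {mean_probability:.3f}")
```

The gap between the two numbers is what the quasi-intercept adjustment is meant to absorb.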