the linearity of the expectation operator

It was the second day of Quantitative Political Analysis 3: Causal Inference, and I heard a phrase that I am not used to hearing in mathematics: “just take that as given.” I was shocked. You would almost never hear that in a math class unless it was assumed as base knowledge (aka you had painstakingly proved it in another class… whether you actually had or not).

However, I knew the proof was relatively straightforward, since I had actually done it before, so I decided I was just going to do it again “for fun.”

Now, what exactly was the political science context of this probability proof that managed to sneak its way into day two of the course? Well, we were trying to show:

E[Y1-Y0] = E[Y1] - E[Y0]

Why? Because in the potential outcomes model, you assume that each individual has two potential outcomes, Y1 and Y0, with the subscript indicating whether the individual was treated (1) or untreated (0). For each individual there exists a counterfactual, aka what would have happened in a different world had you gotten the opposite treatment.

The whole ball game in causal inference is estimating the average causal effect, E[Y1-Y0], or what the effect of the treatment was on average. In the “real world” you can never know the actual effect of treatment on an individual, because an individual cannot be both treated and untreated (this is possible if we use time-based identification strategies, but then we bring in another dimension and are considering the effect of treatment over time, not at a single point).

So, for a simple example, suppose you want to estimate the causal effect of door knocking on voting. For an individual you cannot observe both whether they vote if you knock on their door and whether they vote if you don’t, because you can’t both knock and not knock on their door! It isn’t possible. This situation is often referred to as the fundamental problem of causal inference. Notably, if one does knock on someone’s door the individual can still choose to vote or not, as illustrated below.

Circling back to the fundamental problem, all is not lost, because you can infer this average causal effect given a sample of say 100 people.

This is where the linearity of the expectation operator comes in handy…

Because if we can prove that…

E[Y1-Y0] = E[Y1] - E[Y0]

…then we can take the mean outcome of all the treated individuals (D=1) and the mean outcome of all the untreated individuals (D=0) and, assuming treatment was assigned at random, infer the average causal effect from the difference of those two simple means! See the table below.

For i=1, the individual’s door was not knocked on (D=0), but they did wind up voting (Y0=1); we cannot know if they would have voted had their door been knocked on (Y1=?).

For i=2, the individual did have their door knocked on (D=1), and they did wind up voting (Y1=1); we cannot know if they would have voted had their door not been knocked on (Y0=?).

So, let’s say you summed up everything in the Y0 column that wasn’t a ?, divided by the number of entries you had, and you got .2, and for the Y1 column you got .8. You could infer that the average causal effect of door knocking was .6! Because E[Y1-Y0] = E[Y1] - E[Y0] = .8 - .2 = .6. And thus door knocking had a positive effect on individuals voting.
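The arithmetic above can be sketched in a few lines of Python. The ten-person table here is made up for illustration (it is not the table from class), chosen so the observed column means come out to .8 and .2; `None` stands in for the ? entries, and `observed_mean` is just a helper name invented for this sketch.

```python
from statistics import mean

# Hypothetical data, one tuple per person: (D, Y0, Y1).
# None marks the counterfactual outcome we never get to observe.
people = [
    (0, 1, None), (1, None, 1), (1, None, 1), (0, 0, None), (1, None, 1),
    (0, 0, None), (1, None, 1), (0, 0, None), (1, None, 0), (0, 0, None),
]

def observed_mean(column):
    """Average only the entries that are not None (the non-? entries)."""
    seen = [value for value in column if value is not None]
    return mean(seen)

mean_y0 = observed_mean([y0 for _, y0, _ in people])  # untreated voters
mean_y1 = observed_mean([y1 for _, _, y1 in people])  # treated voters

# Linearity is what lets us estimate E[Y1-Y0] as a difference of two means.
effect = mean_y1 - mean_y0
print(f"E[Y1] ~ {mean_y1:.1f}, E[Y0] ~ {mean_y0:.1f}, difference ~ {effect:.1f}")
```

Note that the two means are computed from different people; no single row ever contributes to both columns.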

Now onto the proof!
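For the discrete case, one standard way to write the argument is sketched here. The joint probability mass function p(y_1, y_0) and the marginals p_{Y_1} and p_{Y_0} are notation introduced for this sketch, and the sums are assumed to converge absolutely so they can be split and reordered.

```latex
\begin{align*}
E[Y_1 - Y_0]
  &= \sum_{y_1}\sum_{y_0} (y_1 - y_0)\, p(y_1, y_0) \\
  &= \sum_{y_1}\sum_{y_0} y_1\, p(y_1, y_0)
   - \sum_{y_1}\sum_{y_0} y_0\, p(y_1, y_0) \\
  &= \sum_{y_1} y_1 \sum_{y_0} p(y_1, y_0)
   - \sum_{y_0} y_0 \sum_{y_1} p(y_1, y_0) \\
  &= \sum_{y_1} y_1\, p_{Y_1}(y_1)
   - \sum_{y_0} y_0\, p_{Y_0}(y_0) \\
  &= E[Y_1] - E[Y_0]
\end{align*}
```

The only facts used are that a sum can be split term by term and that summing the joint pmf over one variable gives the marginal of the other.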

And there you go! We have shown that (for discrete Y1 and Y0; the proof for continuous variables is very similar, just use integrals!):

E[Y1-Y0] = E[Y1] - E[Y0]

Note, this proof is generalizable to:

E[Y1+Y0] = E[Y1] + E[Y0]

It doesn’t matter if we are doing subtraction or addition. Notably, this relationship does NOT hold in general for multiplication: the expectation of a product equals the product of the expectations only under an extra condition such as independence. Which leads me to my last point for this post: nowhere in the proof above did we require Y0 to be independent of Y1.

Moving away from this example, consider rainfall. If you are looking for the expected amount of rain over the weekend, it would be E[rain on Saturday + rain on Sunday] = E[rain on Saturday] + E[rain on Sunday]. This is slightly counterintuitive: if it rains on Saturday night, any rain on Sunday morning would not be independent of the rain on Saturday, and yet we can still just sum the components (E[rain on Saturday] + E[rain on Sunday]) to find the expected value for the weekend!
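To see that dependence really doesn’t break the sum, here is a small check in Python. The joint distribution of weekend rainfall is invented for illustration (rain amounts in mm and their probabilities are assumptions, not real data), and it is deliberately built so that a wet Saturday makes a wet Sunday more likely. Exact fractions are used so the comparisons are not muddied by floating point.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: (Saturday mm, Sunday mm) -> probability.
# P(Sat=10) = P(Sun=10) = 4/10, but P(both=10) = 3/10, not 16/100,
# so the two days are NOT independent.
joint = {
    (0, 0): F(5, 10),
    (0, 10): F(1, 10),
    (10, 0): F(1, 10),
    (10, 10): F(3, 10),
}

def expect(g):
    """E[g(sat, sun)], computed by summing over the joint distribution."""
    return sum(p * g(sat, sun) for (sat, sun), p in joint.items())

e_sat = expect(lambda sat, sun: sat)
e_sun = expect(lambda sat, sun: sun)
e_sum = expect(lambda sat, sun: sat + sun)
e_prod = expect(lambda sat, sun: sat * sun)

print(e_sum == e_sat + e_sun)   # prints True: the sum splits despite dependence
print(e_prod == e_sat * e_sun)  # prints False: the product does not
```

With these made-up numbers the expected weekend total is 8 mm, exactly 4 + 4, while the expected product is 30 rather than 16.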

And that’s it for this first content post! Should I have been working on my abstract to submit to APSA? Definitely.
