Reading UN World Population Prospects Data into Stata

I’m not a fan of manual work that can be automated. Manually processing Excel files in Stata is a drag, though, particularly when those files are designed, in part, to present data in an attractive way.

This script addresses a couple irritating issues one would otherwise have to address manually:

  1. Stata lets you import data starting at the ith line, but it only imports variable names from the 1st
  2. Stata abhors (forbids, even) variable names starting with a numeral or containing a hyphen.

Note that I’ve only tested this with fertility data (TFR, ASFRs, natural increase), but it works with all of them and a cursory look around makes me think it could work for any of the WPP files without too much adjustment.

Here goes:

local here = "" // whatever path we've chosen
local excelfile = "WPP2015_FERT_F04_TOTAL_FERTILITY.XLS" // in my case...
cd `here'
// import the excel file, dumping the first 16 (noninformative) rows
import excel "`here'/`excelfile'", sheet("ESTIMATES") cellrange(A17:R258) clear
// export a temporary file
export delimited using "`here'/temp.txt", delimiter(tab) novarnames replace
// import our temporary file and get the variable names (partly) correct
import delimited `here'/temp.txt, delimiter(tab) varnames(1) clear
drop Variant // we don't care
foreach v of varlist v* {
local vl : variable label `v'
di "`vl'"
local vfix = subinstr("`vl'","-","_",.)
local vfix = "y`vfix'"
di "`vfix'"
rename `v' `vfix'
}
// end foreach
compress

(and then save the result)

Not perfect, but gets you through most of the irritating stuff. If only Stata let you -import- Excel files from URLs!

Leave a Reply

Your email address will not be published. Required fields are marked *