I’m not a fan of manual work that can be automated. Manually processing Excel files in Stata is a drag, though, particularly when those files are designed, in part, to present data in an attractive way.
This script addresses a couple irritating issues one would otherwise have to address manually:
- Stata lets you import data starting at the ith line, but it only imports variable names from the 1st
- Stata abhors (forbids, even) variable names starting with a numeral or containing a hyphen.
Note that I’ve only tested this with fertility data (TFR, ASFRs, natural increase), but it works with all of them and a cursory look around makes me think it could work for any of the WPP files without too much adjustment.
Here goes:
local here = "" // whatever path we've chosen local excelfile = "WPP2015_FERT_F04_TOTAL_FERTILITY.XLS" // in my case... cd `here' // import the excel file, dumping the first 16 (noninformative) rows import excel "`here'/`excelfile'", sheet("ESTIMATES") cellrange(A17:R258) clear // export a temporary file export delimited using "`here'/temp.txt", delimiter(tab) novarnames replace // import our temporary file and get the variable names (partly) correct import delimited `here'/temp.txt, delimiter(tab) varnames(1) clear drop Variant // we don't care foreach v of varlist v* { local vl : variable label `v' di "`vl'" local vfix = subinstr("`vl'","-","_",.) local vfix = "y`vfix'" di "`vfix'" rename `v' `vfix' } // end foreach compress
(and then save the result)
Not perfect, but gets you through most of the irritating stuff. If only Stata let you -import- Excel files from URLs!