This post is deprecated: this post does a far better job.
This isn’t a difficult task, but I do it rarely enough that it’s not solidly lodged in memory.
I want to merge some UN TFR data with what I’m already working on. I can get Excel files from the UN World Population Prospects pages, but these aren’t exactly pret-a-porter for Stata. For one thing, they’ve got a bunch of rows I don’t care about at the top. I’m also working with time period as my unit of analysis, not country as the UN data are arranged.
Small matter! Let’s…
- download the TFR file in Excel format,
- open it in Stata,
- toss the rows at the top,
- treat the data with variable labels as such,
- rename the period variables,
- save the country-level file in Stata (for good measure),
- reshape into a period-level file.
Download the Excel file: Download the file (link to UN World Pop Prospects fertility files above) and open it to see how where the variable names row starts. In my case, the stuff I care about starts on the 17th row, so I want to ignore the 1st-16th rows.
Import the Excel file: This is the sort of command that I don’t bother to remember, and instead I just rely on Stata’s menus. Fill in the relevant options — start paying attention on the 17th row, use the first row for variable names, make those variable names lowercase — and submit the command, then grab it from Stata’s Results window for use in your do-file.
Rename the period variables: note that Stata variable names cannot begin with a numeral, but Stata’s very polite during -import-. It brings in variables with nonconforming names and gives them a name via a default scheme (alphabetical). My loop in the code below renames each of these variables for the p(eriod) starting with the earliest year specified.
Reshape into a period-file: The UN file comes with an ‘index’ variable; otherwise we’d need to
gen index=_n
. By naming the period- variables p’year’, our task is easy.
import excel ///
"/PATH/WPP2012_FERT_F04_TOTAL_FERTILITY.xls", ///
sheet("ESTIMATES") cellrange(A17:Q253) firstrow case(lower) clear
local alph "f g h i j k l m n o p q"
local baseyear 1950
foreach a of local alph {
rename `a' p`baseyear'
local baseyear = `baseyear' + 5
}
rename countrycode un_cc
save "/PATH/un_tfr.dta", replace
reshape long p, i(index) j(year)