Make use of UN TFR data in Stata

This post is deprecated: this post does a far better job.

This isn’t a difficult task, but I do it rarely enough that it’s not solidly lodged in memory.

I want to merge some UN TFR data with what I’m already working on. I can get Excel files from the UN World Population Prospects pages, but these aren’t exactly pret-a-porter for Stata. For one thing, they’ve got a bunch of rows I don’t care about at the top. I’m also working with time period as my unit of analysis, not country as the UN data are arranged.

Small matter! Let’s…

  • download the TFR file in Excel format,
  • open it in Stata,
  • toss the rows at the top,
  • treat the data with variable labels as such,
  • rename the period variables,
  • save the country-level file in Stata (for good measure),
  • reshape into a period-level file.

Download the Excel file: Download the file (link to UN World Pop Prospects fertility files above) and open it to see how where the variable names row starts. In my case, the stuff I care about starts on the 17th row, so I want to ignore the 1st-16th rows.

Import the Excel file: This is the sort of command that I don’t bother to remember, and instead I just rely on Stata’s menus. Fill in the relevant options — start paying attention on the 17th row, use the first row for variable names, make those variable names lowercase — and submit the command, then grab it from Stata’s Results window for use in your do-file.

Rename the period variables: note that Stata variable names cannot begin with a numeral, but Stata’s very polite during -import-. It brings in variables with nonconforming names and gives them a name via a default scheme (alphabetical). My loop in the code below renames each of these variables for the p(eriod) starting with the earliest year specified.

Reshape into a period-file: The UN file comes with an ‘index’ variable; otherwise we’d need to

gen index=_n

. By naming the period- variables p’year’, our task is easy.

import excel ///
	"/PATH/WPP2012_FERT_F04_TOTAL_FERTILITY.xls", ///
	sheet("ESTIMATES") cellrange(A17:Q253) firstrow case(lower) clear

local alph "f g h i j k l m n o p q"

local baseyear 1950
foreach a of local alph {
	rename `a' p`baseyear'
	local baseyear = `baseyear' + 5
}

rename countrycode un_cc

save "/PATH/un_tfr.dta", replace

reshape long p, i(index) j(year)

Leave a Reply

Your email address will not be published. Required fields are marked *