Use Stata to unzip a bunch of Demographic and Health Survey files and put them where I want them

The 2012 Indonesia Demographic and Health Survey data were released yesterday. There are a bunch of zip files to download, one for each of the survey components, and each zip file contains at most one file that my colleagues and I want to use. Being lazy, I wanted to:

  1. use Stata to do as much of the work as possible, and
  2. do nothing manually.

I couldn’t use Stata to download the zip files, because access requires supplying credentials. But I was able to use DownloadThemAll! for Firefox to grab all the zip files and save them to a single folder.

Next, in Stata:

set more off
capture log close
local sourcedir "O:/data/original/DHS"
local unzipdir "O:/data/original/TEMP"
cap noi mkdir "`unzipdir'"

/*
    First, let's unzip all the zip files we find
*/
local fls : dir "`sourcedir'" files "*.zip"
cd "`unzipdir'"
foreach f of local fls {
        di "Working on `f'"
        unzipfile "`sourcedir'/`f'", replace
}
/* Let's:
    1. Make the filename lowercase,
    2. strip the last two characters of the basename,
    3. infer the type of survey (individual, etc.) from the 3rd-4th chars,
    4. construct a new filename we like, and
    5. move the file where we want it
*/

local fls : dir "`unzipdir'" files "*.dta"
foreach f of local fls {
        local lname = lower("`f'")
        local stem = substr("`lname'", 1, 6)
        local surv_type = substr("`lname'",3,2)
        local fname = "`stem'.dta"
        copy "`unzipdir'/`f'" "`sourcedir'/`surv_type'/`fname'", replace
}

* Let's clean up the files that were unzipped
local fls : dir "`unzipdir'" files "*"
foreach f of local fls {
    erase "`unzipdir'/`f'"
}

* And delete the original zip files now that they have been unzipped
local fls : dir "`sourcedir'" files "*.zip"
foreach f of local fls {
    erase "`sourcedir'/`f'"
}

What does it do?

Given (1) a source folder where zip files are stored and (2) a folder into which to unzip them, the program goes file-by-file, unzipping each zip file it finds.
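
One side effect of the unzip step is that the do-file changes the working directory and never changes it back, and a single corrupt zip file stops the whole run. The following is a minimal sketch of a more defensive version, not a replacement for the code above. It assumes the same two folder paths; the local `here` is a name introduced only for this illustration.

```stata
* Sketch only: same folders as in the do-file above
local sourcedir "O:/data/original/DHS"
local unzipdir  "O:/data/original/TEMP"

* Remember where we started so we can come back afterwards
local here "`c(pwd)'"

capture mkdir "`unzipdir'"
cd "`unzipdir'"

local fls : dir "`sourcedir'" files "*.zip"
foreach f of local fls {
    * capture keeps one bad archive from stopping the loop
    capture noisily unzipfile "`sourcedir'/`f'", replace
    if _rc {
        display as error "Could not unzip `f' (return code " _rc ")"
    }
}

* Return to the original working directory
cd "`here'"
```

Whether to keep going after a failed unzip is a judgment call; if a missing file should halt everything, drop the `capture noisily` prefix.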

Next, it looks at the filename of each file with a ‘dta’ extension. It makes the whole filename lowercase (DHS uses uppercase, and that’s inconvenient) and keeps only the first six characters (the ones that matter for end-user concerns!). From the third and fourth characters it infers the survey component (ir=individual recode, hr=household recode, kr=children’s recode, and so on). It then reconstructs the filename and copies the .dta file into a sensible place (in my world, subdirectories already exist for each of these survey types so I don’t have to create them with -mkdir-).
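
To make the renaming concrete, here is a small worked example of the string handling. The filename `IDIR61FL.DTA` is an illustrative name that I am assuming follows the standard DHS pattern, and the final `capture mkdir` line is an optional extra for anyone whose survey-type subfolders do not exist yet.

```stata
* Illustrative filename, assumed to follow the standard DHS pattern
local f "IDIR61FL.DTA"
local sourcedir "O:/data/original/DHS"

* Lowercase the whole name
local lname = lower("`f'")
* Keep the first six characters of the name
local stem = substr("`lname'", 1, 6)
* Characters 3-4 identify the survey component
local surv_type = substr("`lname'", 3, 2)

display "`surv_type'/`stem'.dta"

* Optional: create the subfolder; capture ignores the error if it already exists
capture mkdir "`sourcedir'/`surv_type'"
```

For this name the display line prints `ir/idir61.dta`, so the file would land in the `ir` subfolder as `idir61.dta`.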

Finally, the program deletes all unzipped files (the stuff I care about has been copied elsewhere) as well as the original zip files.
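
Deleting the original zip files is irreversible, so a more cautious variant might erase a zip only after confirming that the corresponding .dta copy exists. This is a sketch under one loud assumption: that the first six characters of each zip filename match those of the .dta file it contains. Check that against your own downloads before relying on it.

```stata
* Sketch only: same source folder as in the do-file above
local sourcedir "O:/data/original/DHS"

local fls : dir "`sourcedir'" files "*.zip"
foreach f of local fls {
    * ASSUMPTION: the zip name and the .dta name share their first six characters
    local lname = lower("`f'")
    local stem = substr("`lname'", 1, 6)
    local surv_type = substr("`lname'", 3, 2)

    * confirm file sets _rc to 0 only if the copied file is really there
    capture confirm file "`sourcedir'/`surv_type'/`stem'.dta"
    if _rc == 0 {
        erase "`sourcedir'/`f'"
    }
    else {
        display as text "Keeping `f': no copied .dta found"
    }
}
```

Under this variant, zip files that never contained a .dta file are kept rather than deleted, which is a different outcome from the original clean-up loop.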

Now I can use this for any DHS files that follow the standard naming format. The next time a survey comes out, I just download all the zip files into one folder, run my do-file, and relax.

2 thoughts on “Use Stata to unzip a bunch of Demographic and Health Survey files and put them where I want them”

    • If you’ve created an account at dhsprogram.com, log in. Visit the page containing the datasets you want. Assuming you want more than one survey type (household, child, etc.), you can use one of the myriad multi-download browser plugins/extensions available. I’ve used DownloadThemAll in the past. Of course, if you’re looking to download lots of files from lots of survey pages, you’re getting into crawling the site, which is a different matter altogether and can’t be done solely with a multi-downloader. It also may violate DHS’ terms of service (I don’t know, as I’ve not checked, but it’s important to check beforehand and/or get consent).
