Produce a fully-labeled Stata dataset from SAS — mostly

I’m working in SAS again, after about eight gleeful years. To be sure, I didn’t miss it, but I’d forgotten all the reasons why.

My data is in SAS, with a format catalog formats.sasb7cat. I want a fully-labeled Stata version of that data. What to do?

The following takes a SAS file and format catalog that reside in the same directory, spits out the format catalog to Stata (you’ll need to export your data in a separate step), turns the data file into a text file containing all the -label define- and -label values- statements that will reproduce the value labels present in the SAS file.

In SAS:

libname mine "c:\temp\sasxport";
libname library "c:\temp\sasxport";

proc format lib=library cntlout=fmtdata;
run;

proc export data=fmtdata
	FILE="lbls_converted.dta"
	DBMS=STATA replace;
run;

Now, escaping back to Stata. The following is a supremely lazy way of approaching the problem, but it gets the job done. For one thing, I use -preserve- and -restore- to drop all but the line in which I’m working.

use lbls_converted.dta, clear // my output from SAS

local labeldo "convert_dict.do" // this is the do-file that'll contain your generated -label- statements

tempname outfile
file open `outfile' using `labeldo', write replace

// if you've got anything "weird" in your data you want to drop, do it before this line...
qui count
local nobs=`r(N)'

by fmtname, sort: gen rowindex=_n

foreach v of varlist _all { // I had a bunch of leading / trailing whitespace -- YMMV
	cap noi replace `v' = trim(`v')
}

// this may be a peculiarity of my situation -- some, but not all, formats end in _F
// -regexr- would be smarter here, as it could trim only the *final* instance
gen vname = subinstr(fmtname,"_F","",1) // replace one instance of _F

forvalues row = 1/`nobs' {
	preserve
	qui keep in `row'
	local vname = vname
	local valname = start
	local vallab = subinstr(label, "''","",.)
	if rowindex==1 {
		*if "`labelline'"!="" file write `outfile' "`labelline'" _n
		if "`varline'"!="" file write `outfile' _n "`varline'" _n
		local varline "label values `vname' `vname'"
		file write `outfile' "label define `vname' "
	}
	* local labelline "`labelline' `valname' `vallab'"
	file write `outfile' `" `valname' "`vallab'""'
	restore
}
// end foreach

file close `outfile'

use "data_converted.dta", clear
do `labeldo'

How might this fall apart? Easily. For example, for SAS FORMATs might not be named the same as your variables. I’m sure there’s a step that could solve this easily enough, but I’m fortunate to not have had to deal with it.

Leave a Reply

Your email address will not be published. Required fields are marked *