I’m working in SAS again, after about eight gleeful years. To be sure, I didn’t miss it, but I’d forgotten all the reasons why.
My data is in SAS, with a format catalog formats.sasb7cat. I want a fully-labeled Stata version of that data. What to do?
The following takes a SAS file and format catalog that reside in the same directory, spits out the format catalog to Stata (you’ll need to export your data in a separate step), turns the data file into a text file containing all the -label define- and -label values- statements that will reproduce the value labels present in the SAS file.
In SAS:
libname mine "c:\temp\sasxport";
libname library "c:\temp\sasxport";
proc format lib=library cntlout=fmtdata;
run;
proc export data=fmtdata
FILE="lbls_converted.dta"
DBMS=STATA replace;
run;
Now, escaping back to Stata. The following is a supremely lazy way of approaching the problem, but it gets the job done. For one thing, I use -preserve- and -restore- to drop all but the line in which I’m working.
use lbls_converted.dta, clear // my output from SAS
local labeldo "convert_dict.do" // this is the do-file that'll contain your generated -label- statements
tempname outfile
file open `outfile' using `labeldo', write replace
// if you've got anything "weird" in your data you want to drop, do it before this line...
qui count
local nobs=`r(N)'
by fmtname, sort: gen rowindex=_n
foreach v of varlist _all { // I had a bunch of leading / trailing whitespace -- YMMV
cap noi replace `v' = trim(`v')
}
// this may be a peculiarity of my situation -- some, but not all, formats end in _F
// -regexr- would be smarter here, as it could trim only the *final* instance
gen vname = subinstr(fmtname,"_F","",1) // replace one instance of _F
forvalues row = 1/`nobs' {
preserve
qui keep in `row'
local vname = vname
local valname = start
local vallab = subinstr(label, "''","",.)
if rowindex==1 {
*if "`labelline'"!="" file write `outfile' "`labelline'" _n
if "`varline'"!="" file write `outfile' _n "`varline'" _n
local varline "label values `vname' `vname'"
file write `outfile' "label define `vname' "
}
* local labelline "`labelline' `valname' `vallab'"
file write `outfile' `" `valname' "`vallab'""'
restore
}
// end foreach
file close `outfile'
use "data_converted.dta", clear
do `labeldo'
How might this fall apart? Easily. For example, for SAS FORMATs might not be named the same as your variables. I’m sure there’s a step that could solve this easily enough, but I’m fortunate to not have had to deal with it.