Introduction to 1992 SCF Codebook



Surveys of Consumer Finances

Board of Governors of the Federal Reserve System

Mail Stop 180

Washington, DC 20551


To: Users of this tape

From: Arthur Kennickell, SCF Project Director

Date: November 29, 1994

Subject: Description of the First Public Release of the Full 1992 SCF Cross-Section Dataset


This codebook serves as the authoritative guide to the variables included on the current public use
version of the 1992 SCF dataset. However, not every variable included in this codebook is actually
in the public use dataset. A list of the variables included is given at the end of this documentation.

IMPORTANT NOTE: In this release, the values of investment real estate have been set to missing;
other selected variables for a subset of cases have also been set to missing. In the final release
(expected in early 1995), these variables will be included. Region (X30022) is not included on this
release, but our intention is to add it to the next release after a more detailed review of the data has
been performed.

For this release, the data have been systematically altered by several means to minimize the
possibility of identifying any survey respondent. For some discrete variables, small or unusual cells
were collapsed as noted in the variable descriptions below. Continuous variables were rounded.
Data were also blurred by other unspecified means. In addition, a number of other cases were
identified for more extensive treatment. Some of these cases were selected on the basis of extreme
or unusual data values. Other cases were selected at random. For each of these cases, a selection
of critical variables was set to missing and statistically imputed subject to constraints designed to
ensure that any distortions induced in key population statistics would be minimal. Aside from the
cell collapsing, there is no key in this codebook or on the tape that would allow users to identify
directly either which data items have been smoothed or otherwise altered, or which cases were
selected for imputation of critical values (that is, the shadow variables in this dataset may not always
reflect the true original status of every variable, though the change should have minimal effect on
most analyses). For further details on the procedures taken to protect the identity of respondents,
see “The Challenges of Preparing Sensitive Data for Public Release,” Gerhard Fries and R. Louise

Each case included in the public version of the dataset has been given a new identification number
(YY1), which is intended to mask the knowledge of which cases were drawn from the SCF list
sample. Under the original numbering system (XX1), the sample design is apparent from the
identification numbers. It should not be possible to know with certainty from the information
provided in the public version of this dataset which cases derive from the list sample. Because we
routinely use the original numbers internally, users who direct questions to us about specific cases
should state explicitly that they are using the external ID number (YY1) to avoid confusion.

The public use version of the 1992 SCF is a SAS dataset of about 75
megabytes. As in the case of the 1989 survey, virtually every missing value in the dataset has
been imputed, and every variable has a “shadow” variable that describes the original state of the
variable (i.e., whether it was missing for some reason, a range response was given, etc.). An
exception is reported values that have been imputed or otherwise altered for purposes of disclosure
avoidance; such values are not flagged in any systematic way. Users who so desire may use the
shadow variables to restore the data to something very close to their original condition. The main
data values are stored using variable names corresponding to the numbers given in the codebook
below and having a prefix of “X.” The shadow variables have the same numbers, but with a prefix
of “J.” A list of the values of the shadow variables follows: