Abstract
With the release of public-use microdata files it is important to assess
the risk of disclosing individual information. A measure of disclosure
risk often considered in the literature is the proportion of unique
records in the file that are also unique in the population. Various
methods based on superpopulation models have been proposed for estimating
this quantity using sample data. An empirical comparison of a selection of
models applied to three real-life data sets is presented. The general
conclusion is that no one model is uniformly best with respect to the risk
measure used and that performance varies greatly between different types
of data.
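
As an illustration of the risk measure only, the following minimal sketch
computes the proportion of sample-unique records that are also population
unique when the full population happens to be known. The function name,
the toy key values, and the sample indices are hypothetical and do not
come from the paper; in practice the population counts are unknown, which
is why the quantity has to be estimated from sample data.

```python
from collections import Counter


def proportion_population_unique(population, sample_indices):
    """Share of sample-unique records that are also population unique.

    population: list of hashable key combinations, one per individual.
    sample_indices: positions of the individuals released in the file.
    Returns None when the sample contains no unique records.
    """
    pop_counts = Counter(population)
    sample_counts = Counter(population[i] for i in sample_indices)

    # Keys that occur exactly once in the released file.
    sample_uniques = [key for key, n in sample_counts.items() if n == 1]
    if not sample_uniques:
        return None

    # Of those, count the keys that also occur exactly once in the population.
    also_pop_unique = sum(1 for key in sample_uniques if pop_counts[key] == 1)
    return also_pop_unique / len(sample_uniques)


# Toy population of key combinations (hypothetical values).
population = ["a", "a", "b", "c", "c", "c", "d", "e"]
sample_indices = [0, 2, 3, 6]  # released records: a, b, c, d

risk = proportion_population_unique(population, sample_indices)
print(risk)
```

In this toy case all four released records are unique in the file, but
only "b" and "d" are unique in the population, so the measure is one half.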
Keywords: Method evaluation; Statistical disclosure control;
Superpopulation; Uniqueness.
Michael Carlson, Department of Statistics, Stockholm University,
SE-106 91 Stockholm, Sweden. E-mail: Michael.Carlson@stat.su.se