Abstract
An important issue associated with the release of statistical data
is the possibility of disclosing individual information about respondents.
Statistical disclosure control (SDC) is the discipline that deals
with methods of producing statistical data that are safe enough
to be released while retaining their analytical value, and with methods
of assessing disclosure risks. This thesis deals with both aspects.
In the first paper, a method for limiting disclosure risks in microdata
(individual data) is described. The method is a variant of so-called
data swapping, is intended for quantitative data, and is based
on the rank structure of the original data. Theoretical
results and simulation studies indicate that the method performs
at least reasonably well when applied to bivariate normal data.
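As an illustration of the general idea only, the following minimal sketch shows a generic rank-based swap, in which each value is exchanged with a partner lying at most a fixed number of ranks away. It is not the variant developed in the first paper; the function name `rank_swap` and the parameters `window` and `seed` are assumptions introduced here.

```python
import random


def rank_swap(values, window, seed=0):
    """Generic rank-based swap (illustrative, not the thesis's variant).

    Each value is exchanged with one partner whose rank differs by at
    most `window`. The set of values is unchanged; only their assignment
    to records is perturbed.
    """
    rng = random.Random(seed)
    # order[r] is the index of the record holding the r-th smallest value
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = list(values)
    swapped = [False] * len(values)
    for r in range(len(order)):
        if swapped[r]:
            continue
        # candidate partners: higher ranks inside the window, not yet swapped
        upper = min(r + window, len(order) - 1)
        candidates = [s for s in range(r + 1, upper + 1) if not swapped[s]]
        if not candidates:
            continue  # no partner available; the value stays in place
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        out[i], out[j] = values[j], values[i]
        swapped[r] = swapped[s] = True
    return out
```

A small window keeps each swapped value close to its original rank, which tends to preserve relationships with other variables, whereas a larger window gives more perturbation at the cost of analytical value.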
An important measure of identification risk associated with the
release of microdata or large complex tables is the proportion of population
units that can be uniquely identified by a set of matchable attributes.
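To make this measure concrete, the following minimal sketch computes the proportion of units that are unique on a chosen set of attributes when the full file is available. The function name `unique_proportion` and the example attribute names are assumptions introduced for illustration; in practice the population is rarely observed, so this proportion usually has to be estimated from a sample.

```python
from collections import Counter


def unique_proportion(records, keys):
    """Share of records whose combination of `keys` occurs exactly once."""
    if not records:
        return 0.0
    # count how many records share each combination of key attributes
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    uniques = sum(1 for c in counts.values() if c == 1)
    return uniques / len(records)


# hypothetical toy population with two matchable attributes
population = [
    {"age": 34, "region": "N"},
    {"age": 34, "region": "N"},
    {"age": 51, "region": "S"},
    {"age": 27, "region": "E"},
]
share = unique_proportion(population, ["age", "region"])
```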
In the second paper, a model based on the Poisson-inverse Gaussian
distribution is proposed as a possible approach within this context.
Disclosure risk measures are discussed and derived under the proposed
model, as are various methods of estimation. The results indicate
that the model may be a useful and analytically tractable alternative
to other models.
The third paper reports the results of an empirical comparison between
different methods of assessing file-level disclosure risk as measured
by the estimated number of unique population units amongst unique
records and the number of unique units in the population. The results
indicate that no one model or method performs uniformly best and
that performance varies greatly between different types of data.
The fourth and last paper presents a method for assessing a per-record
measure of disclosure risk based on a Poisson-inverse Gaussian regression
model. Per-record measures may be used to identify sensitive (atypical)
records in a file, which can then be modified separately using SDC
techniques prior to release. The method builds on loglinear modeling and
is exemplified using both sample- and population-level information.
The results indicate that the model provides a tractable alternative
to the Poisson-lognormal model and that using population-level information
sharpens the measure.