SOME CONTRIBUTIONS TO
STATISTICAL DISCLOSURE CONTROL

Academic dissertation
for the degree of Doctor of Philosophy
at Stockholm University,
to be publicly defended in
Högbomsalen, Geovetenskapens hus, Frescati,
on Friday 17 January 2003 at 10.00
by
Michael Carlson
Licentiate of Philosophy

Abstract

An important issue associated with the release of statistical data is the possibility of disclosing individual information about respondents. Statistical disclosure control (SDC) is the discipline that deals with methods of producing statistical data that are safe enough to be released while retaining their analytical value, and with methods of assessing the disclosure risks. This thesis deals with both aspects.
In the first paper, a method for limiting disclosure risks in microdata (individual data) is described. The method is a variant of so-called data swapping, is intended for quantitative data, and is based on the rank structure of the original data. Theoretical results and simulation studies indicate that the method performs at least reasonably well when applied to bivariate normal data.
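To illustrate the general idea of swapping based on rank structure, the following is a minimal sketch of a generic rank-swapping scheme, not the specific variant developed in the first paper. The function name `rank_swap`, the `window` parameter and the pairing rule are assumptions introduced here for illustration only.

```python
import random

def rank_swap(values, window=2, seed=0):
    """Generic rank swap (illustrative assumption, not the thesis's variant):
    each value is exchanged with a partner whose rank is at most `window` away."""
    rng = random.Random(seed)
    # Record indices ordered by the rank of their value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    swapped = list(values)
    used = set()
    for r, i in enumerate(order):
        if i in used:
            continue
        # Candidate partners: records not yet swapped, with rank in (r, r + window].
        last_rank = min(r + window, len(order) - 1)
        candidates = [order[s] for s in range(r + 1, last_rank + 1)
                      if order[s] not in used]
        if not candidates:
            continue  # no partner left within the window; the value stays in place
        j = rng.choice(candidates)
        swapped[i], swapped[j] = values[j], values[i]
        used.update((i, j))
    return swapped

# Made-up example data.
original = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]
masked = rank_swap(original, window=2, seed=1)
```

Because values are only exchanged between records, the marginal distribution of the variable is unchanged, while a small `window` limits how far any value moves in rank and so limits the distortion of its relationship to other variables.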
An important measure of the identification risk associated with the release of microdata or large complex tables is the proportion of population units that can be uniquely identified by a set of matchable attributes. In the second paper, a model based on the Poisson-inverse Gaussian distribution is proposed as a possible approach within this context. Disclosure risk measures are discussed and derived under the proposed model, as are various methods of estimation. The results indicate that the model may be a useful and analytically tractable alternative to other models.
The third paper reports the results of an empirical comparison between different methods of assessing file-level disclosure risk, as measured by the estimated number of unique population units amongst the unique records in the file and by the number of unique units in the population. The results indicate that no single model or method performs uniformly best and that performance varies greatly between different types of data.
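When the full population is known, the two file-level quantities mentioned above can be counted directly rather than estimated. The sketch below does this for a made-up toy population; the function name `uniqueness_counts`, the attribute names and the data are hypothetical and serve only to make the definitions concrete. It does not implement any of the estimation methods compared in the thesis.

```python
from collections import Counter

def uniqueness_counts(population, sample_ids, keys):
    """Return (population uniques, sample uniques,
    sample uniques that are also population uniques)."""
    def key_of(rec):
        # Combination of matchable attributes for one record.
        return tuple(rec[k] for k in keys)

    pop_counts = Counter(key_of(rec) for rec in population)
    sample = [population[i] for i in sample_ids]
    samp_counts = Counter(key_of(rec) for rec in sample)

    pop_unique = sum(1 for c in pop_counts.values() if c == 1)
    samp_unique_keys = [k for k, c in samp_counts.items() if c == 1]
    both = sum(1 for k in samp_unique_keys if pop_counts[k] == 1)
    return pop_unique, len(samp_unique_keys), both

# Made-up toy population with three matchable attributes.
population = [
    {"age": "30-39", "sex": "F", "region": "N"},
    {"age": "30-39", "sex": "F", "region": "N"},
    {"age": "40-49", "sex": "M", "region": "S"},
    {"age": "20-29", "sex": "M", "region": "N"},
    {"age": "20-29", "sex": "M", "region": "N"},
    {"age": "50-59", "sex": "F", "region": "S"},
]
counts = uniqueness_counts(population, sample_ids=[0, 2, 3],
                           keys=("age", "sex", "region"))
print(counts)  # (2, 3, 1)
```

In this toy case all three sampled records are unique in the sample, but only one of them is also unique in the population. In practice the population counts are unknown, which is why the quantities have to be estimated from the sample under a model.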
The fourth and last paper presents a method for assessing a per-record measure of disclosure risk based on a Poisson-inverse Gaussian regression model. Per-record measures may be used to identify sensitive (atypical) records in a file, which can then be modified separately using SDC techniques prior to release. The method builds on loglinear modeling and is exemplified using both sample and population level information. The results indicate that the model provides a tractable alternative to the Poisson-lognormal model and that using population level information sharpens the measure.

-------------------------------------------------