Academic dissertation
which, for the degree of Doctor of Philosophy
at Stockholm University,
will be publicly defended in
lecture hall 1, building A, Södra huset,
Frescati
on Monday, 13 December 1999
at 10:00
by
Forough Karlberg
Licentiate of Philosophy
Department of Statistics
Stockholm University
ABSTRACT
Estimation of the population total of a highly skewed survey variable
from a small sample is problematic if straightforward methods are used
since (i) when there are no extreme values in the sample, the estimates will
be too small, and (ii) when extreme values are sampled, the estimates will
become grotesquely large. Traditional methods for outlier treatment will
usually compensate for outliers in the sample, thereby avoiding (ii), whereas
the small negative bias of (i) will persist. Here, a lognormal superpopulation
model is proposed. A particular strength of the lognormal model estimator
is that even in the absence of extremely large values in the sample, the
assumed lognormal structure of the survey variable is used for estimating
the population total.
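
The two failure modes described above can be illustrated with a small simulation. The following is a minimal sketch, not the estimators of the thesis: it assumes a lognormal population without auxiliary variables, simple random sampling without replacement, and a naive plug-in predictor exp(mu_hat + s2/2) for the non-sampled units, with no bias correction. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def expansion_estimate(sample, N):
    """Straightforward design-based estimate: N times the sample mean."""
    return N * statistics.fmean(sample)


def lognormal_plugin_estimate(sample, N):
    """Naive model-based estimate (an assumption, not the thesis estimator).

    The sampled values are summed as observed; each non-sampled unit is
    predicted by the lognormal mean exp(mu + sigma^2 / 2), with mu and
    sigma^2 replaced by estimates from the log-transformed sample.
    """
    logs = [math.log(y) for y in sample]
    mu_hat = statistics.fmean(logs)
    s2 = statistics.variance(logs)
    predicted_mean = math.exp(mu_hat + s2 / 2)
    return sum(sample) + (N - len(sample)) * predicted_mean


def simulate(N=2000, n=20, sigma=2.0, replicates=500, seed=1):
    """Draw one lognormal population, then repeated small samples from it."""
    rng = random.Random(seed)
    population = [rng.lognormvariate(0.0, sigma) for _ in range(N)]
    true_total = sum(population)
    expansion, plugin = [], []
    for _ in range(replicates):
        sample = rng.sample(population, n)  # sampling without replacement
        expansion.append(expansion_estimate(sample, N))
        plugin.append(lognormal_plugin_estimate(sample, N))
    return true_total, expansion, plugin


if __name__ == "__main__":
    true_total, expansion, plugin = simulate()
    share_low = sum(e < true_total for e in expansion) / len(expansion)
    print("true total:", round(true_total))
    print("share of expansion estimates below the true total:", share_low)
    print("largest expansion estimate:", round(max(expansion)))
    print("median plug-in estimate:", round(statistics.median(plugin)))
```

In such a run, most expansion estimates would be expected to fall below the true total, while a few samples containing extreme values produce very large estimates. The plug-in estimator uses the assumed lognormal shape even when no extreme value is sampled, but without the corrections developed in the thesis it is itself biased.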
Two estimators based on a lognormal superpopulation distribution are proposed: (i) one estimator applicable if the shape parameter of the assumed lognormal superpopulation distribution is known, and (ii) one estimator applicable if the shape parameter is unknown. For both estimators, any number of auxiliary variables can be utilized. Estimator (i) is of little practical importance, but has the advantage that it is model unbiased and that a model unbiased estimator of its estimation error variance can also easily be derived. Estimator (ii), although only approximately model unbiased, is more practically applicable because of the more realistic assumption of an unknown shape parameter.
Both estimators (i) and (ii) are applicable only to variables that are strictly positive. A third estimator, based on a combined lognormal-logistic superpopulation model, is therefore proposed; this estimator can be applied to situations in which the survey variable, while highly skewed, may assume the value zero for a number of units.
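
The idea of combining a model for zero versus non-zero values with a lognormal model for the positive values can be sketched as a two-part predictor. This is only an illustration under simplifying assumptions, not the estimator of the thesis: no auxiliary variables are used, so the logistic part reduces to the sample proportion of non-zero units, and the lognormal mean is a plain plug-in without bias correction. The function name and the fallback rule are hypothetical.

```python
import math
import statistics


def two_part_estimate(sample, N):
    """Two-part plug-in estimate of a population total (illustrative only).

    Each non-sampled unit is predicted by
        P(y > 0) * E[y | y > 0],
    where P(y > 0) is estimated by the sample share of positive values and
    E[y | y > 0] by the lognormal mean exp(mu_hat + s2 / 2).
    """
    n = len(sample)
    positives = [y for y in sample if y > 0]
    if len(positives) < 2:
        # Assumption: with fewer than two positive values the lognormal
        # parameters cannot be estimated, so fall back to N * sample mean.
        return N * statistics.fmean(sample)
    p_hat = len(positives) / n
    logs = [math.log(y) for y in positives]
    mu_hat = statistics.fmean(logs)
    s2 = statistics.variance(logs)
    predicted = p_hat * math.exp(mu_hat + s2 / 2)
    # Observed values are kept as they are; only non-sampled units are predicted.
    return sum(sample) + (N - n) * predicted


if __name__ == "__main__":
    print(two_part_estimate([0.0, 3.2, 0.0, 1.1, 45.0, 0.7], N=600))
```

With auxiliary variables available, the sample proportion would be replaced by a fitted logistic regression and the lognormal mean by a regression on the log scale, which is closer to the combined model described above.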
The three model-based estimators are compared to a number of alternative estimators (design-based estimators as well as estimators specifically constructed for outlier treatment) in a simulation study, using random populations as well as real survey populations. The simulation results indicate that the model-based estimators constitute a sensible alternative to the other estimators, in particular when the sample size is small and when the distribution of the survey variable is close to the assumed superpopulation distribution.
Key words: Extreme values, skewed populations, model-based inference, superpopulation, lognormal distribution.