Re: st: implementation of variance formula

From   Inna Becher <>
Subject   Re: st: implementation of variance formula
Date   Fri, 13 Mar 2009 16:55:49 +0100

Dear Stas,

the variance formula I use is

$$\widehat{\mathrm{var}} = \sum_{k=1}^{\kappa} \sum_{m=1}^{\kappa} y_k \, y_m \, \frac{p_{km} - p_k \, p_m}{p_k \, p_m \, p_{km}}, \qquad p_{kk} = p_k,$$

where y_k is the variable of interest for network k, y_m is the same for network m, the summation is over the κ distinct networks represented in the sample, p_m is the probability that network m is included in the sample, and p_km is the pairwise probability that networks k and m are both selected.

I'm not very familiar with Mata yet, but my instinct says the variance computation is best done in Mata. My dataset contains the whole population, so I do not really have to simulate data; I'm using -simulate- to draw 1000 replicate samples and store their means and variances (the variance formula above is not implemented yet).
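As an illustration outside Mata, here is a minimal Python sketch of the double sum above. It assumes the κ sampled networks are indexed 0..κ-1, that the values y_k and probabilities p_k are plain lists, and that the pairwise probabilities sit in a nested list `p_pair`; the function name `ht_variance` is hypothetical. On the diagonal the sketch uses p_k directly, following the convention p_kk = p_k.

```python
def ht_variance(y, p, p_pair):
    """Horvitz-Thompson variance estimator written as a double sum over
    the kappa distinct networks in the sample (hypothetical helper).

    y[k]:         value of the variable of interest for sampled network k
    p[k]:         inclusion probability of network k
    p_pair[k][m]: joint inclusion probability of networks k and m (k != m);
                  the diagonal is not read, p[k] is used instead (p_kk = p_k)
    """
    kappa = len(y)
    var = 0.0
    for k in range(kappa):
        for m in range(kappa):
            p_km = p[k] if k == m else p_pair[k][m]
            var += y[k] * y[m] * (p_km - p[k] * p[m]) / (p[k] * p[m] * p_km)
    return var
```

For example, `ht_variance([10, 4], [0.5, 0.25], [[0.5, 0.1], [0.1, 0.25]])` returns approximately 232.0: 200 + 192 from the diagonal terms, minus 2 × 80 from the two cross terms.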

Stas Kolenikov wrote:
Tell us more about the problem. As far as I know, sampling networks is
a heck of a mess. To simulate anything, you would need an almost
perfect understanding of how your network was formed. Any simulation is
only as good as the model used to create its data. And survey bootstrap
is a moderately crazy topic. Unfortunately, no textbook covers it
sufficiently well; certainly not Efron's book. There is a chapter in
Shao & Tu (1995, Springer), but it only covers work up to the late
1980s. The newer (and important!) methods are only available in
journal papers.

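The Yates-Grundy-Sen estimator of the variance of the Horvitz-Thompson
estimator is

$$\widehat{V}_{YGS} = \sum_{j<k} \frac{p_j \, p_k - p_{jk}}{p_{jk}} \left( \frac{y_j}{p_j} - \frac{y_k}{p_k} \right)^2,$$

with the sum over pairs of distinct sampled units j < k.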
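A matching Python sketch of the pairwise formula, under the same hypothetical list layout (values `y`, unit probabilities `p`, and a nested list `p_pair` holding p_jk); the name `ygs_variance` is also hypothetical. Note that the Yates-Grundy-Sen form is unbiased for fixed-size designs. In network sampling the number of distinct networks in the sample is usually random, so this is shown for comparison rather than as a drop-in replacement for the double-sum formula.

```python
def ygs_variance(y, p, p_pair):
    """Yates-Grundy-Sen variance estimator summed over distinct sampled
    pairs j < k (hypothetical helper; assumes a fixed-size design)."""
    n = len(y)
    var = 0.0
    for j in range(n):
        for k in range(j + 1, n):
            p_jk = p_pair[j][k]
            # weight (p_j p_k - p_jk) / p_jk times squared difference of expanded values
            var += (p[j] * p[k] - p_jk) / p_jk * (y[j] / p[j] - y[k] / p[k]) ** 2
    return var
```

With two sampled units, `ygs_variance([10, 4], [0.5, 0.25], [[0.5, 0.1], [0.1, 0.25]])` has a single term: (0.125 - 0.1) / 0.1 × (20 - 16)² ≈ 4.0.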

If you can write Mata functions to compute the unit and pairwise
selection probabilities, you can have pretty compact code for your
variance estimator. You won't have to store huge matrices of pairwise
selection probabilities, which likely have a well-structured form if
you are talking about cluster sampling.
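To illustrate computing probabilities with functions instead of storing them, here is a Python sketch (Python rather than Mata, so that it is self-contained). It assumes a design the thread does not spell out: an initial simple random sample without replacement of n out of N units, with every network that sample touches included in full. Under that assumption the probabilities depend only on network sizes: p_k = 1 - C(N - x_k, n) / C(N, n), and p_km follows by inclusion-exclusion on the events "network k missed" and "network m missed". The names `incl_prob`, `pair_prob` and `ht_variance_network` are hypothetical.

```python
from math import comb


def incl_prob(x_k, N, n):
    """Probability that an SRSWOR initial sample of n out of N units
    intersects a network of x_k units (hypothetical helper)."""
    return 1 - comb(N - x_k, n) / comb(N, n)


def pair_prob(x_k, x_m, N, n):
    """Probability that the initial sample intersects both of two disjoint
    networks of sizes x_k and x_m (inclusion-exclusion on 'missed' events)."""
    miss_k = comb(N - x_k, n)
    miss_m = comb(N - x_m, n)
    miss_both = comb(N - x_k - x_m, n)
    return 1 - (miss_k + miss_m - miss_both) / comb(N, n)


def ht_variance_network(y, x, N, n):
    """HT variance estimator over sampled networks with values y and sizes x.
    Pair probabilities are recomputed when needed, never stored as a matrix."""
    p = [incl_prob(x_k, N, n) for x_k in x]  # only kappa unit probabilities kept
    var = 0.0
    for k in range(len(y)):
        for m in range(len(y)):
            p_km = p[k] if k == m else pair_prob(x[k], x[m], N, n)
            var += y[k] * y[m] * (p_km - p[k] * p[m]) / (p[k] * p[m] * p_km)
    return var
```

As a sanity check, with single-unit networks the design reduces to simple random sampling without replacement: `ht_variance_network([3, 7], [1, 1], N=10, n=2)` gives 320 up to rounding, which equals N² (1 - n/N) s² / n with s² = 8.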

On Wed, Mar 11, 2009 at 3:49 AM, Inna Becher
<> wrote:
I can calculate the probability that each network (= cluster) is included
in the sample. I can also calculate the probability that each pair of
clusters is included in the sample. My problem is that these probabilities
have to be stored somewhere. Should that be a matrix? I have not yet
worked with matrices to calculate variances. The version of the H-T
estimator I need is not implemented in -svy-. I wrote an ado-file for the
sampling design I need and implemented the H-T estimator for the mean,
but not for the variance.
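Because the whole population is observed, a check of the variance formula does not strictly need random replications: for a toy population one can enumerate every possible initial sample and compare the average variance estimate with the exact variance of the H-T total. The Python sketch below assumes the same hypothetical design as before (initial SRSWOR of n units, every touched network included in full); the toy population and the names `incl_prob`, `pair_prob` and `enumerate_design` are made up for illustration.

```python
from itertools import combinations
from math import comb


def incl_prob(x_k, N, n):
    """P(an SRSWOR sample of n out of N units hits a network of x_k units)."""
    return 1 - comb(N - x_k, n) / comb(N, n)


def pair_prob(x_k, x_m, N, n):
    """P(the same sample hits both of two disjoint networks)."""
    miss_k, miss_m = comb(N - x_k, n), comb(N - x_m, n)
    miss_both = comb(N - x_k - x_m, n)
    return 1 - (miss_k + miss_m - miss_both) / comb(N, n)


def enumerate_design(networks, y_unit, n):
    """Visit every equally likely initial sample; return the true total, the
    mean HT total, the exact variance of the HT total, and the mean of the
    HT variance estimates."""
    N = len(y_unit)
    net_of = {u: k for k, net in enumerate(networks) for u in net}
    y_net = [sum(y_unit[u] for u in net) for net in networks]
    size = [len(net) for net in networks]
    p = [incl_prob(x, N, n) for x in size]

    totals, var_hats = [], []
    for sample in combinations(range(N), n):
        hit = sorted({net_of[u] for u in sample})  # distinct networks reached
        totals.append(sum(y_net[k] / p[k] for k in hit))
        v = 0.0
        for k in hit:
            for m in hit:
                p_km = p[k] if k == m else pair_prob(size[k], size[m], N, n)
                v += y_net[k] * y_net[m] * (p_km - p[k] * p[m]) / (p[k] * p[m] * p_km)
        var_hats.append(v)

    S = len(totals)
    mean_total = sum(totals) / S
    exact_var = sum((t - mean_total) ** 2 for t in totals) / S
    return sum(y_net), mean_total, exact_var, sum(var_hats) / S


# Hypothetical toy population: 8 units grouped into 5 networks, n = 3.
networks = [[0], [1, 2, 3], [4], [5, 6], [7]]
y_unit = [0, 5, 2, 7, 1, 0, 3, 4]
tau, mean_total, exact_var, mean_var_hat = enumerate_design(networks, y_unit, 3)
```

Since every pair of networks can be reached when n ≥ 2, all p_km are positive, and `mean_total` should equal `tau` while `mean_var_hat` should equal `exact_var` up to rounding. With random replications from -simulate-, the same comparison holds only up to Monte Carlo error. Dividing the total and its variance by N and N² gives the corresponding results for the mean.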
