# st: question on bootstrapping of proportions

From: Gijs Dekkers
To: statalist@hsphsun2.harvard.edu
Subject: st: question on bootstrapping of proportions
Date: Fri, 18 Mar 2005 11:31:58 +0100

Dear all,

Let me start by admitting that I am a complete novice in Stata (I have been working with it for 4 days now).

I have a dataset consisting of a household identification number, a year identification number (1994 to 2002), income and a frequency weight variable. So, if household number 1 occurs in the dataset in every year, the set contains

hh-number year income weight

1 1994 1,200 1
1 1995 1,268 2
1 1996 1,466 2
1 1997 1,560 2
and so forth up to 2002
2 1994 2,6 1
...

Now a classical poverty measure that is quite popular here in Europe is the headcount ratio: one is poor if one's income is below 60 percent of the median income. In Stata syntax:

by jaar: summarize income [aweight=weight],detail
by jaar: gen byte poor=income < 0.6*r(p50)
by jaar: tabulate poor [aweight=weight]

The result then is, for instance, that the percentage of poor (i.e. the percentage of those having an income below 60% of the median) is 15 in 1994, 16.5 in 1995, 18 in 1996 and so forth.
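One caveat about the commands above, offered as a sketch rather than a definitive fix: after a `by` prefix, `r(p50)` holds only the result for the last group processed, so the `gen` line would compare every year's incomes with the final year's median. A loop over years keeps each median with its own year. The variable names `year`, `income` and `weight` and the range 1994/2002 are assumptions taken from the description of the data; the commands above use `jaar` for the year variable.

```stata
* Sketch only: assumes variables year, income and weight, and years 1994-2002.
generate byte poor = .
forvalues y = 1994/2002 {
    * weighted median of income for this year only
    quietly summarize income [aweight=weight] if year == `y', detail
    * flag incomes below 60% of that year's median; missing incomes stay missing
    replace poor = income < 0.6*r(p50) if year == `y' & !missing(income)
}
* weighted share of poor, one table per year
bysort year: tabulate poor [aweight=weight]
```

The `!missing(income)` condition is there because Stata treats missing values as larger than any number, so without it a missing income would be classified as not poor.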

Here's my first problem: I want to test for the difference in these percentages between consecutive years.

In the reference manual on page 120, an example is given of how to use bootstrapping to test for the difference between means. Can anyone provide me with an example for the above problem, i.e. for testing the difference between proportions?

I myself have tried to bootstrap the above, as follows:

bootstrap "by year: summarize income,detail" poor=(income < 0.6*r(p50)), reps(100) strata(year) saving(bsres) replace

However, there are two additional problems with this:

1. How can I include the WEIGHTED summarize? If I just copy the [aweight=...] qualifier into the bootstrap command, I get an error message telling me that this is not allowed.
2. I suspect (but am not sure) that the above command does not do what I want. I fear that it calculates separate medians for every value of 'year' and for every draw taken from the dataset. In other words, I think that for each of the bootstrap draws, medians are calculated for every value of 'year'. This is not what I want.

I want separate bootstraps taken for every value of 'year' (as if I have separate datasets for every value of year). However, the command

by jaar: bootstrap "summarize income"....

does not work. So, do I actually have to create separate datasets for every value of 'year' and then run the bootstrap command on every one of them? Or is there an easier and more elegant solution?
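One possible approach, sketched here under several assumptions and not a tested solution: put the weighted calculation inside a small r-class program, so that the weights never appear in the command that `bootstrap` itself sees, and let the program return the difference in headcount ratios between two years. The program name `hcdiff` is hypothetical, the variable names `year`, `income` and `weight` are taken from the description of the data, and the call uses the `bootstrap ... :` prefix syntax of Stata 9 and later, which differs from the quoted-command form shown above.

```stata
* Sketch only: hcdiff is a hypothetical helper, not a built-in command.
capture program drop hcdiff
program define hcdiff, rclass
    args y0 y1                      // the two years to compare
    tempvar poor
    tempname h0
    quietly generate byte `poor' = .
    foreach y in `y0' `y1' {
        * weighted median for this year within the current bootstrap sample
        quietly summarize income [aweight=weight] if year == `y', detail
        quietly replace `poor' = income < 0.6*r(p50) ///
            if year == `y' & !missing(income)
    }
    * weighted headcount ratio in each year, then the difference
    quietly summarize `poor' [aweight=weight] if year == `y0'
    scalar `h0' = r(mean)
    quietly summarize `poor' [aweight=weight] if year == `y1'
    return scalar diff = r(mean) - `h0'
end

* Resample within each year and bootstrap the 1995-1994 difference.
bootstrap diff = r(diff), reps(1000) strata(year) seed(12345): hcdiff 1994 1995
```

With `strata(year)` the resampling is done separately within each year, which is close to having a separate dataset per year. Note that this treats the years as independent samples; since the same households recur across years, that assumption may not hold, and the choice of seed and number of replications here is arbitrary.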

Sorry for the long mail. Any help will be much appreciated.

Gijs Dekkers

--
Dr. Gijs Dekkers
Federaal Planbureau
Algemene Directie
Kunstlaan 47-49
B 1000 Brussel
++32/(0)2/5077413
fax 7373
