Re: st: Questions about svy commands
Ana Gabriela Guerrero Serdan <firstname.lastname@example.org>
Sun, 10 Feb 2008 04:12:13 -0800 (PST)
> 1. Are the existing weights appropriate for the
> children? To answer
> this I would need more information about the survey.
> How did children
> get into the sample? As part of selected
Children got into the sample as part of selected
households. PSUs were selected with linear systematic
pps sampling. Stratification was done at the regional
level and for urban/rural areas.
> > or is the weight the same for all members of the
> Were the data post-stratified in any way? If there
> is just one weight
> for all members of the household, then use that.
I have only one weight for all households (same for
all members in the same household). So, I will use
> 2. Do you select just the children for an analysis
> data set, or do
> you analyze the entire set and use the -subpop-
I have selected children below 5 years old because my
analysis is only related to children <5. I have not
used subpop for my regressions.
> approach is the only one which will provide entirely
> correct standard
> errors, although often there will be little
> difference. Austin
> Nichols showed how to create a data set for use with
> the -subpop-
> option that will be only a little larger than one
> containing only
> children. See:
> msg00810.html .
Thanks, I will take a look at Austin's message. Since my
analysis is at the individual level, keeping the whole sample
means I will have more than 150,000 obs, of which I
really need only around 15,000. But I guess that this
won't be a major problem if I increase the memory.
Another issue is that I also run regressions for
different groups within my sub-sample of children. I
have split children into 4 cohorts and by the areas
where they reside, so my regressions are on a subset of a
subset of the whole sample. This sounds confusing, but the
idea is to use a difference-in-differences estimator, so I
compare cohorts in different areas.
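A minimal sketch of how the -subpop- approach could cover both the under-5 restriction and the cohort-by-area groups while keeping the full sample in memory. Only expweigh and AI05 appear in this thread; psu_id, age_years, cohort and area are placeholder names, and the i. and ## factor-variable notation assumes Stata 11 or later (earlier versions would need -xi- or hand-made dummies).

```stata
* Declare the design once, on the FULL sample (psu_id is a placeholder name)
svyset psu_id [pweight = expweigh], strata(AI05)

* Indicator for the subpopulation of interest: children under 5
generate byte under5 = (age_years < 5) if !missing(age_years)

* All under-5 children, with cohort-by-area interactions
* (one way to set up a difference-in-differences comparison)
svy, subpop(under5): regress Y i.cohort##i.area

* A single cohort-by-area group within the under-5 subpopulation
svy, subpop(under5 if cohort == 1 & area == 2): regress Y X
```

Because the observations outside the subpopulation stay in the data set, the design information (strata and PSUs) is still complete when the standard errors are computed.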
> 3. Although -svy- does not work with -areg-, you can
> use -areg- with
> a -weight- option and with the proper PSU as the
> cluster variable.
> You will be unable to use the -strata- option, and
> this could
> potentially lead to estimated standard errors that
> are larger than
> the true ones. It will also artificially increase
> the degrees of
> freedom for error. You can get around these by
> adding dummy
> variables for stratum into your model. If strata are
> defined by your
> “province” variable, then you have effectively done
They are defined by province but also by urban/rural.
So I would need to include dummies for urban areas as
well, I suppose?
I have not thought about using pweight with areg. I
assume I could also use xi:regress y x
i.dummiesprovince i.cohortbirth urban [pweight=z]
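A rough sketch of the -areg- route described in point 3, under the assumption that each stratum is one province-by-urban/rural cell (so the stratum dummies are the combinations, not province and urban dummies entered separately). The names province, urban and psu_id are placeholders; expweigh, mdate and cohortbirth are taken from this thread, and the i. notation assumes Stata 11 or later.

```stata
* Strata are ASSUMED to be province-by-urban/rural cells
egen stratum = group(province urban)

* Weighted regression absorbing mdate, clustering on the PSU;
* stratum dummies stand in for the unavailable -strata- option
areg Y X i.stratum i.cohortbirth [pweight = expweigh], ///
    absorb(mdate) vce(cluster psu_id)
```

If the survey's own stratum identifier is already in the data, it could be used directly in place of the generated stratum variable.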
> 4. If there are too many strata to add as dummies
> (and strata are not
> defined by your provinces), ignore the strata in the
> analysis, but
> adjust the degrees of freedom by hand. The proper
> degrees of freedom
> for error will be the listed d.f. minus the number
> of strata. You can
> compute correct confidence intervals, say 95%
> intervals, as follows:
> 4.1. Find the error degrees of freedom from the
> -areg- output WITH
> the -cluster- option. Suppose it is df1 = 180.
> If you had 80
> strata, the degrees of freedom should be df2 =180 -
> 80 = 100.
> 4.2. With 180 degrees of freedom, the t-multiplier
> for a standard
> error would be 1.973, but this is too small. Compute
> the t-multiplier
> for the correct degrees of freedom and 95% CI as
> invttail(100,.025), or 1.9840.
> 4.3. You should INCREASE the nominal confidence
> level for -areg-, so
> that the t-multiplier with 180 d.f. is 1.9840. What
> should the level
> be? First find: ttail(180,1.9840), or 0.02439.
> The proper -level-
> is then: 1 - 2 x .02439 = 0.9512. So you should specify a
> statement such as “set level 95.12”.
> You can find the proper level in one step by:
> di 1-2*ttail(df1 , invttail(df2,.025))
> //finds level where df1 is the nominal degrees of
> freedom and df2 is
> the actual degrees of freedom =df1- n. strata.
This is very useful to know. I need to study this
Thanks for your very explicit and clear answers.
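The steps in 4.1 to 4.3 can be collected into a short do-file fragment. This is only a sketch using the illustrative numbers from the quoted example (180 nominal d.f. and 80 strata); the scalar and macro names are arbitrary.

```stata
* Nominal and corrected error degrees of freedom (example values from 4.1)
scalar df1 = 180              // d.f. reported by -areg- with the cluster option
scalar nstrata = 80           // number of strata ignored in the analysis
scalar df2 = df1 - nstrata    // corrected d.f.

* t-multipliers for a 95% interval under each d.f.
display invttail(df1, .025)
display invttail(df2, .025)

* Nominal level (in percent) at which df1 gives the df2 multiplier
local lev : display %5.2f 100*(1 - 2*ttail(df1, invttail(df2, .025)))
display "`lev'"
```

The value stored in the local macro lev can then be passed to the estimation command through its level() option rather than changing the default with -set level-.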
> On Feb 9, 2008, at 7:04 AM, Ana Gabriela Guerrero
> Serdan wrote:
> > Dear all,
> > Sorry for these probably obvious questions. I have
> > looked into the archives but I'm still confused about
> > the following issues:
> > 1) I am using survey data (two-stage with
> > stratification). I am looking at children less than
> > five years old. Can I apply svyset as usual to the
> > sub-sample of children as follows?
> > svyset [pweight= expweigh], strata(AI05) psu(
> > 2) I had initially done my analysis with linear
> > regressions without the svyset, controlling for
> > differences in provinces and cohorts, and clustering
> > at the district level. I used areg as follows:
> > areg Y X DummiesProvinces, vce(cluster district)
> > absorb(mdate)
> > What command can I use if I first set my data for
> > svyset?
> > Gaby Guerrero Serdan
> > Department of Economics
> > Royal Holloway, University of London
> > TW20 0EX
> > Egham, Surrey
> > England, UK
> > http://www.flickr.com/photos/49939890@N00/show/
> > Tel: +44 7912657259
> > *
> > * For searches and help try:
> > *
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/
Gaby Guerrero Serdan
Department of Economics
Royal Holloway, University of London
Tel: +44 7912657259