Dear Steven,

> 1. Are the existing weights appropriate for the children? To answer
> this I would need more information about the survey. How did children
> get into the sample? As part of selected households?

Children got into the sample as part of selected households. PSUs were
selected with linear systematic pps sampling. Stratification was done at
the regional level and by urban/rural area.

> or is the weight the same for all members of the household?
> Were the data post-stratified in any way? If there is just one weight
> for all members of the household, then use that.

I have only one weight per household (the same for all members of the
same household). So, I will use this.

> 2. Do you select just the children for an analysis data set, or do
> you analyze the entire set and use the -subpop- option?

I have selected children below 5 years old because my analysis relates
only to children <5. I have not used subpop for my regressions.

> The second approach is the only one which will provide entirely
> correct standard errors, although often there will be little
> difference. Austin Nichols showed how to create a data set for use
> with the -subpop- option that will be only a little larger than one
> containing only children. See:
> http://www.stata.es/statalist/archive/2007-11/msg00810.html .

Thanks, I will take a look at Austin's post. Since my analysis is at the
individual level, if I keep the whole sample I will have more than
150,000 obs, of which I really need only around 15,000. But I guess this
won't be a major problem if I increase the memory.

Another issue is that I also run regressions for different groups within
my sub-sample of children. I have split the children into 4 cohorts and
by the areas where they reside, so my regressions are on a sub-sub-sample
of the whole sample. This sounds confusing, but the idea is to use a
difference-in-differences estimator: I compare cohorts in different
areas.

> 3.
> Although -svy- does not work with -areg-, you can use -areg- with
> a -weight- option and with the proper PSU as the cluster variable.
> You will be unable to use the -strata- option, and this could
> potentially lead to estimated standard errors that are larger than
> the true ones. It will also artificially increase the degrees of
> freedom for error. You can get around these by adding dummy
> variables for stratum into your model. If strata are defined by your
> “province” variable, then you have effectively done that.

They are defined by province but also by urban/rural. So I would need to
include dummies for urban areas as well, I suppose? I had not thought
about using pweight with areg. I assume I could also use

xi: regress y x i.dummiesprovince i.cohortbirth urban [pweight=z]

> 4. If there are too many strata to add as dummies (and strata are not
> defined by your provinces), ignore the strata in the analysis, but
> adjust the degrees of freedom by hand. The proper degrees of freedom
> for error will be the listed d.f. minus the number of strata. You can
> compute correct confidence intervals, say 95% intervals, as follows:
>
> 4.1. Find the error degrees of freedom from the -areg- output WITH
> the -cluster- option. Suppose it is df1 = 180. If you had 80
> strata, the degrees of freedom should be df2 = 180 - 80 = 100.
>
> 4.2. With 180 degrees of freedom, the t-multiplier for a standard
> error would be 1.973, but this is too small. Compute the t-multiplier
> for the correct degrees of freedom and 95% CI as
> invttail(100,.025), or 1.9840.
>
> 4.3. You should INCREASE the nominal confidence level for -areg-, so
> that the t-multiplier with 180 d.f. is 1.9840. What should the level
> be? First find: ttail(180,1.9840), or 0.02439. The proper -level-
> is then: 1 - 2 x .02439 = 0.951. So you should specify a -level-
> statement as “set level 95.12”.
>
> You can find the proper level in one step by:
>
> di 1-2*ttail(df1, invttail(df2,.025))
> // finds level where df1 is the nominal degrees of freedom and df2 is
> // the actual degrees of freedom = df1 - n. strata.

This is very useful to know. I need to study this more closely. Thanks
for your very explicit and clear answers.

rgds,
Gaby

> -Steven
>
> On Feb 9, 2008, at 7:04 AM, Ana Gabriela Guerrero Serdan wrote:
>
> > Dear all,
> >
> > Sorry for these probably obvious questions. I have
> > looked into the archives but I'm still confused on
> > the following issues:
> >
> > 1) I am using survey data (two-stage with
> > stratification). I am looking at children less than
> > five years old. Can I apply svyset as usual to my
> > sub-sample of children as follows?
> >
> > svyset [pweight=expweigh], strata(AI05) psu(AI06)
> >
> > 2) I had initially done my analysis with linear
> > regressions without the svyset, controlling for
> > differences in provinces and cohorts, and clustering
> > at the district level. I used areg as follows:
> >
> > areg Y X DummiesProvinces, vce(cluster district) absorb(mdate)
> >
> > What command can I use if I first set my data for
> > svyset?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
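
The degrees-of-freedom adjustment described in point 4 of the thread can be sketched in Stata roughly as follows. This is only an illustration: 180 error d.f. and 80 strata are the example values quoted in the thread, and the local macro names (df1, nstrata, df2, lev) are made up for this sketch.

```stata
* Nominal error d.f. as reported by -areg- with the cluster option
* (example value from the thread)
local df1 = 180
* Number of strata ignored in the analysis (example value from the thread)
local nstrata = 80
* Corrected error d.f.
local df2 = `df1' - `nstrata'

* t-multiplier Stata would use by default for a 95% interval
display invttail(`df1', .025)
* t-multiplier implied by the corrected d.f.
display invttail(`df2', .025)

* Nominal level (in percent) that makes the df1-based multiplier
* equal to the df2-based one
local lev = 100*(1 - 2*ttail(`df1', invttail(`df2', .025)))
display %6.2f `lev'
```

With these example values the last line should come out close to the 95.12 quoted in the thread. The value could then be supplied through -set level- or the level() option of the estimation command.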