You are welcome, Arka. The 50% RSE criterion I've seen is a worst case; 30% would be more believable.

Steve

On Mon, Sep 27, 2010 at 2:14 AM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
> Dear Steve,
>
> Thanks for all your suggestions. I have already ensured that I
> have an adequate number of observations in each district-industry cell. I
> will also look at the relative standard error criterion. Once again,
> thanks a lot for your help.
>
> Regards,
> Arka
>
> On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Well, there will be numbers for up to 196,000 cells. Many will be
>> empty because of missing data; I would hesitate to call the remainder
>> "estimates" unless the standard errors are reasonable and they were
>> based on more than 10-20 observations in the category.
>>
>> I have seen designs in which sum-of-weights estimates were worthless
>> for estimating population totals, even with large sample sizes. PPS
>> designs are less vulnerable to this kind of problem.
>>
>> Survey organizations generally have policies for suppressing
>> estimates based on small sample sizes. Perhaps there is a standard
>> practice in your field. I suggest that, in each district, you screen
>> the industries present in the sample for a minimum number of
>> individuals, say 10-20, and report proper survey estimates, with
>> standard errors, and sample n's only for those. You can group smaller
>> industry groups to meet these criteria. The relative standard
>> error, (SE/estimate) x 100%, is another criterion people use for
>> suppressing estimates, and I've seen RSEs of 50% used as a maximum.
>>
>> Good luck!
>>
>> Steve
>> From: Arka Roy Chaudhuri <gabuisi@gmail.com>
>> Date: Fri, Sep 24, 2010 at 4:03 PM
>> Subject: Re: st: R: Estimating the number of workers in each industry
>> in each district - flag: Stata 9/2 SE
>> To: statalist@hsphsun2.harvard.edu
>>
>>
>> Dear Steve,
>>
>> Thanks a lot for all your advice. The problem is that in my dataset
>> I have about 490 industries and 400 districts. Both industries and
>> districts come with a code identifying them. I used the following
>> commands to estimate the number of workers in each industry in a
>> district:
>>
>> bysort districtid industryid: egen workers = total(weight) /* here weight
>> represents the inverse of probability of the household being sampled */
>> duplicates drop districtid industryid, force
>> keep districtid industryid workers
>> save "T:\arka\industry_district.dta",
>>
>>
>> Is the above estimation strategy correct, leaving aside the issue of -svyset-ting
>> my data? Please advise.
>>
>> Arka
>>
>> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> My advice about handling household counts of workers was wrong. Do not expand.
>>>
>>> Say you have counts for the number of workers in the hh in three industries
>>>
>>> n_agriculture
>>> n_service
>>> n_sales
>>>
>>> Then you would use a separate command for each industry, for example:
>>> *********************************************
>>> levelsof district, local(districts)
>>> foreach x of local districts {
>>> svy: total n_agriculture if district==`x'
>>> }
>>> ***********************************************
>>> You would use this form rather than an -over()- or -subpop()- option,
>>> because districts are sampling strata.
>>>
>>> -Steve
>>>
>>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>> Arka-
>>>>
>>>> Based on your description, you would -svyset- your data as follows:
>>>>
>>>> Define a variable (call it "psu" for "primary sampling unit") which is
>>>> the village number (rural sector) or urban block (urban sector)
>>>>
>>>>
>>>> then
>>>> ********************************************************
>>>> svyset psu [pw = your weight], strata(district)
>>>> ***********************************************************
>>>>
>>>> If your data has one line per person, with "industry" categorized
>>>>
>>>> then the command for totals might be
>>>>
>>>> *****************************************************
>>>> svy: tab district industry, count se format(%10.0fc)
>>>> *****************************************************
>>>>
>>>> If your data has only counts of workers in each industry in each HH,
>>>> then you should -expand- the data first so that it has one line for
>>>> each worker in the HH, e.g.
>>>>
>>>> *************
>>>> expand hhsize
>>>> *************
>>>>
>>>> (but that might include children, so you will have to take some care)
>>>>
>>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>>> you are a student, I suggest that you seek guidance from a faculty
>>>> member who is experienced in surveys, if not in Stata. (I know that
>>>> the Department of Statistics at UBC has a survey sampling course). I
>>>> also suggest that you obtain a text to learn about sampling, such as
>>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>>
>>>> Best wishes,
>>>>
>>>> Steve
>>>>
>>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>>>> Hi,
>>>>> Thanks for the help. In my dataset all the districts in the target
>>>>> population are included.
>>>>> The sampling design is a stratified multi-stage
>>>>> design with the first stage units being villages in the rural sector
>>>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>>>> are households in both the sectors.
>>>>>
>>>>> I only have one set of weights that comes with the data. The
>>>>> documentation states that the weights represent the probability that
>>>>> the particular household was included in the sample. Please let me
>>>>> know if I should include any other information. I am really thankful
>>>>> for all the help.
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Arka
>>>>>
>>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>>
>>>>>> Arka-
>>>>>>
>>>>>> I can't answer without more information about the sampling design.
>>>>>> Please describe the design in detail, including answers to the
>>>>>> following questions.
>>>>>>
>>>>>> 1. Were all districts in the target population included in the sample?
>>>>>> Or, were districts sampled?
>>>>>>
>>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>>> was there adjustment to the probability weights (post-stratification,
>>>>>> "raking") so that the sample results will better reflect population
>>>>>> census proportions? If the weights are so adjusted, are the original
>>>>>> sampling weights available to you?
>>>>>>
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> Steven J.
>>>>> From: Arka Roy Chaudhuri <gabuisi@gmail.com>
>>>>> Date: Wed, Sep 15, 2010 at 4:07 AM
>>>>> Subject: Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
>>>>> To: statalist@hsphsun2.harvard.edu
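
The screening rules discussed in this thread (a minimum number of sampled individuals per cell, and a maximum relative standard error) could be combined with the per-district -svy: total- loop roughly as follows. This is only a sketch under assumptions: the variable names psu, weight, district and n_agriculture follow the thread, the data are taken to have one line per household, weight is taken to be the inverse of the household's inclusion probability, and the cutoffs of 10 sampled workers and 30% RSE are just the illustrative values mentioned above, not a standard.

```stata
* Sketch only: variable names and cutoffs are assumptions taken from the thread.
* weight is assumed to be the inverse of the household inclusion probability.
svyset psu [pw = weight], strata(district)

levelsof district, local(districts)
foreach x of local districts {
    * unweighted number of sampled agricultural workers in this district
    quietly summarize n_agriculture if district == `x', meanonly
    local n = r(sum)
    if `n' < 10 {
        display "district `x': suppressed (only `n' sampled workers)"
        continue
    }

    * survey estimate of the district total, with its standard error
    quietly svy: total n_agriculture if district == `x'
    matrix b = e(b)
    matrix V = e(V)
    local est = b[1,1]
    local se  = sqrt(V[1,1])
    local rse = 100 * `se' / `est'

    * a missing RSE compares as larger than any number, so it is suppressed too
    if `rse' < 30 {
        display "district `x': total = " %12.0fc `est' "  SE = " %12.0fc `se' "  RSE = " %5.1f `rse' "%"
    }
    else {
        display "district `x': suppressed (RSE = " %5.1f `rse' "%)"
    }
}
```

The loop would be repeated for each industry count variable (n_service, n_sales, and so on). Small industries that fail either screen can be grouped, as suggested above, and the screen rerun on the grouped counts.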

