Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date: Mon, 27 Sep 2010 09:18:22 -0400

You are welcome, Arka. The 50% RSE criterion I've seen is a worst case; 30% would be more believable.

Steve

On Mon, Sep 27, 2010 at 2:14 AM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
> Dear Steve,
>
> Thanks for all your suggestions. I have already ensured that I
> have an adequate number of observations in each district-industry cell. I
> will also look at the relative standard error criterion. Once again,
> thanks a lot for your help.
>
> Regards,
> Arka
>
> On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Well, there will be numbers for up to 196,000 cells. Many will be
>> empty because of missing data; I would hesitate to call the remainder
>> "estimates" unless the standard errors are reasonable and they were
>> based on more than 10-20 observations in the category.
>>
>> I have seen designs in which sum-of-weights estimates were worthless
>> for estimating population totals, even with large sample sizes. PPS
>> designs are less vulnerable to this kind of problem.
>>
>> Survey organizations generally have policies for suppressing
>> estimates based on small sample sizes. Perhaps there is a standard
>> practice in your field. I suggest that, in each district, you screen
>> the industries present in the sample for a minimum number of
>> individuals, say 10-20, and report proper survey estimates, with
>> standard errors and sample n's, only for those. You can group smaller
>> industries to meet these criteria. The relative standard
>> error, (SE/estimate) x 100%, is another criterion people use for
>> suppressing estimates, and I've seen RSEs of 50% used as a maximum.
>>
>> Good luck!
>>
>> Steve
>>
>> Steven J. Samuels
>> sjsamuels@gmail.com
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax: 206-202-4783
>>
>> ---------- Forwarded message ----------
>> From: Arka Roy Chaudhuri <gabuisi@gmail.com>
>> Date: Fri, Sep 24, 2010 at 4:03 PM
>> Subject: Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
>> To: statalist@hsphsun2.harvard.edu
>>
>> Dear Steve,
>>
>> Thanks a lot for all your advice. The problem is that in my dataset
>> I have about 490 industries and 400 districts. Both industries and
>> districts come with a code identifying them. I used the following
>> commands to estimate the number of workers in each industry in a
>> district:
>>
>> /* here weight represents the inverse of the probability of the household being sampled */
>> bysort districtid industryid: egen workers = total(weight)
>> duplicates drop districtid industryid, force
>> keep districtid industryid workers
>> save "T:\arka\industry_district.dta"
>>
>> Is the above estimation strategy correct, leaving aside the issue of
>> -svyset-ting my data? Please advise.
>>
>> Arka
>>
>> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> My advice about handling household counts of workers was wrong. Do not expand.
>>>
>>> Say you have counts for the number of workers in the HH in three industries:
>>>
>>> n_agriculture
>>> n_service
>>> n_sales
>>>
>>> Then you would run a separate command for each industry, for example:
>>> *********************************************
>>> levelsof district, local(districts)
>>> foreach x of local districts {
>>>     svy: total n_agriculture if district==`x'
>>> }
>>> *********************************************
>>> You would use this form rather than an -over()- or -subpop()- option,
>>> because districts are sampling strata.
>>>
>>> -Steve
>>>
>>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>> Arka-
>>>>
>>>> Based on your description, you would -svyset- your data as follows:
>>>>
>>>> Define a variable (call it "psu" for "primary sampling unit") which is
>>>> the village number (rural sector) or urban block (urban sector),
>>>>
>>>> then
>>>> ********************************************************
>>>> svyset psu [pw = your weight], strata(district)
>>>> ********************************************************
>>>>
>>>> If your data has one line per person, with "industry" categorized,
>>>> then the command for totals might be
>>>>
>>>> *****************************************************
>>>> svy: tab district industry, count se format(%10.0fc)
>>>> *****************************************************
>>>>
>>>> If your data has only counts of workers in each industry in each HH,
>>>> then you should -expand- the data first so that it has one line for
>>>> each worker in the HH, e.g.
>>>>
>>>> *************
>>>> expand hhsize
>>>> *************
>>>>
>>>> (but that might include children, so you will have to take some care)
>>>>
>>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>>> you are a student, I suggest that you seek guidance from a faculty
>>>> member who is experienced in surveys, if not in Stata. (I know that
>>>> the Department of Statistics at UBC has a survey sampling course.) I
>>>> also suggest that you obtain a text to learn about sampling, such as
>>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>>
>>>> Best wishes,
>>>>
>>>> Steve
>>>>
>>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>>>> Hi,
>>>>> Thanks for the help. In my dataset all the districts in the target
>>>>> population are included. The sampling design is a stratified multi-stage
>>>>> design with the first-stage units being villages in the rural sector
>>>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>>>> are households in both sectors.
>>>>>
>>>>> I only have one set of weights that comes with the data. The
>>>>> documentation states that the weights represent the probability that
>>>>> the particular household was included in the sample. Please let me
>>>>> know if I should include any other information. I am really thankful
>>>>> for all the help.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Arka
>>>>>
>>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>>
>>>>>> Arka-
>>>>>>
>>>>>> I can't answer without more information about the sampling design.
>>>>>> Please describe the design in detail, including answers to the
>>>>>> following questions.
>>>>>>
>>>>>> 1. Were all districts in the target population included in the sample?
>>>>>> Or were districts sampled?
>>>>>>
>>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>>> was there adjustment to the probability weights (post-stratification,
>>>>>> "raking") so that the sample results will better reflect population
>>>>>> census proportions? If the weights are so adjusted, are the original
>>>>>> sampling weights available to you?
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <carlo.lazzaro@tin.it> wrote:
>>>>>> > Arka wrote:
>>>>>> > "Now I want to estimate the number of workers
>>>>>> > belonging to each industry in a particular district"
>>>>>> >
>>>>>> > A quite trivial example about Arka's issue may be the following one (set
>>>>>> > aside survey technicalities):
>>>>>> >
>>>>>> > ---------------------code begins------------------------------------
>>>>>> > drop _all
>>>>>> > set obs 100
>>>>>> > g Workers=_n
>>>>>> > g District="East" in 1/50
>>>>>> > replace District="West" in 51/100
>>>>>> > g Industry="Concrete" in 1/30
>>>>>> > replace Industry="Steel" in 31/100
>>>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>>>> > ---------------------code ends------------------------------------
>>>>>> >
>>>>>> > HTH and Kind Regards,
>>>>>> > Carlo
>>>>>> > -----Original Message-----
>>>>>> > From: owner-statalist@hsphsun2.harvard.edu
>>>>>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Arka Roy
>>>>>> > Chaudhuri
>>>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>>>> > To: statalist@hsphsun2.harvard.edu
>>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>>> > district
>>>>>> >
>>>>>> > Dear All,
>>>>>> > I have a data set which has information at the individual
>>>>>> > level. I have variables which record the district of residence of the
>>>>>> > individual, the industry of employment of the individual and other
>>>>>> > demographic characteristics. The data set also comes with weights which
>>>>>> > represent the probability that a particular household is included in
>>>>>> > the sample. Thus all individuals belonging to a particular household
>>>>>> > get the same weight. Now I want to estimate the number of workers
>>>>>> > belonging to each industry in a particular district. Could anyone
>>>>>> > please advise on the correct Stata code that I should write to get my
>>>>>> > desired estimates? Also, I would be grateful if somebody could advise me
>>>>>> > on the possible biases that might affect my estimates at the
>>>>>> > industry-district level. I would really appreciate any help in this
>>>>>> > regard. Thanks
>>>>>> >
>>>>>> > Regards,
>>>>>> > Arka
>>>>>> > --
>>>>>> > Arka Roy Chaudhuri
>>>>>> > PhD Student
>>>>>> > University of British Columbia
>>>>>> > 997-1873 East Mall
>>>>>> > Vancouver
>>>>>> > Canada
>>>>>> > Ph: +1 (604) 349-8283
>>>>>> > Email: gabuisi@gmail.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
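
The screening rule discussed in this thread (a minimum sample count per cell plus a maximum relative standard error) could be sketched in Stata roughly as follows. This is a minimal sketch, not code posted by anyone in the thread. It assumes the data are already -svyset- as described above, that district is a numeric variable, and that n_agriculture is a household-level count of workers as in the example above. The cutoffs of 10 households and 30% RSE are illustrative values taken from the ranges mentioned in the discussion, not established standards.

```stata
* Sketch only: assumes -svyset- has been run, district is numeric, and
* n_agriculture is a household count of agricultural workers.
* The cutoffs (10 households, RSE of 30%) are illustrative assumptions.
levelsof district, local(districts)
foreach x of local districts {
    * unweighted number of sample households reporting at least one worker
    quietly count if district == `x' & n_agriculture > 0 & n_agriculture < .
    local n = r(N)

    * survey-weighted total for this district; skip it if estimation fails
    capture svy: total n_agriculture if district == `x'
    if _rc {
        display "district `x': could not be estimated (rc = " _rc ")"
        continue
    }
    matrix b = e(b)
    matrix V = e(V)
    local est = b[1,1]
    local se  = sqrt(V[1,1])

    * relative standard error in percent; left missing when the total is zero
    if `est' > 0 {
        local rse = 100 * `se' / `est'
    }
    else {
        local rse = .
    }

    * a missing RSE compares as larger than 30, so such cells are suppressed
    if `n' >= 10 & `rse' <= 30 {
        display "district `x': total = " %12.0fc `est' "  se = " %10.0fc `se' "  n = `n'"
    }
    else {
        display "district `x': suppressed (n = `n', RSE = " %5.1f `rse' ")"
    }
}
```

The same loop would be repeated for each industry count variable (n_service, n_sales, and so on). Strata with a single sampled PSU may yield a missing standard error, in which case the cell is treated as suppressed here.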

**Follow-Ups**:
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>

**References**:
- **st: Estimating the number of workers in each industry in each district** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* "Carlo Lazzaro" <carlo.lazzaro@tin.it>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Fwd: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
