

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Fwd: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date: Fri, 24 Sep 2010 17:27:02 -0400

Well, there will be numbers for up to 196,000 cells. Many will be empty because of missing data; I would hesitate to call the remainder "estimates" unless the standard errors are reasonable and they are based on more than 10-20 observations in the category. I have seen designs in which sum-of-weights estimates were worthless for estimating population totals, even with large sample sizes. PPS designs are less vulnerable to this kind of problem.

Survey organizations generally have policies for suppressing estimates based on small sample sizes. Perhaps there is a standard practice in your field. I suggest that, in each district, you screen the industries present in the sample for a minimum number of individuals, say 10-20, and report proper survey estimates, with standard errors and sample n's, only for those. You can group smaller industries to meet these criteria. The relative standard error, (SE/estimate) x 100%, is another criterion people use for suppressing estimates, and I've seen RSEs of 50% used as a maximum.

Good luck!

Steve

Steven J. Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783

---------- Forwarded message ----------
From: Arka Roy Chaudhuri <gabuisi@gmail.com>
Date: Fri, Sep 24, 2010 at 4:03 PM
Subject: Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
To: statalist@hsphsun2.harvard.edu

Dear Steve,

Thanks a lot for all your advice. The problem is that in my dataset I have about 490 industries and 400 districts. Both industries and districts come with a code identifying them. I used the following commands to estimate the number of workers in each industry in a district:

bysort districtid industryid: egen workers = total(weight)
/* here weight represents the inverse of the probability of the household being sampled */
duplicates drop districtid industryid, force
keep districtid industryid workers
save "T:\arka\industry_district.dta"

Is the above estimation strategy correct, leaving aside the issue of how to -svyset- my data? Please advise.

Arka

On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
> My advice about handling household counts of workers was wrong. Do not expand.
>
> Say you have counts for the number of workers in the hh in three industries:
>
> n_agriculture
> n_service
> n_sales
>
> Then you would run a separate command for each industry, for example:
> *********************************************
> levelsof district, local(districts)
> foreach x of local districts {
>     svy: total n_agriculture if district==`x'
> }
> ***********************************************
> You would use this form rather than an -over()- or -subpop()- option,
> because districts are sampling strata.
>
> -Steve
>
> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Arka-
>>
>> Based on your description, you would -svyset- your data as follows:
>>
>> Define a variable (call it "psu" for "primary sampling unit") which is
>> the village number (rural sector) or urban block (urban sector),
>>
>> then
>> ********************************************************
>> svyset psu [pw = your weight], strata(district)
>> ***********************************************************
>>
>> If your data has one line per person, with "industry" categorized,
>> then the command for totals might be
>>
>> *****************************************************
>> svy: tab district industry, count se format(%10.0fc)
>> *****************************************************
>>
>> If your data has only counts of workers in each industry in each HH,
>> then you should -expand- the data first so that it has one line for
>> each worker in the HH, e.g.
>>
>> *************
>> expand hhsize
>> *************
>>
>> (but that might include children, so you will have to take some care)
>>
>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>> you are a student, I suggest that you seek guidance from a faculty
>> member who is experienced in surveys, if not in Stata. (I know that
>> the Department of Statistics at UBC has a survey sampling course.) I
>> also suggest that you obtain a text to learn about sampling, such as
>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>
>> Best wishes,
>>
>> Steve
>>
>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>> Hi,
>>> Thanks for the help. In my dataset all the districts in the target
>>> population are included. The sampling design is a stratified multi-stage
>>> design with the first stage units being villages in the rural sector
>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>> are households in both the sectors.
>>>
>>> I only have one set of weights that comes with the data. The
>>> documentation states that the weights represent the probability that
>>> the particular household was included in the sample. Please let me
>>> know if I should include any other information. I am really thankful
>>> for all the help.
>>>
>>> Regards,
>>>
>>> Arka
>>>
>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>
>>>> Arka-
>>>>
>>>> I can't answer without more information about the sampling design.
>>>> Please describe the design in detail, including answers to the
>>>> following questions.
>>>>
>>>> 1. Were all districts in the target population included in the sample?
>>>> Or, were districts sampled?
>>>>
>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>> was there adjustment to the probability weights (post-stratification,
>>>> "raking") so that the sample results will better reflect population
>>>> census proportions? If the weights are so adjusted, are the original
>>>> sampling weights available to you?
>>>>
>>>> Steve
>>>>
>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <carlo.lazzaro@tin.it> wrote:
>>>> > Arka wrote:
>>>> > "Now I want to estimate the number of workers
>>>> > belonging to each industry in a particular district"
>>>> >
>>>> > A quite trivial example about Arka's issue may be the following one (set
>>>> > aside survey technicalities):
>>>> >
>>>> > ---------------------code begins------------------------------------
>>>> > drop _all
>>>> > set obs 100
>>>> > g Workers=_n
>>>> > g District="East" in 1/50
>>>> > replace District="West" in 51/100
>>>> > g Industry="Concrete" in 1/30
>>>> > replace Industry="Steel" in 31/100
>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>> > ---------------------code ends------------------------------------
>>>> >
>>>> > HTH and Kind Regards,
>>>> > Carlo
>>>> > -----Original Message-----
>>>> > From: owner-statalist@hsphsun2.harvard.edu
>>>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Arka Roy
>>>> > Chaudhuri
>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>> > To: statalist@hsphsun2.harvard.edu
>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>> > district
>>>> >
>>>> > Dear All,
>>>> > I have a data set which has information at the individual
>>>> > level. I have variables which record the district of residence of the
>>>> > individual, the industry of employment of the individual and other
>>>> > demographic characteristics. The data set also comes with weights which
>>>> > represent the probability that a particular household is included in
>>>> > the sample. Thus all individuals belonging to a particular household
>>>> > get the same weight. Now I want to estimate the number of workers
>>>> > belonging to each industry in a particular district. Could anyone
>>>> > please advise on the correct Stata code that I should write to get my
>>>> > desired estimates? Also I would be grateful if somebody could advise me
>>>> > on the possible biases that might affect my estimates at the
>>>> > industry-district level. I would really appreciate any help in this
>>>> > regard. Thanks
>>>> >
>>>> > Regards,
>>>> > Arka
>>>> > --
>>>> > Arka Roy Chaudhuri
>>>> > PhD Student
>>>> > University of British Columbia
>>>> > 997-1873 East Mall
>>>> > Vancouver
>>>> > Canada
>>>> > Ph: +1 (604) 349-8283
>>>> > Email: gabuisi@gmail.com
>>>> >
>>>> > *
>>>> > * For searches and help try:
>>>> > * http://www.stata.com/help.cgi?search
>>>> > * http://www.stata.com/support/statalist/faq
>>>> > * http://www.ats.ucla.edu/stat/stata/
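
The screening rule Steve suggests at the top of this message (report a district-industry cell only if it has at least 10-20 sampled individuals and, as a further check, an RSE of at most 50%) might be sketched in Stata roughly as follows. This is a minimal illustration on made-up data, not code from the thread: the variable names (district, psu, person, industry, weight, in_cell), the toy sample, and the cutoffs of 10 and 50 are all assumptions. Each cell is estimated by totalling a 0/1 indicator with an -if- restriction to one district, the same form as the -svy: total n_agriculture if district==`x'- loop quoted above, so that every PSU in the stratum stays in the variance calculation.

```stata
* Toy sample (all names and numbers are hypothetical):
* 2 districts (strata), 4 PSUs per district, 10 persons per PSU
clear
set obs 80
generate district = 1 + (_n > 40)
generate psu      = ceil(_n/10)
generate person   = _n - 10*(psu - 1)
generate weight   = 25

* Industry 1 is common, industry 2 is rare in district 1, and
* industry 3 occurs only in a single PSU of district 1
generate industry = 1
replace industry = 2 if district == 1 & psu <= 3 & person == 1
replace industry = 2 if district == 2 & person <= 2 + mod(psu, 3)
replace industry = 3 if psu == 4

svyset psu [pw = weight], strata(district)

* Assumed cutoffs: at least 10 sampled persons, RSE of at most 50%
local min_n   = 10
local max_rse = 50

levelsof district, local(districts)
foreach d of local districts {
    levelsof industry if district == `d', local(industries)
    foreach i of local industries {
        * Unweighted sample size of the district-industry cell
        quietly count if district == `d' & industry == `i'
        local n = r(N)
        if `n' < `min_n' {
            display "district `d', industry `i': suppressed (n = `n')"
            continue
        }
        * Total a 0/1 indicator over all sampled persons in the district
        generate byte in_cell = (industry == `i')
        quietly svy: total in_cell if district == `d'
        matrix bmat = e(b)
        matrix Vmat = e(V)
        local est = bmat[1,1]
        local se  = sqrt(Vmat[1,1])
        local rse = 100 * `se' / `est'
        drop in_cell
        if `rse' > `max_rse' {
            display "district `d', industry `i': suppressed (RSE = " %5.1f `rse' "%)"
            continue
        }
        display "district `d', industry `i': n = `n', total = " %9.0fc `est' ", SE = " %9.0fc `se' ", RSE = " %5.1f `rse' "%"
    }
}
```

The toy data are built so that one cell fails the sample-size screen and another, whose workers all come from a single PSU, meets the sample-size cutoff but still has a large RSE. With roughly 490 industries and 400 districts, a real run would more likely store the results in a dataset than display them, and small industries could be grouped before screening, as suggested above.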

