

From: Arka Roy Chaudhuri <gabuisi@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date: Sun, 26 Sep 2010 23:14:38 -0700

Dear Steve,

Thanks for all your suggestions. I have already ensured that I have an
adequate number of observations in each district-industry cell. I will
also look at the relative standard error criterion. Once again, thanks a
lot for your help.

Regards,
Arka

On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> Well, there will be numbers for up to 196,000 cells. Many will be
> empty because of missing data; I would hesitate to call the remainder
> "estimates" unless the standard errors are reasonable and they were
> based on >10-20 observations in the category.
>
> I have seen designs in which sum-of-weights estimates were worthless
> for estimating population totals, even with large sample sizes. PPS
> designs are less vulnerable to this kind of problem.
>
> Survey organizations generally have policies for suppressing
> estimates based on small sample sizes. Perhaps there is a standard
> practice in your field. I suggest that, in each district, you screen
> the industries present in the sample for a minimum number of
> individuals, say 10-20, and report proper survey estimates, with
> standard errors and sample n's, only for those. You can group smaller
> industries to meet these criteria. The relative standard
> error, (SE/estimate) x 100%, is another criterion people use for
> suppressing estimates, and I've seen RSEs of 50% used as a maximum.
>
> Good luck!
>
> Steve
>
> Steven J. Samuels
> sjsamuels@gmail.com
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax: 206-202-4783
>
> ---------- Forwarded message ----------
> From: Arka Roy Chaudhuri <gabuisi@gmail.com>
> Date: Fri, Sep 24, 2010 at 4:03 PM
> Subject: Re: st: R: Estimating the number of workers in each industry
> in each district - flag: Stata 9/2 SE
> To: statalist@hsphsun2.harvard.edu
>
> Dear Steve,
>
> Thanks a lot for all your advice. The problem is that in my dataset
> I have about 490 industries and 400 districts. Both industries and
> districts come with a code identifying them. I used the following
> commands to estimate the number of workers in each industry in a
> district:
>
> bysort districtid industryid: egen workers = total(weight) /*here weight
> represents the inverse of probability of the household being sampled*/
> duplicates drop districtid industryid, force
> keep districtid industryid workers
> save "T:\arka\industry_district.dta",
>
> Is the above estimation strategy correct, leaving aside the issue of
> -svyset-ting my data? Please advise.
>
> Arka
>
> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> My advice about handling household counts of workers was wrong. Do not expand.
>>
>> Say you have counts for the number of workers in the hh in three industries
>>
>> n_agriculture
>> n_service
>> n_sales
>>
>> Then you would run a separate command for each industry, for example:
>> *********************************************
>> levelsof district, local(districts)
>> foreach x of local districts{
>> svy: total n_agriculture if district==`x'
>> }
>> ***********************************************
>> You would use this form rather than an -over()- or -subpop()- option,
>> because districts are sampling strata.
>>
>> -Steve
>>
>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> Arka-
>>>
>>> Based on your description, you would -svyset- your data as follows:
>>>
>>> Define a variable (call it "psu" for "primary sampling unit") which is
>>> the village number (rural sector) or urban block (urban sector)
>>>
>>> then
>>> ********************************************************
>>> svyset psu [pw = your weight], strata(district)
>>> ***********************************************************
>>>
>>> If your data has one line per person, with "industry" categorized
>>>
>>> then the command for totals might be
>>>
>>> *****************************************************
>>> svy: tab district industry, count se format(%10.0fc)
>>> *****************************************************
>>>
>>> If your data has only counts of workers in each industry in each HH,
>>> then you should -expand- the data first so that it has one line for
>>> each worker in the HH, e.g.
>>>
>>> *************
>>> expand hhsize
>>> *************
>>>
>>> (but that might include children, so you will have to take some care)
>>>
>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>> you are a student, I suggest that you seek guidance from a faculty
>>> member who is experienced in surveys, if not in Stata. (I know that
>>> the Department of Statistics at UBC has a survey sampling course). I
>>> also suggest that you obtain a text to learn about sampling, such as
>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>
>>> Best wishes,
>>>
>>> Steve
>>>
>>> Steven J. Samuels
>>> sjsamuels@gmail.com
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> USA
>>> Voice: 845-246-0774
>>> Fax: 206-202-4783
>>>
>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>>> Hi,
>>>> Thanks for the help. In my dataset all the districts in the target
>>>> population are included. The sampling design is a stratified multi-stage
>>>> design with the first stage units being villages in the rural sector
>>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>>> are households in both the sectors.
>>>>
>>>> I only have one set of weights that comes with the data. The
>>>> documentation states that the weights represent the probability that
>>>> the particular household was included in the sample. Please let me
>>>> know if I should include any other information. I am really thankful
>>>> for all the help.
>>>>
>>>> Regards,
>>>>
>>>> Arka
>>>>
>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>
>>>>> Arka-
>>>>>
>>>>> I can't answer without more information about the sampling design.
>>>>> Please describe the design in detail, including answers to the
>>>>> following questions.
>>>>>
>>>>> 1. Were all districts in the target population included in the sample?
>>>>> Or, were districts sampled?
>>>>>
>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>> was there adjustment to the probability weights (post-stratification,
>>>>> "raking") so that the sample results will better reflect population
>>>>> census proportions? If the weights are so adjusted, are the original
>>>>> sampling weights available to you?
>>>>>
>>>>> Steve
>>>>>
>>>>> Steven J. Samuels
>>>>> sjsamuels@gmail.com
>>>>> 18 Cantine's Island
>>>>> Saugerties NY 12477
>>>>> USA
>>>>> Voice: 845-246-0774
>>>>> Fax: 206-202-4783
>>>>>
>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <carlo.lazzaro@tin.it> wrote:
>>>>> > Arka wrote:
>>>>> > "Now I want to estimate the number of workers
>>>>> > belonging to each industry in a particular district"
>>>>> >
>>>>> > A quite trivial example about Arka's issue may be the following one (set
>>>>> > aside survey technicalities):
>>>>> >
>>>>> > ---------------------code begins------------------------------------
>>>>> > drop _all
>>>>> > set obs 100
>>>>> > g Workers=_n
>>>>> > g District="East" in 1/50
>>>>> > replace District="West" in 51/100
>>>>> > g Industry="Concrete" in 1/30
>>>>> > replace Industry="Steel" in 31/100
>>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>>> > ---------------------code ends------------------------------------
>>>>> >
>>>>> > HTH and Kind Regards,
>>>>> > Carlo
>>>>> > -----Original Message-----
>>>>> > From: owner-statalist@hsphsun2.harvard.edu
>>>>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Arka Roy
>>>>> > Chaudhuri
>>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>>> > To: statalist@hsphsun2.harvard.edu
>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>> > district
>>>>> >
>>>>> > Dear All,
>>>>> > I have a data set which has information at the individual
>>>>> > level. I have variables which record the district of residence of the
>>>>> > individual, the industry of employment of the individual and other
>>>>> > demographic characteristics. The data set also comes with weights which
>>>>> > represent the probability that a particular household is included in
>>>>> > the sample. Thus all individuals belonging to a particular household
>>>>> > get the same weight. Now I want to estimate the number of workers
>>>>> > belonging to each industry in a particular district. Could anyone
>>>>> > please advise on the correct Stata code that I should write to get my
>>>>> > desired estimates? Also, I would be grateful if somebody could advise me
>>>>> > on the possible biases that might affect my estimates at the
>>>>> > industry-district level. I would really appreciate any help in this
>>>>> > regard. Thanks
>>>>> >
>>>>> > Regards,
>>>>> > Arka
>>>>> > --
>>>>> > Arka Roy Chaudhuri
>>>>> > PhD Student
>>>>> > University of British Columbia
>>>>> > 997-1873 East Mall
>>>>> > Vancouver
>>>>> > Canada
>>>>> > Ph: +1 (604) 349-8283
>>>>> > Email: gabuisi@gmail.com

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
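
The screening rule discussed in this thread (a minimum number of sampled individuals per district-industry cell, plus a maximum relative standard error) could be sketched in Stata roughly as follows. This is only an illustration for individual-level data, not code posted by anyone in the thread: the variable names psu, weight, district, and industry, the industry code 1, and the cutoffs of 10 observations and 50% RSE are all assumptions, and district and industry are assumed to be numeric.

```stata
* Sketch only. Assumed (hypothetical) names: psu, weight, district, industry.
svyset psu [pw = weight], strata(district)

* Hypothetical industry code to screen; repeat for other codes as needed.
local ind 1
generate byte in_ind = (industry == `ind')

levelsof district, local(districts)
foreach d of local districts {
    * Unweighted number of sampled individuals in this district-industry cell
    quietly count if district == `d' & in_ind == 1
    local n = r(N)
    if `n' >= 10 {
        * Survey-weighted total for one district (a whole stratum)
        quietly svy: total in_ind if district == `d'
        matrix b = e(b)
        matrix V = e(V)
        local est = b[1,1]
        local se  = sqrt(V[1,1])
        * Relative standard error, in percent
        local rse = 100 * `se' / `est'
        if `rse' <= 50 {
            display "district `d': total = " %12.0fc `est' "  SE = " %10.0fc `se' "  RSE = " %5.1f `rse' "%  n = `n'"
        }
    }
}
```

Cells that fail either check are simply not displayed; a missing standard error yields a missing RSE, which also fails the second check.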

**Follow-Ups**:
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>

**References**:
- **st: Estimating the number of workers in each industry in each district** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* "Carlo Lazzaro" <carlo.lazzaro@tin.it>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Fwd: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
