Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Fwd: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Fwd: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date	Fri, 24 Sep 2010 17:27:02 -0400

Well, there will be numbers for up to 196,000 cells (490 industries x 400
districts). Many will be empty because of missing data; I would hesitate
to call the remainder "estimates" unless their standard errors are
reasonable and they are based on more than 10-20 observations in the
category.

I have seen designs in which sum-of-weights estimates were worthless
for estimating population totals, even with large sample sizes. PPS
(probability proportional to size) designs are less vulnerable to this
kind of problem.
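
To make this concrete: for a single district-by-industry cell, the sum of weights that Arka's code computes is the same point estimate that -svy: total- gives for a 0/1 indicator of that cell. What -svy- adds is the standard error needed to judge whether the number deserves to be called an estimate. The sketch uses simulated data; the design (two districts as strata, ten villages each) and the variable names district, psu, weight, industry, and in_cell are hypothetical. It calls runiform(), so it assumes Stata 10.1 or later (in Stata 9, uniform() plays the same role).

```stata
* Hypothetical simulated data: 2 districts (strata), 10 villages (PSUs)
* per district, 8 sampled people per village.
clear
set seed 12345
set obs 160
gen district = 1 + (_n > 80)
gen psu      = ceil(_n / 8)               // village id, unique across districts
gen weight   = 40 + 10 * mod(psu, 5)      // hypothetical inverse-probability weight
gen u        = runiform()
gen industry = cond(u < .3, 1, cond(u < .6, 2, .))   // missing = not working
drop u
svyset psu [pw = weight], strata(district)

* Sum-of-weights approach: add up the weights of sampled people in the cell
quietly summarize weight if district == 1 & industry == 1
scalar sumw = r(sum)

* Survey approach: -svy: total- of a 0/1 cell indicator, restricted with
* -if- to the district, which is a stratum (non-workers count as 0)
gen byte in_cell = (industry == 1)
svy: total in_cell if district == 1

* Same point estimate; only -svy- also reports a standard error
display "sum of weights = " scalar(sumw) ///
    "   svy total = " _b[in_cell] "   SE = " _se[in_cell]
```

The indicator is defined for everyone in the district, not just the workers in the cell, so that every sampled village contributes to the variance calculation.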

Survey organizations generally have policies for suppressing
estimates based on small sample sizes. Perhaps there is a standard
practice in your field. I suggest that, in each district, you screen
the industries present in the sample for a minimum number of
individuals, say 10-20, and report proper survey estimates, with
standard errors and sample n's, only for those. You can combine smaller
industries into groups that meet these criteria. The relative standard
error, RSE = (SE/estimate) x 100%, is another criterion people use for
suppressing estimates; I've seen an RSE of 50% used as a maximum.
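
Both screening rules (a minimum sample n per cell and a maximum RSE) can be applied in one pass over the district-by-industry cells. This is a minimal sketch, assuming person-level data that are -svyset- with districts as strata, as suggested in the quoted messages. The simulated data, the variable names (district, industry, psu, weight, in_cell), the output file cell_estimates.dta, and the thresholds of 10 and 50% are illustrative assumptions, not a standard. It restricts with -if- rather than -over()- or -subpop()- because each district is a stratum, and it assumes Stata 10.1 or later for runiform().

```stata
* Hypothetical simulated data: 3 districts (strata), 10 villages (PSUs)
* per district, 20 sampled people per village; industry is missing for
* people who do not work.
clear
set seed 2010
set obs 600
gen district = ceil(_n / 200)
gen psu      = ceil(_n / 20)              // village id, unique across districts
gen weight   = 30 + 5 * mod(psu, 4)       // hypothetical inverse-probability weight
gen u        = runiform()
gen industry = cond(u < .25, 1, cond(u < .30, 2, cond(u < .60, 3, .)))
drop u
svyset psu [pw = weight], strata(district)

local min_n   = 10     // minimum sampled workers per cell (suggested: 10-20)
local max_rse = 50     // maximum relative standard error, in percent

tempname h
postfile `h' district industry n total se rse suppress ///
    using cell_estimates, replace
gen byte in_cell = .
quietly levelsof district, local(districts)
foreach d of local districts {
    quietly levelsof industry if district == `d', local(industries)
    foreach i of local industries {
        * 0/1 indicator for the cell; non-workers count as 0
        quietly replace in_cell = (industry == `i')
        quietly count if district == `d' & industry == `i'
        local n = r(N)
        quietly svy: total in_cell if district == `d'
        local tot = _b[in_cell]
        local se  = _se[in_cell]
        local rse = 100 * `se' / `tot'
        * flag small cells and unstable estimates (a missing RSE is treated
        * as exceeding the maximum, since . > any number in Stata)
        local supp = (`n' < `min_n') | (`rse' > `max_rse')
        post `h' (`d') (`i') (`n') (`tot') (`se') (`rse') (`supp')
    }
}
postclose `h'

use cell_estimates, clear
list, sepby(district) noobs
```

Cells flagged with suppress == 1 are candidates for combining into larger industry groups rather than for publication. With about 400 districts and 490 industries the loop may be slow, but it needs to run only once.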

Good luck!

Steve

Steven J. Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783

---------- Forwarded message ----------
From: Arka Roy Chaudhuri <[email protected]>
Date: Fri, Sep 24, 2010 at 4:03 PM
Subject: Re: st: R: Estimating the number of workers in each industry
in each district - flag: Stata 9/2 SE
To: [email protected]


Dear Steve,

  Thanks a lot for all your advice. The problem is that in my dataset
I have about 490 industries and 400 districts. Both industries and
districts come with a code identifying them. I used the following
commands to estimate the number of workers in each industry in a
district:

* here weight represents the inverse of the probability of the household being sampled
bysort districtid industryid: egen workers = total(weight)
duplicates drop districtid industryid, force
keep districtid industryid workers
save "T:\arka\industry_district.dta", replace


Leaving aside the issue of -svyset-ting my data, is the above
estimation strategy correct? Please advise.

Arka
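
The -bysort-/-egen-/-duplicates drop- sequence above can also be written with -collapse-, which makes it easy to keep the unweighted number of sampled workers in each cell next to the weighted total; those sample n's are what the screening advice at the top of this thread asks for. This is a minimal sketch on hypothetical simulated data: districtid, industryid, and weight follow the names in the code above, while hh and the data-generating lines are assumptions (runiform() requires Stata 10.1 or later). Unlike the original commands, it first drops non-workers, whose missing industryid would otherwise form a cell of its own in each district.

```stata
* Hypothetical simulated data standing in for a person-level file
clear
set seed 924
set obs 500
gen districtid = ceil(_n / 100)
gen hh         = ceil(_n / 5)                 // households of 5 people
gen weight     = 20 + 5 * mod(hh, 3)          // same weight for all HH members
gen industryid = cond(runiform() < .6, 1 + floor(3 * runiform()), .)  // missing = not working

* Workers only, then one row per district x industry:
*   workers = estimated number of workers (sum of weights)
*   n       = number of sampled workers behind that estimate
keep if !missing(industryid)
collapse (sum) workers = weight (count) n = weight, by(districtid industryid)
list in 1/6, noobs
```

These sums of weights are point estimates only; standard errors still require -svyset- and a -svy- command.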

On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <[email protected]> wrote:
> My advice about handling household counts of workers was wrong. Do not expand.
>
> Say you have counts of the number of workers in the HH in three industries:
>
> n_agriculture
> n_service
> n_sales
>
> Then you would run a separate command for each industry, for example:
> *********************************************
> levelsof district, local(districts)
> foreach x of local districts {
>     svy: total n_agriculture if district == `x'
> }
> ***********************************************
> You would use this form rather than an -over()- or -subpop()- option,
> because districts are sampling strata.
>
> -Steve
>
> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <[email protected]> wrote:
>> Arka-
>>
>> Based on your description, you would -svyset- your data as follows:
>>
>> Define a variable (call it "psu" for "primary sampling unit") which is
>> the village number (rural sector) or the urban block number (urban sector).
>>
>>
>> Then:
>> ********************************************************
>> * replace "weight" with the name of your sampling-weight variable
>> svyset psu [pw = weight], strata(district)
>> ***********************************************************
>>
>> If your data have one line per person, with "industry" a categorical
>> variable, then the command for totals might be
>>
>> *****************************************************
>> svy: tab district industry, count se format(%10.0fc)
>> *****************************************************
>>
>> If your data has only counts of workers in each industry in each HH,
>> then you should -expand- the data first so that it has one line for
>> each worker in the HH, e.g.
>>
>> *************
>> expand hhsize
>> *************
>>
>> (but that might include children, so you will have to take some care)
>>
>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>> you are a student, I suggest that you seek guidance from a faculty
>> member who is experienced in surveys, if not in Stata. (I know that
>> the Department of Statistics at UBC has a survey sampling course). I
>> also suggest that you obtain a text to learn about sampling, such as
>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>
>> Best wishes,
>>
>> Steve
>>
>> Steven J. Samuels
>> [email protected]
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax:    206-202-4783
>>
>>
>>
>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <[email protected]> wrote:
>>> Hi,
>>>  Thanks for the help. In my dataset all the districts in the target
>>> population are included. The sampling design is a stratified multi-stage
>>> design with the first stage units being villages in the rural sector
>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>> are households in both the sectors.
>>>
>>>   I only have one set of weights that comes with the data. The
>>> documentation states that the weights represent the probability that
>>> the particular household was included in the sample.  Please let me
>>> know if I should include any other information. I am really thankful
>>> for all the help.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Arka
>>>
>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <[email protected]> wrote:
>>>>
>>>> Arka-
>>>>
>>>> I can't answer without more information about the sampling design.
>>>> Please describe the design in detail, including answers to the
>>>> following questions:
>>>>
>>>> 1. Were all districts in the target population included in the sample?
>>>> Or, were districts sampled?
>>>>
>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>> was there adjustment to the probability weights (post-stratification,
>>>> "raking") so that the sample results will better reflect population
>>>> census proportions? If the weights are so adjusted, are the original
>>>> sampling weights available to you?
>>>>
>>>>
>>>> Steve
>>>>
>>>> Steven J. Samuels
>>>> [email protected]
>>>> 18 Cantine's Island
>>>> Saugerties NY 12477
>>>> USA
>>>> Voice: 845-246-0774
>>>> Fax:    206-202-4783
>>>>
>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <[email protected]> wrote:
>>>> > Arka wrote:
>>>> > "Now I want to estimate the number of workers
>>>> > belonging to each industry in a particular district"
>>>> >
>>>> > A quite trivial example of Arka's problem might be the following one
>>>> > (setting aside survey technicalities):
>>>> >
>>>> > ---------------------code begins------------------------------------
>>>> > drop _all
>>>> > set obs 100
>>>> > g Workers=_n
>>>> > g District="East" in 1/50
>>>> > replace District="West" in 51/100
>>>> > g Industry="Concrete" in 1/30
>>>> > replace  Industry="Steel" in 31/100
>>>> > g A= 1 if  District=="East" &  Industry=="Steel"
>>>> > g B= 1 if  District=="West" &  Industry=="Steel"
>>>> > g C= 1 if  District=="East" &  Industry=="Concrete"
>>>> > ---------------------code ends------------------------------------
>>>> >
>>>> > HTH and Kind Regards,
>>>> > Carlo
>>>> > -----Original Message-----
>>>> > From: [email protected]
>>>> > [mailto:[email protected]] On behalf of Arka Roy
>>>> > Chaudhuri
>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>> > To: [email protected]
>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>> > district
>>>> >
>>>> > Dear All,
>>>> >        I have a data set which has information at the individual
>>>> > level. I have variables which record the district of residence of the
>>>> > individual, the industry of employment of the individual and other
>>>> > demographic characteristics. The data set also comes with weights which
>>>> > represent the probability that a particular household is included in
>>>> > the sample. Thus all individuals belonging to a particular household
>>>> > get the same weight. Now I want to estimate the number of workers
>>>> > belonging to each industry in a particular district. Could anyone
>>>> > please advise on the correct Stata code that I should write to get my
>>>> > desired estimates? Also I would be grateful if somebody could advise me
>>>> > on the possible biases that might affect my estimates at the
>>>> > industry-district level. I would really appreciate any help in this
>>>> > regard. Thanks
>>>> >
>>>> > Regards,
>>>> > Arka
>>>> > --
>>>> > Arka Roy Chaudhuri
>>>> > PhD Student
>>>> > University of British Columbia
>>>> > 997-1873 East Mall
>>>> > Vancouver
>>>> > Canada
>>>> > Ph: +1 (604) 349-8283
>>>> > Email: [email protected]
>>>> >
>>>> > *
>>>> > *   For searches and help try:
>>>> > *   http://www.stata.com/help.cgi?search
>>>> > *   http://www.stata.com/support/statalist/faq
>>>> > *   http://www.ats.ucla.edu/stat/stata/
>>>> >
>>>>
>>>
>>>
>>
>
>

