Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE

From	Arka Roy Chaudhuri <[email protected]>
To	[email protected]
Subject	Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date	Sun, 26 Sep 2010 23:14:38 -0700

Dear Steve,

     Thanks for all your suggestions. I have already ensured that I
have an adequate number of observations in each district-industry cell. I
will also look at the relative standard error criterion. Once again,
thanks a lot for your help.

Regards,
Arka
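
The minimum-cell-size screen mentioned above could be sketched roughly as
follows. This is only an illustration on made-up toy data: the variable
names districtid and industryid follow the thread, while cell_n,
too_small, onecell and the cutoff of 10 are assumptions chosen for the
example (the thread suggests a minimum somewhere in the 10-20 range).

```stata
* Toy data standing in for the person-level file (all values are made up)
clear
input districtid industryid n
1 11 12
1 12 3
2 11 25
end
expand n          // one row per sampled person
drop n

local min_n 10    // assumed cutoff; the thread suggests 10-20

* Unweighted number of sampled persons in each district-industry cell
bysort districtid industryid: generate long cell_n = _N
generate byte too_small = cell_n < `min_n'

* Show each cell once, with its unweighted sample size
egen byte onecell = tag(districtid industryid)
list districtid industryid cell_n too_small if onecell, noobs
```

Cells flagged too_small are candidates for suppression or for grouping
with other small industries before any estimates are reported.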


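The relative standard error criterion could be checked along these
lines. This is a hedged sketch, not a definitive implementation: the
data are simulated, the 0/1 indicator in_agri and the constant weight
are hypothetical, and the -svyset- call simply mirrors the design
described in the thread (PSUs within district strata). The 50% figure is
the maximum RSE quoted in the thread, not a universal rule.

```stata
* Toy design: 2 districts (strata), 10 PSUs each, 10 persons per PSU
clear
set obs 200
generate district = 1 + (_n > 100)
generate psu = ceil(_n/10)
generate weight = 50                      // made-up sampling weight
generate byte in_agri = mod(_n, 3) == 0   // made-up industry indicator
svyset psu [pw = weight], strata(district)

levelsof district, local(districts)
foreach d of local districts {
    quietly svy: total in_agri if district == `d'
    matrix est_b = e(b)
    matrix est_V = e(V)
    local est = est_b[1,1]
    local se  = sqrt(est_V[1,1])
    local rse = 100 * `se' / `est'        // RSE = (SE/estimate) x 100%
    display "district `d': total = " %9.0f `est' ///
        "  SE = " %9.1f `se' "  RSE = " %5.1f `rse' "%"
    if `rse' > 50 display "  (RSE above 50%: candidate for suppression)"
}
```

The loop runs one -svy: total- per district, as recommended in the
thread, and only adds the RSE arithmetic on top of the stored results.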


On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <[email protected]> wrote:
> Well, there will be numbers for up to 196,000 cells. Many will be
> empty because of missing data; I would hesitate to call the remainder
> "estimates" unless the standard errors are reasonable and they are
> based on at least 10-20 observations in the category.
>
> I have seen designs in which sum-of-weights estimates were worthless
> for estimating population totals, even with large sample sizes.  PPS
> designs are less vulnerable to this kind of problem.
>
> Survey organizations generally have policies for suppressing
> estimates based on small sample sizes. Perhaps there is a standard
> practice in your field. I suggest that, in each district, you screen
> the industries present in the sample for a minimum number of
> individuals, say 10-20, and report proper survey estimates, with
> standard errors, and sample n's only for those. You can group smaller
> industries to meet these criteria. The relative standard
> error, (SE/estimate) x 100%, is another criterion people use for
> suppressing estimates, and I've seen RSEs of 50% used as a maximum.
>
> Good luck!
>
> Steve
>
> Steven J. Samuels
> [email protected]
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax:    206-202-4783
>
> ---------- Forwarded message ----------
> From: Arka Roy Chaudhuri <[email protected]>
> Date: Fri, Sep 24, 2010 at 4:03 PM
> Subject: Re: st: R: Estimating the number of workers in each industry
> in each district - flag: Stata 9/2 SE
> To: [email protected]
>
>
> Dear Steve,
>
>   Thanks a lot for all your advice. The problem is that in my dataset
> I have about 490 industries and 400 districts. Both industries and
> districts come with a code identifying them. I used the following
> commands to estimate the number of workers in each industry in a
> district:
>
> * weight is the inverse of the probability of the household being sampled
> bysort districtid industryid: egen workers = total(weight)
> duplicates drop districtid industryid, force
> keep districtid industryid workers
> save "T:\arka\industry_district.dta"
>
>
> Is the above estimation strategy correct, leaving aside the issue of
> -svyset-ting my data? Please advise.
>
> Arka
>
> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <[email protected]> wrote:
>> My advice about handling household counts of workers was wrong. Do not expand.
>>
>> Say you have counts of the number of workers in the household (hh) in three industries:
>>
>> n_agriculture
>> n_service
>> n_sales
>>
>> Then you would run a separate command for each industry, for example:
>> *********************************************
>> levelsof district, local(districts)
>> foreach x of local districts {
>>     svy: total n_agriculture if district == `x'
>> }
>> ***********************************************
>> You would use this form rather than an -over()-  or -subpop()- option,
>> because districts are sampling strata.
>>
>> -Steve
>>
>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <[email protected]> wrote:
>>> Arka-
>>>
>>> Based on your description, you would -svyset- your data as follows:
>>>
>>> Define a variable (call it "psu" for "primary sampling unit") which is
>>> the village number (rural sector) or urban block (urban sector)
>>>
>>>
>>> then
>>> ********************************************************
>>> svyset psu [pw = your weight], strata(district)
>>> ***********************************************************
>>>
>>> If your data has one line per person, with "industry" categorized,
>>> then the command for totals might be
>>>
>>> *****************************************************
>>> svy: tab district industry, count se format(%10.0fc)
>>> *****************************************************
>>>
>>> If your data has only counts of workers in each industry in each HH,
>>> then you should -expand- the data first so that it has one line for
>>> each worker in the HH, e.g.
>>>
>>> *************
>>> expand hhsize
>>> *************
>>>
>>> (but that might include children, so you will have to take some care)
>>>
>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>> you are a student, I suggest that you seek guidance from a faculty
>>> member who is experienced in surveys, if not in Stata. (I know that
>>> the Department of Statistics at UBC has a survey sampling course). I
>>> also suggest that you obtain a text to learn about sampling, such as
>>> Sharon Lohr's "Sampling: Design and Analysis" (2009).  I also
>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>
>>> Best wishes,
>>>
>>> Steve
>>>
>>> Steven J. Samuels
>>> [email protected]
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> USA
>>> Voice: 845-246-0774
>>> Fax:    206-202-4783
>>>
>>>
>>>
>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <[email protected]> wrote:
>>>> Hi,
>>>>  Thanks for the help. In my dataset all the districts in the target
>>>> population are included. The sampling design is a stratified multi-stage
>>>> design with the first stage units being villages in the rural sector
>>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>>> are households in both the sectors.
>>>>
>>>>   I only have one set of weights that comes with the data. The
>>>> documentation states that the weights represent the probability that
>>>> the particular household was included in the sample.  Please let me
>>>> know if I should include any other information. I am really thankful
>>>> for all the help.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Arka
>>>>
>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <[email protected]> wrote:
>>>>>
>>>>> Arka-
>>>>>
>>>>> I can't answer without more information about the sampling design.
>>>>> Please describe the design in detail, including answers to the
>>>>> following questions.
>>>>>
>>>>> 1. Were all districts in the target population included in the sample?
>>>>> Or, were districts sampled?
>>>>>
>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>> was there adjustment to the probability weights (post-stratification,
>>>>> "raking")  so that the sample results will better reflect population
>>>>> census proportions? If the weights are so adjusted,  are the original
>>>>> sampling weights available to you?
>>>>>
>>>>>
>>>>> Steve
>>>>>
>>>>> Steven J. Samuels
>>>>> [email protected]
>>>>> 18 Cantine's Island
>>>>> Saugerties NY 12477
>>>>> USA
>>>>> Voice: 845-246-0774
>>>>> Fax:    206-202-4783
>>>>>
>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <[email protected]> wrote:
>>>>> > Arka wrote:
>>>>> > "Now I want to estimate the number of workers
>>>>> > belonging to each industry in a particular district"
>>>>> >
>>>>> > A quite trivial example about Arka's issue may be the following one (set
>>>>> > aside survey technicalities):
>>>>> >
>>>>> > ---------------------code begins------------------------------------
>>>>> > drop _all
>>>>> > set obs 100
>>>>> > g Workers=_n
>>>>> > g District="East" in 1/50
>>>>> > replace District="West" in 51/100
>>>>> > g Industry="Concrete" in 1/30
>>>>> > replace  Industry="Steel" in 31/100
>>>>> > g A= 1 if  District=="East" &  Industry=="Steel"
>>>>> > g B= 1 if  District=="West" &  Industry=="Steel"
>>>>> > g C= 1 if  District=="East" &  Industry=="Concrete"
>>>>> > ---------------------code ends------------------------------------
>>>>> >
>>>>> > HTH and Kind Regards,
>>>>> > Carlo
>>>>> > -----Original Message-----
>>>>> > From: [email protected]
>>>>> > [mailto:[email protected]] On Behalf Of Arka Roy
>>>>> > Chaudhuri
>>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>>> > To: [email protected]
>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>> > district
>>>>> >
>>>>> > Dear All,
>>>>> >        I have a data set which has information at the individual
>>>>> > level. I have variables which record the district of residence of the
>>>>> > individual, the industry of employment of the individual, and other
>>>>> > demographic characteristics. The data set also comes with weights which
>>>>> > represent the probability that a particular household is included in
>>>>> > the sample. Thus all individuals belonging to a particular household
>>>>> > get the same weight. Now I want to estimate the number of workers
>>>>> > belonging to each industry in a particular district. Could anyone
>>>>> > please advise on the correct Stata code that I should write to get my
>>>>> > desired estimates? Also, I would be grateful if somebody could advise me
>>>>> > on the possible biases that might affect my estimates at the
>>>>> > industry-district level. I would really appreciate any help in this
>>>>> > regard. Thanks.
>>>>> >
>>>>> > Regards,
>>>>> > Arka
>>>>> > --
>>>>> > Arka Roy Chaudhuri
>>>>> > PhD Student
>>>>> > University of British Columbia
>>>>> > 997-1873 East Mall
>>>>> > Vancouver
>>>>> > Canada
>>>>> > Ph: +1 (604) 349-8283
>>>>> > Email: [email protected]
>>>>> >
>>>>> > *
>>>>> > *   For searches and help try:
>>>>> > *   http://www.stata.com/help.cgi?search
>>>>> > *   http://www.stata.com/support/statalist/faq
>>>>> > *   http://www.ats.ucla.edu/stat/stata/
>>>>> >
