Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE


From   Arka Roy Chaudhuri <gabuisi@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date   Fri, 24 Sep 2010 13:03:04 -0700

Dear Steve,

   Thanks a lot for all your advice. The problem is that my dataset
has about 490 industries and 400 districts. Both industries and
districts come with a code identifying them. I used the following
commands to estimate the number of workers in each industry in a
district:

* weight is the inverse of the probability of the household being sampled
bysort districtid industryid: egen workers = total(weight)
duplicates drop districtid industryid, force
keep districtid industryid workers
save "T:\arka\industry_district.dta"


Is the above estimation strategy correct, leaving aside the issue of
-svyset-ting my data? Please advise.

Arka
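
A minimal sketch of the same cell totals computed with -collapse-, on
made-up toy data. The variable names follow the commands above; the values
are invented for illustration. It assumes one row per worker and that
weight is the inverse of the sampling probability. The sum of the weights
in a cell is a point estimate only; it carries no standard error, which is
what the -svy- commands discussed below would add.

```stata
* Toy data: the values are made up; the names follow the post
clear
input districtid industryid weight
1 10 100
1 10 150
1 20 200
2 10 300
2 20 50
2 20 50
end

* One row per district-industry cell, holding the sum of the weights
collapse (sum) workers = weight, by(districtid industryid)
list, sepby(districtid)
```

With these toy numbers the four cells come out as 250, 200, 300 and 100.
-collapse- replaces the -egen-, -duplicates drop- and -keep- steps in one
command.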

On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
> My advice about handling household counts of workers was wrong. Do not expand.
>
> Say you have counts for the number of workers in the hh  in three industries
>
> n_agriculture
> n_service
> n_sales
>
> Then you would run a separate command for each industry, for example:
> *********************************************
> levelsof district, local(districts)
> foreach x of local districts {
>     svy: total n_agriculture if district == `x'
> }
> ***********************************************
> You would use this form rather than an -over()-  or -subpop()- option,
> because districts are sampling strata.
>
> -Steve
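>
> The loop above handles one industry at a time. A minimal sketch of how it
> might be extended to several count variables follows. It assumes one row
> per household; the toy values are made up for illustration, and the
> variable names (district, psu, weight, n_agriculture, n_service, n_sales)
> follow this thread.
>
> ```stata
> * Toy household data: 2 districts (strata), 2 PSUs each, 2 households per PSU
> clear
> input district psu hhid weight n_agriculture n_service n_sales
> 1 1 1 100 2 0 1
> 1 1 2 100 1 1 0
> 1 2 3 150 0 2 0
> 1 2 4 150 3 0 0
> 2 3 5 200 1 0 2
> 2 3 6 200 0 1 1
> 2 4 7 250 2 2 0
> 2 4 8 250 0 0 1
> end
>
> * Declare the design: PSUs within district strata, probability weights
> svyset psu [pw = weight], strata(district)
>
> * One -svy: total- per industry count variable and per district
> levelsof district, local(districts)
> foreach v of varlist n_agriculture n_service n_sales {
>     foreach x of local districts {
>         display as text "`v', district `x'"
>         svy: total `v' if district == `x'
>     }
> }
> ```
>
> Each district needs at least two PSUs in the data for a standard error to
> be reported.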
>
> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Arka-
>>
>> Based on your description, you would -svyset- your data as follows:
>>
>> Define a variable (call it "psu" for "primary sampling unit") which is
>> the village number (rural sector) or urban block (urban sector)
>>
>>
>> then
>> ********************************************************
>> svyset psu [pw = your weight], strata(district)
>> ***********************************************************
>>
>> If your data has one line per person, with "industry" categorized
>>
>> then the command for totals might be
>>
>> *****************************************************
>> svy: tab district industry, count se format(%10.0fc)
>> *****************************************************
>>
>> If your data has only counts of workers in each industry in each HH,
>> then you should -expand- the data first so that it has one line for
>> each worker in the HH, e.g.
>>
>> *************
>> expand hhsize
>> *************
>>
>> (but that might include children, so you will have to take some care)
>>
>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>> you are a student, I suggest that you seek guidance from a faculty
>> member who is experienced in surveys, if not in Stata. (I know that
>> the Department of Statistics at UBC has a survey sampling course). I
>> also suggest that you obtain a text to learn about sampling, such as
>> Sharon Lohr's "Sampling: Design and Analysis" (2009).  I also
>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>
>> Best wishes,
>>
>> Steve
>>
>> Steven J. Samuels
>> sjsamuels@gmail.com
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax:    206-202-4783
>>
>>
>>
>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>> Hi,
>>>  Thanks for the help. In my dataset all the districts in the target
>>> population are included. The sampling design is a stratified multi-stage
>>> design with the first stage units being villages in the rural sector
>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>> are households in both the sectors.
>>>
>>>   I only have one set of weights that comes with the data. The
>>> documentation states that the weights represent the probability that
>>> the particular household was included in the sample.  Please let me
>>> know if I should include any other information. I am really thankful
>>> for all the help.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Arka
>>>
>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>
>>>> Arka-
>>>>
>>>> I can't answer  without more information about the sampling design.
>>>> Please describe the design in detail, including answers to the
>>>> following questions:
>>>>
>>>> 1. Were all districts in the target population included in the sample?
>>>> Or, were districts sampled?
>>>>
>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>> was there adjustment to the probability weights (post-stratification,
>>>> "raking")  so that the sample results will better reflect population
>>>> census proportions? If the weights are so adjusted,  are the original
>>>> sampling weights available to you?
>>>>
>>>>
>>>> Steve
>>>>
>>>>
>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <carlo.lazzaro@tin.it> wrote:
>>>> > Arka wrote:
>>>> > "Now I want to estimate the number of workers
>>>> > belonging to each industry in a particular district"
>>>> >
>>>> > A quite trivial example about Arka's issue may be the following one (set
>>>> > aside survey technicalities):
>>>> >
>>>> > ---------------------code begins------------------------------------
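>>>> > * Toy data: 100 workers, one row each
>>>> > drop _all
>>>> > set obs 100
>>>> > g Workers=_n
>>>> > g District="East" in 1/50
>>>> > replace District="West" in 51/100
>>>> > g Industry="Concrete" in 1/30
>>>> > replace  Industry="Steel" in 31/100
>>>> > * Indicators for the district-industry cells that occur in the data
>>>> > g A= 1 if  District=="East" &  Industry=="Steel"
>>>> > g B= 1 if  District=="West" &  Industry=="Steel"
>>>> > g C= 1 if  District=="East" &  Industry=="Concrete"
>>>> > * Unweighted number of workers in each cell
>>>> > tabulate District Industry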
>>>> > ---------------------code ends------------------------------------
>>>> >
>>>> > HTH and Kind Regards,
>>>> > Carlo
>>>> > -----Original Message-----
>>>> > From: owner-statalist@hsphsun2.harvard.edu
>>>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Arka Roy
>>>> > Chaudhuri
>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>> > To: statalist@hsphsun2.harvard.edu
>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>> > district
>>>> >
>>>> > Dear All,
>>>> >        I have a data set which has information at the individual
>>>> > level. I have variables which record the district of residence of the
>>>> > individual, the industry of employment of the individual and other
>>>> > demographic characteristics. The data set also comes with weights which
>>>> > represent the probability that a particular household is included in
>>>> > the sample. Thus all individuals belonging to a particular household
>>>> > get the same weight. Now I want to estimate the number of workers
>>>> > belonging to each industry in a particular district. Could anyone
>>>> > please advise on the correct Stata code that I should write to get my
>>>> > desired estimates? Also I would be grateful if somebody could advise me
>>>> > on the possible biases that might affect my estimates at the
>>>> > industry-district level. I would really appreciate any help in this
>>>> > regard. Thanks
>>>> >
>>>> > Regards,
>>>> > Arka
>>>> > --
>>>> > Arka Roy Chaudhuri
>>>> > PhD Student
>>>> > University of British Columbia
>>>> > 997-1873 East Mall
>>>> > Vancouver
>>>> > Canada
>>>> > Ph: +1 (604) 349-8283
>>>> > Email: gabuisi@gmail.com
>>>> >
>>>> > *
>>>> > *   For searches and help try:
>>>> > *   http://www.stata.com/help.cgi?search
>>>> > *   http://www.stata.com/support/statalist/faq
>>>> > *   http://www.ats.ucla.edu/stat/stata/

