Dear Steve,

Thanks a lot for all your advice. The problem is that my dataset has about 490 industries and 400 districts, and both come with a code identifying them. I used the following commands to estimate the number of workers in each industry in each district:

bysort districtid industryid: egen workers = total(weight) /* weight is the inverse of the probability that the household was sampled */
duplicates drop districtid industryid, force
keep districtid industryid workers
save "T:\arka\industry_district.dta"

Leaving aside the issue of -svyset-ing my data, is the above estimation strategy correct? Please advise.

Arka

On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
> My advice about handling household counts of workers was wrong. Do not expand.
>
> Say you have counts of the number of workers in the hh in three industries:
>
> n_agriculture
> n_service
> n_sales
>
> Then you would run a separate command for each industry, for example:
> *********************************************
> levelsof district, local(districts)
> foreach x of local districts {
>     svy: total n_agriculture if district==`x'
> }
> ***********************************************
> You would use this form rather than an -over()- or -subpop()- option,
> because districts are sampling strata.
>
> -Steve
>
> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Arka-
>>
>> Based on your description, you would -svyset- your data as follows.
>>
>> Define a variable (call it "psu" for "primary sampling unit") which is
>> the village number (rural sector) or urban block (urban sector),
>> then
>> ********************************************************
>> svyset psu [pw = your weight], strata(district)
>> ***********************************************************
>>
>> If your data has one line per person, with "industry" categorized,
>> then the command for totals might be
>>
>> *****************************************************
>> svy: tab district industry, count se format(%10.0fc)
>> *****************************************************
>>
>> If your data has only counts of workers in each industry in each HH,
>> then you should -expand- the data first so that it has one line for
>> each worker in the HH, e.g.
>>
>> *************
>> expand hhsize
>> *************
>>
>> (but that might include children, so you will have to take some care)
>>
>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>> you are a student, I suggest that you seek guidance from a faculty
>> member who is experienced in surveys, if not in Stata. (I know that
>> the Department of Statistics at UBC has a survey sampling course.) I
>> also suggest that you obtain a text to learn about sampling, such as
>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>
>> Best wishes,
>>
>> Steve
>>
>> Steven J. Samuels
>> sjsamuels@gmail.com
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax: 206-202-4783
>>
>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>> Hi,
>>> Thanks for the help.
>>> In my dataset all the districts in the target population are included.
>>> The sampling design is a stratified multi-stage design, with the
>>> first-stage units being villages in the rural sector and urban blocks
>>> in the urban sector. The ultimate stage units (USU) are households in
>>> both sectors.
>>>
>>> I only have one set of weights that comes with the data. The
>>> documentation states that the weights represent the probability that
>>> the particular household was included in the sample. Please let me
>>> know if I should include any other information. I am really thankful
>>> for all the help.
>>>
>>> Regards,
>>>
>>> Arka
>>>
>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>
>>>> Arka-
>>>>
>>>> I can't answer without more information about the sampling design.
>>>> Please describe the design in detail, including answers to the
>>>> following questions:
>>>>
>>>> 1. Were all districts in the target population included in the sample?
>>>> Or were districts sampled?
>>>>
>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>> was there adjustment to the probability weights (post-stratification,
>>>> "raking") so that the sample results will better reflect population
>>>> census proportions? If the weights are so adjusted, are the original
>>>> sampling weights available to you?
>>>>
>>>> Steve
>>>>
>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <carlo.lazzaro@tin.it> wrote:
>>>> > Arka wrote:
>>>> > "Now I want to estimate the number of workers
>>>> > belonging to each industry in a particular district"
>>>> >
>>>> > A rather trivial example for Arka's issue might be the following one
>>>> > (setting aside survey technicalities):
>>>> >
>>>> > ---------------------code begins------------------------------------
>>>> > drop _all
>>>> > set obs 100
>>>> > g Workers=_n
>>>> > g District="East" in 1/50
>>>> > replace District="West" in 51/100
>>>> > g Industry="Concrete" in 1/30
>>>> > replace Industry="Steel" in 31/100
>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>> > ---------------------code ends------------------------------------
>>>> >
>>>> > HTH and Kind Regards,
>>>> > Carlo
>>>> > -----Original Message-----
>>>> > From: owner-statalist@hsphsun2.harvard.edu
>>>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Arka Roy
>>>> > Chaudhuri
>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>> > To: statalist@hsphsun2.harvard.edu
>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>> > district
>>>> >
>>>> > Dear All,
>>>> > I have a data set with information at the individual
>>>> > level. I have variables which record the district of residence of the
>>>> > individual, the industry of employment of the individual and other
>>>> > demographic characteristics. The data set also comes with weights, which
>>>> > represent the probability that a particular household is included in
>>>> > the sample. Thus all individuals belonging to a particular household
>>>> > get the same weight. Now I want to estimate the number of workers
>>>> > belonging to each industry
>>>> > in a particular district. Could anyone
>>>> > please advise on the correct Stata code that I should write to get my
>>>> > desired estimates? Also, I would be grateful if somebody could advise me
>>>> > on the possible biases that might affect my estimates at the
>>>> > industry-district level. I would really appreciate any help in this
>>>> > regard. Thanks
>>>> >
>>>> > Regards,
>>>> > Arka
>>>> > --
>>>> > Arka Roy Chaudhuri
>>>> > PhD Student
>>>> > University of British Columbia
>>>> > 997-1873 East Mall
>>>> > Vancouver
>>>> > Canada
>>>> > Ph: +1 (604) 349-8283
>>>> > Email: gabuisi@gmail.com
>>>> >
>>>> > *
>>>> > * For searches and help try:
>>>> > * http://www.stata.com/help.cgi?search
>>>> > * http://www.stata.com/support/statalist/faq
>>>> > * http://www.ats.ucla.edu/stat/stata/
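Arka's question about whether his commands estimate the right quantity can be checked with a small calculation. Assume the file has one row per worker and that `weight` is the inverse of the household's selection probability, as his latest message says. Under those assumptions, his -bysort ... egen total(weight)- plus -duplicates drop- steps sum the weights within each district-industry cell. That sum is the usual weighted (Horvitz-Thompson) estimate of the number of workers in the cell, and it is the same point estimate -svy: total- would report. The Python sketch below reproduces this arithmetic on made-up records. The record layout, the numbers, and the function name `estimated_workers` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical person-level records: (district_id, industry_id, weight).
# Assumes one row per worker and weight = 1 / P(household sampled).
records = [
    (1, 10, 120.0),
    (1, 10, 120.0),  # second worker from the same household, same weight
    (1, 20, 80.0),
    (2, 10, 200.0),
    (2, 20, 150.0),
    (2, 20, 50.0),
]


def estimated_workers(rows):
    """Sum the weights within each (district, industry) cell.

    Mirrors `bysort districtid industryid: egen workers = total(weight)`
    followed by keeping one row per district-industry cell.
    """
    totals = defaultdict(float)
    for district, industry, weight in rows:
        totals[(district, industry)] += weight
    return dict(totals)


workers = estimated_workers(records)
for cell in sorted(workers):
    print(cell, workers[cell])
```

In Stata, `collapse (sum) workers=weight, by(districtid industryid)` should produce the same one-row-per-cell file in a single step.

Two caveats apply. First, people in the file who are not working must be excluded before summing. Second, if the weights really were selection probabilities (as the documentation is reported to say earlier in the thread), they would have to be inverted first. Even with both handled, this calculation gives no design-based standard error, which is why the -svyset- step still matters.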

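Steve's -svyset psu [pw = ...], strata(district)- matters for the standard errors rather than the point estimate. Below is a minimal sketch of the textbook linearization variance for an estimated total in a stratified, clustered design. It treats PSUs as sampled with replacement within each stratum, which is what Stata's -svy- commands assume when -svyset- has no fpc() option. For stratum h with n_h sampled PSUs, let z_hi be the weighted total of PSU i. Then Var(total) = sum over h of n_h/(n_h - 1) * sum over i of (z_hi - mean_h)^2. Because districts are strata, a single district's total and variance depend only on that district's PSUs. The row layout `(stratum, psu, weight, y)`, the data, and the name `stratified_total` are hypothetical.

```python
import math
from collections import defaultdict


def stratified_total(rows):
    """Estimated total and linearized SE for a stratified, clustered sample.

    rows: iterable of (stratum, psu, weight, y). PSUs are treated as drawn
    with replacement within each stratum (no finite-population correction).
    """
    # z[h][i] = sum of weight * y over sampled units in PSU i of stratum h.
    # Rows with y == 0 still create an entry, so PSUs without any target
    # workers still count toward the variance.
    z = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in rows:
        z[stratum][psu] += weight * y

    total = 0.0
    variance = 0.0
    for stratum, psu_totals in z.items():
        values = list(psu_totals.values())
        n_h = len(values)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        mean_h = sum(values) / n_h
        total += sum(values)
        variance += n_h / (n_h - 1) * sum((v - mean_h) ** 2 for v in values)
    return total, math.sqrt(variance)


# Hypothetical worker rows: y = 1 if the worker is in the industry of interest.
rows = [
    ("d1", "v1", 100.0, 1), ("d1", "v1", 100.0, 0), ("d1", "v1", 100.0, 1),
    ("d1", "v2", 150.0, 1), ("d1", "v2", 150.0, 1),
    ("d1", "v3", 80.0, 0), ("d1", "v3", 80.0, 1),
    ("d2", "u1", 50.0, 1), ("d2", "u1", 50.0, 1),
    ("d2", "u2", 60.0, 0), ("d2", "u2", 60.0, 1),
]

# One district (stratum) at a time, analogous to looping over districts in Stata.
for district in ("d1", "d2"):
    est, se = stratified_total(r for r in rows if r[0] == district)
    print(district, est, round(se, 1))
```

The error raised for a stratum with a single PSU reflects a real limitation: with one PSU, this variance formula is undefined.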

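Steve's corrected advice works with household-level counts directly. Take a household weight w and a count such as n_agriculture. The estimated total for a district is the sum of w * n_agriculture over the sampled households in that district, which is the point estimate -svy: total n_agriculture if district==`x'- reports. The sketch below computes that sum for each district and count variable. It also checks that the result equals the sum obtained by first expanding each household into one row per worker in that industry, so expanding is not needed for the point estimate. The households and the helper names `district_totals` and `expanded_totals` are hypothetical.

```python
# Hypothetical household-level rows: the household weight plus counts of
# workers in each industry (variable names follow the thread's example).
households = [
    {"district": 1, "weight": 200.0, "n_agriculture": 2, "n_service": 0, "n_sales": 1},
    {"district": 1, "weight": 150.0, "n_agriculture": 0, "n_service": 1, "n_sales": 0},
    {"district": 2, "weight": 300.0, "n_agriculture": 1, "n_service": 2, "n_sales": 0},
]
INDUSTRIES = ("n_agriculture", "n_service", "n_sales")


def district_totals(hh_rows, industries=INDUSTRIES):
    """Weighted total of each count variable within each district."""
    totals = {}
    for hh in hh_rows:
        for var in industries:
            key = (hh["district"], var)
            totals[key] = totals.get(key, 0.0) + hh["weight"] * hh[var]
    return totals


def expanded_totals(hh_rows, industries=INDUSTRIES):
    """Same quantity computed after expanding to one row per worker."""
    totals = {}
    for hh in hh_rows:
        for var in industries:
            key = (hh["district"], var)
            totals.setdefault(key, 0.0)
            for _ in range(hh[var]):  # one copy of the household weight per worker
                totals[key] += hh["weight"]
    return totals


for key, value in sorted(district_totals(households).items()):
    print(key, value)
```

This comparison covers point estimates only. Standard errors still come from the -svyset- design.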