Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Arka Roy Chaudhuri <gabuisi@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE

Date: Mon, 27 Sep 2010 16:14:13 -0700

Hi,

My apologies for posting a different question on the same thread, and thanks for pointing out the mistake.

Regards,
Arka

On Mon, Sep 27, 2010 at 2:01 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> Please resend with a new subject heading. I have no expertise in this
> area, and those who do will not necessarily see your post.
>
> On Mon, Sep 27, 2010 at 4:16 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>> Dear Steve,
>> I am now having some problems estimating an IV regression. It would be
>> great if you could help me with my problem.
>>
>> I have the following variables in my data set: districtid, the average
>> residual gender wage gap in a district (avggap), the scaled district
>> tariff (district_tariff_scaled), the unscaled district tariff
>> (district_tariff_unscaled), a set of district dummies (_Idistricti*),
>> a time dummy, since I have two time periods (time), and district
>> population (district_popn). I am interested in the effect of scaled
>> district tariffs on the average residual gender wage gap, using the
>> unscaled district tariffs as instruments for the scaled tariffs. I run
>> the following 3 regressions (I use district population as weights and
>> cluster over districts to correct the standard errors):
>>
>> 1) regress avggap district_tariff_scaled time _Idistricti* [aweight=district_popn], cluster(districtid)
>> In this regression I look at the structural equation, i.e. the effect of
>> scaled district tariffs on the average gender wage gap. I do not get any
>> error in this case.
>>
>> 2) regress avggap district_tariff_unscaled time _Idistricti* [aweight=district_popn], cluster(districtid)
>> In this regression I look at the reduced-form relationship between
>> unscaled tariffs and the average gender wage gap. I do not get any
>> error in this case.
>>
>> 3) ivregress 2sls avggap (district_tariff_scaled = district_tariff_unscaled) time _Idistricti* [aweight=district_popn], cluster(districtid)
>> This is the equation that I have trouble estimating. I use the unscaled
>> tariffs as instruments for the scaled tariffs. However, Stata gives me
>> the following error:
>>
>> ivregress 2sls avggap (district_tariff_scaled = district_tariff_unscaled) time _Idistricti* [aweight=district_popn], cluster(districtid)
>> (sum of wgt is 0.0000e+00)
>> no observations
>> r(2000);
>>
>> Surprisingly, if I estimate the third equation without clustering over
>> districts, Stata gives me results without any error. I tried using the
>> vce() option instead of the cluster() option, but I get the same error.
>> I do not understand why clustering over districts causes no problem in
>> the first two equations but returns an error for the third. Since I am
>> using a difference-in-differences approach, it is essential that I
>> cluster over districts. I am using Stata 11.
>>
>> I would be really grateful if you could help me out with this problem. Thanks.
>>
>> Regards,
>> Arka
>>
>> On Mon, Sep 27, 2010 at 6:18 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> You are welcome, Arka. The 50% RSE criterion I've seen is a worst
>>> case; 30% would be more believable.
>>>
>>> Steve
>>>
>>> On Mon, Sep 27, 2010 at 2:14 AM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>>> Dear Steve,
>>>>
>>>> Thanks for all your suggestions. I have already ensured that I have
>>>> an adequate number of observations in each district-industry cell. I
>>>> will also look at the relative standard error criterion. Once again,
>>>> thanks a lot for your help.
>>>>
>>>> Regards,
>>>> Arka
>>>>
>>>> On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>> Well, there will be numbers for up to 196,000 cells.
>>>>> Many will be empty because of missing data. I would hesitate to call
>>>>> the remainder "estimates" unless the standard errors are reasonable
>>>>> and they were based on more than 10-20 observations in the category.
>>>>>
>>>>> I have seen designs in which sum-of-weights estimates were worthless
>>>>> for estimating population totals, even with large sample sizes. PPS
>>>>> (probability proportional to size) designs are less vulnerable to this
>>>>> kind of problem.
>>>>>
>>>>> Survey organizations generally have policies for suppressing
>>>>> estimates based on small sample sizes. Perhaps there is a standard
>>>>> practice in your field. I suggest that, in each district, you screen
>>>>> the industries present in the sample for a minimum number of
>>>>> individuals, say 10-20, and report proper survey estimates, with
>>>>> standard errors and sample n's, only for those. You can combine
>>>>> smaller industries into groups to meet these criteria. The relative
>>>>> standard error, RSE = (SE/estimate) x 100%, is another criterion
>>>>> people use for suppressing estimates, and I've seen RSEs of 50% used
>>>>> as a maximum.
>>>>>
>>>>> Good luck!
>>>>>
>>>>> Steve
>>>>>
>>>>> Steven J. Samuels
>>>>> sjsamuels@gmail.com
>>>>> 18 Cantine's Island
>>>>> Saugerties NY 12477
>>>>> USA
>>>>> Voice: 845-246-0774
>>>>> Fax:   206-202-4783
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Arka Roy Chaudhuri <gabuisi@gmail.com>
>>>>> Date: Fri, Sep 24, 2010 at 4:03 PM
>>>>> Subject: Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
>>>>> To: statalist@hsphsun2.harvard.edu
>>>>>
>>>>> Dear Steve,
>>>>>
>>>>> Thanks a lot for all your advice. The problem is that in my dataset
>>>>> I have about 490 industries and 400 districts.
>>>>> Both industries and districts come with a code identifying them. I used
>>>>> the following commands to estimate the number of workers in each
>>>>> industry in a district:
>>>>>
>>>>> bysort districtid industryid: egen workers = total(weight) /* here weight represents the inverse of the probability of the household being sampled */
>>>>> duplicates drop districtid industryid, force
>>>>> keep districtid industryid workers
>>>>> save "T:\arka\industry_district.dta"
>>>>>
>>>>> Leaving aside the issue of -svyset-ting my data, is the above estimation
>>>>> strategy correct? Please advise.
>>>>>
>>>>> Arka
>>>>>
>>>>> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>> My advice about handling household counts of workers was wrong. Do not expand.
>>>>>>
>>>>>> Say you have counts of the number of workers in the HH in three industries:
>>>>>>
>>>>>> n_agriculture
>>>>>> n_service
>>>>>> n_sales
>>>>>>
>>>>>> Then you would run a separate command for each industry, for example:
>>>>>> *********************************************
>>>>>> levelsof district, local(districts)
>>>>>> foreach x of local districts {
>>>>>>     svy: total n_agriculture if district == `x'
>>>>>> }
>>>>>> *********************************************
>>>>>> You would use this form rather than an -over()- or -subpop()- option,
>>>>>> because districts are sampling strata.
>>>>>>
>>>>>> -Steve
>>>>>>
>>>>>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>>> Arka-
>>>>>>>
>>>>>>> Based on your description, you would -svyset- your data as follows.
>>>>>>>
>>>>>>> Define a variable (call it "psu" for "primary sampling unit") which is
>>>>>>> the village number (rural sector) or urban block (urban sector),
>>>>>>> then
>>>>>>> ********************************************************
>>>>>>> svyset psu [pw = your weight], strata(district)
>>>>>>> ********************************************************
>>>>>>>
>>>>>>> If your data has one line per person, with "industry" categorized,
>>>>>>> then the command for totals might be
>>>>>>>
>>>>>>> *****************************************************
>>>>>>> svy: tab district industry, count se format(%10.0fc)
>>>>>>> *****************************************************
>>>>>>>
>>>>>>> If your data has only counts of workers in each industry in each HH,
>>>>>>> then you should -expand- the data first so that it has one line for
>>>>>>> each worker in the HH, e.g.
>>>>>>>
>>>>>>> *************
>>>>>>> expand hhsize
>>>>>>> *************
>>>>>>>
>>>>>>> (but that might include children, so you will have to take some care).
>>>>>>>
>>>>>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>>>>>> you are a student, I suggest that you seek guidance from a faculty
>>>>>>> member who is experienced in surveys, if not in Stata. (I know that
>>>>>>> the Department of Statistics at UBC has a survey sampling course.) I
>>>>>>> also suggest that you obtain a text to learn about sampling, such as
>>>>>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>>>>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>>>>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>>>>>
>>>>>>> Best wishes,
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>> Thanks for the help. In my dataset all the districts in the target
>>>>>>>> population are included. The sampling design is a stratified
>>>>>>>> multi-stage design, with the first-stage units being villages in the
>>>>>>>> rural sector and urban blocks in the urban sector. The ultimate stage
>>>>>>>> units (USUs) are households in both sectors.
>>>>>>>>
>>>>>>>> I only have one set of weights that comes with the data. The
>>>>>>>> documentation states that the weights represent the probability that
>>>>>>>> the particular household was included in the sample. Please let me
>>>>>>>> know if I should include any other information. I am really thankful
>>>>>>>> for all the help.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Arka
>>>>>>>>
>>>>>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Arka-
>>>>>>>>>
>>>>>>>>> I can't answer without more information about the sampling design.
>>>>>>>>> Please describe the design in detail, including answers to the
>>>>>>>>> following questions.
>>>>>>>>>
>>>>>>>>> 1. Were all districts in the target population included in the sample?
>>>>>>>>> Or were districts sampled?
>>>>>>>>>
>>>>>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>>>>>> was there adjustment to the probability weights (post-stratification,
>>>>>>>>> "raking") so that the sample results will better reflect population
>>>>>>>>> census proportions? If the weights are so adjusted, are the original
>>>>>>>>> sampling weights available to you?
>>>>>>>>>
>>>>>>>>> Steve
>>>>>>>>>
>>>>>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <carlo.lazzaro@tin.it> wrote:
>>>>>>>>> > Arka wrote:
>>>>>>>>> > "Now I want to estimate the number of workers
>>>>>>>>> > belonging to each industry in a particular district"
>>>>>>>>> >
>>>>>>>>> > A fairly trivial example of Arka's issue may be the following one
>>>>>>>>> > (setting aside survey technicalities):
>>>>>>>>> >
>>>>>>>>> > ---------------------code begins------------------------------------
>>>>>>>>> > drop _all
>>>>>>>>> > set obs 100
>>>>>>>>> > g Workers=_n
>>>>>>>>> > g District="East" in 1/50
>>>>>>>>> > replace District="West" in 51/100
>>>>>>>>> > g Industry="Concrete" in 1/30
>>>>>>>>> > replace Industry="Steel" in 31/100
>>>>>>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>>>>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>>>>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>>>>>>> > ---------------------code ends------------------------------------
>>>>>>>>> >
>>>>>>>>> > HTH and Kind Regards,
>>>>>>>>> > Carlo
>>>>>>>>> > -----Original Message-----
>>>>>>>>> > From: owner-statalist@hsphsun2.harvard.edu
>>>>>>>>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Arka Roy
>>>>>>>>> > Chaudhuri
>>>>>>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>>>>>>> > To: statalist@hsphsun2.harvard.edu
>>>>>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>>>>>> > district
>>>>>>>>> >
>>>>>>>>> > Dear All,
>>>>>>>>> > I have a data set which has information at the individual
>>>>>>>>> > level. I have variables which record the district of residence of the
>>>>>>>>> > individual, the industry of employment of the individual, and other
>>>>>>>>> > demographic characteristics. The data set also comes with weights which
>>>>>>>>> > represent the probability that a particular household is included in
>>>>>>>>> > the sample. Thus all individuals belonging to a particular household
>>>>>>>>> > get the same weight. Now I want to estimate the number of workers
>>>>>>>>> > belonging to each industry in a particular district. Could anyone
>>>>>>>>> > please advise on the correct Stata code that I should write to get my
>>>>>>>>> > desired estimates? I would also be grateful if somebody could advise me
>>>>>>>>> > on the possible biases that might affect my estimates at the
>>>>>>>>> > industry-district level. I would really appreciate any help in this
>>>>>>>>> > regard. Thanks.
>>>>>>>>> >
>>>>>>>>> > Regards,
>>>>>>>>> > Arka
>>>>>>>>> > --
>>>>>>>>> > Arka Roy Chaudhuri
>>>>>>>>> > PhD Student
>>>>>>>>> > University of British Columbia
>>>>>>>>> > 997-1873 East Mall
>>>>>>>>> > Vancouver
>>>>>>>>> > Canada
>>>>>>>>> > Ph: +1 (604) 349-8283
>>>>>>>>> > Email: gabuisi@gmail.com
>>>>>>>>> >
>>>>>>>>> > *
>>>>>>>>> > *   For searches and help try:
>>>>>>>>> > *   http://www.stata.com/help.cgi?search
>>>>>>>>> > *   http://www.stata.com/support/statalist/faq
>>>>>>>>> > *   http://www.ats.ucla.edu/stat/stata/
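Arka's `bysort ... egen total()`, `duplicates drop`, `keep` sequence produces one sum-of-weights total per district-industry cell. The same point estimates can be obtained in one step with `collapse (sum)`. Below is a minimal sketch on a small made-up dataset; the values are hypothetical, and `weight` is assumed, as in Arka's comment, to be the inverse of the household's selection probability. Like the original commands, this gives point estimates only, with no standard errors.

```stata
* Hypothetical person-level records: district, industry, household weight
* (weight assumed to be the inverse of the selection probability)
clear
input districtid industryid weight
1 10 100
1 10 100
1 20 150
2 10  80
2 20  80
2 20 120
end

* Sum the weights within each district-industry cell; this leaves one row
* per cell, like -egen total()- followed by -duplicates drop- and -keep-
collapse (sum) workers = weight, by(districtid industryid)

list, noobs sepby(districtid)
```

Note that `collapse` replaces the data in memory, so wrap it in `preserve`/`restore` (or save first) if the person-level file is still needed.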
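Steve's per-district `svy: total` loop and his relative-standard-error screen can be combined in one pass. The sketch below runs on simulated data, assuming a design like the one Arka describes: districts as strata, PSUs within districts, and a household-level count of agricultural workers. The variable values, the seed, and the 30% flag (taken from Steve's remark that 30% "would be more believable" than 50%) are illustrative assumptions, not part of the thread.

```stata
* Simulated, hypothetical survey: 3 districts (strata), 4 PSUs per
* district, 10 households per PSU
clear
set seed 12345
set obs 120
generate district = ceil(_n / 40)               // 40 households per district
generate psu      = ceil(_n / 10)               // 10 households per PSU
generate hhweight = 50 + 10 * district          // hypothetical inverse-probability weights
generate n_agriculture = floor(4 * runiform())  // 0-3 agricultural workers per household

svyset psu [pweight = hhweight], strata(district)

* One -svy: total- per district (districts are strata), then the
* relative standard error RSE = 100 * SE / estimate
levelsof district, local(districts)
foreach x of local districts {
    quietly svy: total n_agriculture if district == `x'
    local est = _b[n_agriculture]
    local se  = _se[n_agriculture]
    local rse = 100 * `se' / `est'
    display "district `x': total = " %10.0fc `est' ///
        "  SE = " %9.0fc `se' "  RSE = " %5.1f `rse' "%"
    if `rse' > 30 display "  RSE above 30%: consider suppressing or grouping"
}
```

The threshold is a reporting policy choice; as Steve notes, survey organizations and fields differ in the cut-off they use.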
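For data with one line per person, Steve's suggestion is `svyset` followed by `svy: tab district industry, count se`. A minimal sketch with simulated data; the two districts, five PSUs per district, the weights, and the three industry codes are all hypothetical.

```stata
* Simulated, hypothetical person-level survey data
clear
set seed 2010
set obs 200
generate district = ceil(_n / 100)              // 2 districts (strata)
generate psu      = ceil(_n / 20)               // 5 PSUs per district
generate pw       = 25 * district               // hypothetical sampling weights
generate industry = 1 + floor(3 * runiform())   // industry codes 1-3

svyset psu [pweight = pw], strata(district)

* Estimated population count of workers in each district-industry cell,
* with design-based standard errors
svy: tabulate district industry, count se format(%10.0fc)
```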
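Carlo's example builds one indicator variable per district-industry cell. Still setting aside weights and the survey design, as Carlo does, the unweighted number of sampled workers in every cell can be listed directly with `contract`. This is a sketch of an alternative, not Carlo's own code.

```stata
* Carlo's simulated data: 100 workers, 2 districts, 2 industries
drop _all
set obs 100
generate Workers  = _n
generate District = "East" in 1/50
replace  District = "West" in 51/100
generate Industry = "Concrete" in 1/30
replace  Industry = "Steel" in 31/100

* Unweighted count of sampled workers in each district-industry cell
contract District Industry, freq(n_workers)
list, noobs
```

These are sample counts, not population estimates; for the latter, the weights and design have to enter, as discussed in the thread.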

**References**:

- **st: Estimating the number of workers in each industry in each district** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* "Carlo Lazzaro" <carlo.lazzaro@tin.it>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Fwd: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE** *From:* Steve Samuels <sjsamuels@gmail.com>
