Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date: Mon, 27 Sep 2010 17:01:27 -0400

Please resend with a new subject heading. I have no expertise in this area, and those who do will not necessarily see your post.

On Mon, Sep 27, 2010 at 4:16 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
> Dear Steve,
> I am now having some problems estimating an IV regression. It would be
> great if you could help me with my problem.
>
> I have the following variables in my data set: districtid; the average
> residual gender wage gap in a district (avggap); the scaled district
> tariff (district_tariff_scaled); the unscaled district tariff
> (district_tariff_unscaled); a set of district dummies (_Idistricti*);
> a time dummy, since I have two time periods (time); and district
> population (district_popn). I am interested in the effect of scaled
> district tariffs on the average residual gender wage gap, using the
> unscaled district tariffs as instruments for the scaled tariffs. I run
> the following 3 regressions (I use district population as weights and
> cluster on districts to correct the standard errors):
>
> 1) regress avggap district_tariff_scaled time _Idistricti* [aweight=district_popn], cluster(districtid)
> In this regression I look at the structural equation, i.e. the effect
> of scaled district tariffs on the average gender wage gap. I do not get
> any error in this case.
>
> 2) regress avggap district_tariff_unscaled time _Idistricti* [aweight=district_popn], cluster(districtid)
> In this regression I look at the reduced-form relationship between
> unscaled tariffs and the average gender wage gap. I do not get any
> error in this case.
>
> 3) ivregress 2sls avggap (district_tariff_scaled = district_tariff_unscaled) time _Idistricti* [aweight=district_popn], cluster(districtid)
> This is the equation I have trouble estimating. I use the unscaled
> tariffs as instruments for the scaled tariffs. However, Stata gives me
> the following error:
>
> ivregress 2sls avggap (district_tariff_scaled =district_tariff_unscaled) time _Idistricti* [aweight=district_popn],cluster(districtid)
> (sum of wgt is 0.0000e+00)
> no observations
> r(2000);
>
> Surprisingly, if I estimate the third equation without clustering on
> districts, Stata gives me results without any error. I tried using the
> vce() option instead of the cluster() option, but I get the same error.
> I do not understand why clustering on districts causes no problem in
> the first two equations but returns an error for the third. Since I am
> using a difference-in-differences approach, it is essential that I
> cluster on districts. I am using Stata 11.
>
> I would be really grateful if you could help me with this problem. Thanks.
>
> Regards,
> Arka
>
> On Mon, Sep 27, 2010 at 6:18 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> You are welcome, Arka. The 50% RSE criterion I've seen is a worst
>> case; 30% would be more believable.
>>
>> Steve
>>
>> On Mon, Sep 27, 2010 at 2:14 AM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>> Dear Steve,
>>>
>>> Thanks for all your suggestions. I have already ensured that I
>>> have an adequate number of observations in each district-industry cell.
>>> I will also look at the relative standard error criterion. Once again,
>>> thanks a lot for your help.
>>>
>>> Regards,
>>> Arka
>>>
>>> On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>> Well, there will be numbers for up to 196,000 cells.
>>>> Many will be empty because of missing data; I would hesitate to call
>>>> the remainder "estimates" unless the standard errors are reasonable and
>>>> they were based on more than 10-20 observations in the category.
>>>>
>>>> I have seen designs in which sum-of-weights estimates were worthless
>>>> for estimating population totals, even with large sample sizes. PPS
>>>> (probability proportional to size) designs are less vulnerable to this
>>>> kind of problem.
>>>>
>>>> Survey organizations generally have policies for suppressing estimates
>>>> based on small sample sizes. Perhaps there is a standard practice in
>>>> your field. I suggest that, in each district, you screen the industries
>>>> present in the sample for a minimum number of individuals, say 10-20,
>>>> and report proper survey estimates, with standard errors and sample
>>>> n's, only for those. You can group smaller industries to meet these
>>>> criteria. The relative standard error, (SE/estimate) x 100%, is another
>>>> criterion people use for suppressing estimates, and I've seen RSEs of
>>>> 50% used as a maximum.
>>>>
>>>> Good luck!
>>>>
>>>> Steve
>>>>
>>>> Steven J. Samuels
>>>> sjsamuels@gmail.com
>>>> 18 Cantine's Island
>>>> Saugerties NY 12477
>>>> USA
>>>> Voice: 845-246-0774
>>>> Fax: 206-202-4783
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Arka Roy Chaudhuri <gabuisi@gmail.com>
>>>> Date: Fri, Sep 24, 2010 at 4:03 PM
>>>> Subject: Re: st: R: Estimating the number of workers in each industry
>>>> in each district - flag: Stata 9/2 SE
>>>> To: statalist@hsphsun2.harvard.edu
>>>>
>>>> Dear Steve,
>>>>
>>>> Thanks a lot for all your advice. The problem is that in my dataset
>>>> I have about 490 industries and 400 districts.
>>>> Both industries and districts come with a code identifying them. I used
>>>> the following commands to estimate the number of workers in each
>>>> industry in a district:
>>>>
>>>> bysort districtid industryid: egen workers = total(weight) /* here weight
>>>> represents the inverse of the probability of the household being sampled */
>>>> duplicates drop districtid industryid, force
>>>> keep districtid industryid workers
>>>> save "T:\arka\industry_district.dta"
>>>>
>>>> Leaving aside the issue of -svyset-ing my data, is the above estimation
>>>> strategy correct? Please advise.
>>>>
>>>> Arka
>>>>
>>>> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>> My advice about handling household counts of workers was wrong. Do not
>>>>> expand.
>>>>>
>>>>> Say you have counts of the number of workers in the HH in three
>>>>> industries:
>>>>>
>>>>> n_agriculture
>>>>> n_service
>>>>> n_sales
>>>>>
>>>>> Then you would run a separate command for each industry, for example:
>>>>> *********************************************
>>>>> levelsof district, local(districts)
>>>>> foreach x of local districts {
>>>>>     svy: total n_agriculture if district == `x'
>>>>> }
>>>>> *********************************************
>>>>> You would use this form rather than an -over()- or -subpop()- option,
>>>>> because districts are sampling strata.
>>>>>
>>>>> -Steve
>>>>>
>>>>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>> Arka-
>>>>>>
>>>>>> Based on your description, you would -svyset- your data as follows.
>>>>>>
>>>>>> Define a variable (call it "psu", for "primary sampling unit") which is
>>>>>> the village number (rural sector) or urban block (urban sector),
>>>>>> then
>>>>>> ********************************************************
>>>>>> svyset psu [pw = weight], strata(district)
>>>>>> ********************************************************
>>>>>> where weight is your sampling-weight variable.
>>>>>>
>>>>>> If your data has one line per person, with "industry" categorized,
>>>>>> then the command for totals might be
>>>>>>
>>>>>> *****************************************************
>>>>>> svy: tab district industry, count se format(%10.0fc)
>>>>>> *****************************************************
>>>>>>
>>>>>> If your data has only counts of workers in each industry in each HH,
>>>>>> then you should -expand- the data first so that it has one line for
>>>>>> each worker in the HH, e.g.
>>>>>>
>>>>>> *************
>>>>>> expand hhsize
>>>>>> *************
>>>>>>
>>>>>> (but that might include children, so you will have to take some care).
>>>>>>
>>>>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>>>>> you are a student, I suggest that you seek guidance from a faculty
>>>>>> member who is experienced in surveys, if not in Stata. (I know that
>>>>>> the Department of Statistics at UBC has a survey sampling course.) I
>>>>>> also suggest that you obtain a text to learn about sampling, such as
>>>>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>>>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>>>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> Steven J. Samuels
>>>>>> sjsamuels@gmail.com
>>>>>> 18 Cantine's Island
>>>>>> Saugerties NY 12477
>>>>>> USA
>>>>>> Voice: 845-246-0774
>>>>>> Fax: 206-202-4783
>>>>>>
>>>>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>>>>>> Hi,
>>>>>>> Thanks for the help. In my dataset all the districts in the target
>>>>>>> population are included. The sampling design is a stratified
>>>>>>> multi-stage design, with the first-stage units being villages in the
>>>>>>> rural sector and urban blocks in the urban sector. The ultimate stage
>>>>>>> units (USU) are households in both sectors.
>>>>>>>
>>>>>>> I only have one set of weights that comes with the data. The
>>>>>>> documentation states that the weights represent the probability that
>>>>>>> the particular household was included in the sample. Please let me
>>>>>>> know if I should include any other information. I am really thankful
>>>>>>> for all the help.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Arka
>>>>>>>
>>>>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Arka-
>>>>>>>>
>>>>>>>> I can't answer without more information about the sampling design.
>>>>>>>> Please describe the design in detail, including answers to the
>>>>>>>> following questions.
>>>>>>>>
>>>>>>>> 1. Were all districts in the target population included in the sample?
>>>>>>>> Or were districts sampled?
>>>>>>>>
>>>>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>>>>> was there adjustment to the probability weights (post-stratification,
>>>>>>>> "raking") so that the sample results will better reflect population
>>>>>>>> census proportions? If the weights are so adjusted, are the original
>>>>>>>> sampling weights available to you?
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>> Steven J. Samuels
>>>>>>>> sjsamuels@gmail.com
>>>>>>>> 18 Cantine's Island
>>>>>>>> Saugerties NY 12477
>>>>>>>> USA
>>>>>>>> Voice: 845-246-0774
>>>>>>>> Fax: 206-202-4783
>>>>>>>>
>>>>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <carlo.lazzaro@tin.it> wrote:
>>>>>>>> > Arka wrote:
>>>>>>>> > "Now I want to estimate the number of workers
>>>>>>>> > belonging to each industry in a particular district"
>>>>>>>> >
>>>>>>>> > A quite trivial example of Arka's issue may be the following one
>>>>>>>> > (setting aside survey technicalities):
>>>>>>>> >
>>>>>>>> > ---------------------code begins------------------------------------
>>>>>>>> > * toy data: 100 observations in two districts and two industries
>>>>>>>> > drop _all
>>>>>>>> > set obs 100
>>>>>>>> > g Workers=_n
>>>>>>>> > g District="East" in 1/50
>>>>>>>> > replace District="West" in 51/100
>>>>>>>> > g Industry="Concrete" in 1/30
>>>>>>>> > replace Industry="Steel" in 31/100
>>>>>>>> > * flag three of the district-industry cells
>>>>>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>>>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>>>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>>>>>> > ---------------------code ends------------------------------------
>>>>>>>> >
>>>>>>>> > HTH and Kind Regards,
>>>>>>>> > Carlo
>>>>>>>> > -----Original Message-----
>>>>>>>> > From: owner-statalist@hsphsun2.harvard.edu
>>>>>>>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Arka Roy
>>>>>>>> > Chaudhuri
>>>>>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>>>>>> > To: statalist@hsphsun2.harvard.edu
>>>>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>>>>> > district
>>>>>>>> >
>>>>>>>> > Dear All,
>>>>>>>> > I have a data set which has information at the individual level. I
>>>>>>>> > have variables which record the district of residence of the
>>>>>>>> > individual, the industry of employment of the individual and other
>>>>>>>> > demographic characteristics. The data set also comes with weights which
>>>>>>>> > represent the probability that a particular household
>>>>>>>> > is included in the sample. Thus all individuals belonging to a
>>>>>>>> > particular household get the same weight. Now I want to estimate the
>>>>>>>> > number of workers belonging to each industry in a particular district.
>>>>>>>> > Could anyone please advise on the correct Stata code that I should
>>>>>>>> > write to get my desired estimates? I would also be grateful if somebody
>>>>>>>> > could advise me on the possible biases that might affect my estimates
>>>>>>>> > at the industry-district level. I would really appreciate any help in
>>>>>>>> > this regard. Thanks
>>>>>>>> >
>>>>>>>> > Regards,
>>>>>>>> > Arka
>>>>>>>> > --
>>>>>>>> > Arka Roy Chaudhuri
>>>>>>>> > PhD Student
>>>>>>>> > University of British Columbia
>>>>>>>> > 997-1873 East Mall
>>>>>>>> > Vancouver
>>>>>>>> > Canada
>>>>>>>> > Ph: +1 (604) 349-8283
>>>>>>>> > Email: gabuisi@gmail.com
>>>>>>>> >
>>>>>>>> > *
>>>>>>>> > * For searches and help try:
>>>>>>>> > * http://www.stata.com/help.cgi?search
>>>>>>>> > * http://www.stata.com/support/statalist/faq
>>>>>>>> > * http://www.ats.ucla.edu/stat/stata/
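The survey advice earlier in this thread (-svyset- with districts as strata, one -svy: total- per district, then suppressing cells with too few sample observations or too large a relative standard error) can be put together as below. This is a minimal, hypothetical sketch on simulated data: the variable names psu, district, weight and n_agriculture, the Poisson household counts, the use of "sample households with at least one such worker" as the cell n, and the thresholds of 10 households and 30% RSE are assumptions chosen for illustration, not Steve's or Arka's actual code or data.

```stata
* Hypothetical sketch: district-level survey totals with small-cell screening.
* All data are simulated; variable names and thresholds are assumptions.
clear
set seed 2010
set obs 600
generate district = ceil(_n/200)        // 3 districts, used as strata
generate psu = ceil(_n/20)              // 30 PSUs, 10 nested in each district
generate weight = 50 + 50*runiform()    // assumed inverse-probability weights
generate n_agriculture = rpoisson(0.5)  // HH count of agricultural workers

svyset psu [pw = weight], strata(district)

* Assumed screening thresholds (the thread suggests n of 10-20, RSE of 30-50%)
local min_n   10
local max_rse 30

levelsof district, local(districts)
foreach x of local districts {
    * cell n: sample households reporting at least one agricultural worker
    quietly count if district == `x' & n_agriculture > 0
    local n = r(N)
    * districts are strata, so -if- is used here as in the thread
    quietly svy: total n_agriculture if district == `x'
    local est = _b[n_agriculture]
    local se  = _se[n_agriculture]
    local rse = 100 * `se' / `est'      // RSE = (SE/estimate) x 100%
    if `n' >= `min_n' & `rse' <= `max_rse' {
        display "district `x': total = " %10.0fc `est' ", SE = " %9.0fc `se' ", RSE = " %4.1f `rse' "%, n = `n'"
    }
    else {
        display "district `x': suppressed (n = `n', RSE = " %4.1f `rse' "%)"
    }
}
```

The -if- restriction follows Steve's remark that districts are sampling strata. For a domain that is not a stratum (an industry within a district, say), the Stata survey documentation recommends -subpop()- instead, because dropping observations with -if- can give incorrect standard errors.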

**Follow-Ups**:
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Arka Roy Chaudhuri <gabuisi@gmail.com>

**References**:
- **st: Estimating the number of workers in each industry in each district**, from Arka Roy Chaudhuri <gabuisi@gmail.com>
- **st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from "Carlo Lazzaro" <carlo.lazzaro@tin.it>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Fwd: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Arka Roy Chaudhuri <gabuisi@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Steve Samuels <sjsamuels@gmail.com>
- **Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE**, from Arka Roy Chaudhuri <gabuisi@gmail.com>
