Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE

From	Arka Roy Chaudhuri <[email protected]>
To	[email protected]
Subject	Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date	Mon, 27 Sep 2010 16:14:13 -0700

Hi,
  My apologies for posting a different question on the same thread.
Thanks for pointing out the mistake.

Regards,
Arka

On Mon, Sep 27, 2010 at 2:01 PM, Steve Samuels <[email protected]> wrote:
> Please resend with a new subject heading.  I have no expertise in this
> area and those who do will not necessarily see your post.
>
> On Mon, Sep 27, 2010 at 4:16 PM, Arka Roy Chaudhuri <[email protected]> wrote:
>> Dear Steve,
>> Now I am having some problems estimating an IV regression. It would be
>> great if you could help me with this problem.
>>
>> I have the following variables in my data set: districtid, the average
>> residual gender wage gap in a district (avggap), scaled district
>> tariffs (district_tariff_scaled), unscaled district
>> tariffs (district_tariff_unscaled), a set of district
>> dummies (_Idistricti*), a time dummy, since I have two time
>> periods (time), and district population (district_popn). I am interested
>> in the effect of scaled district tariffs on the average
>> residual gender wage gap, using the unscaled district tariffs as
>> instruments for the scaled district tariffs. I run the following three
>> regressions (I use the district population as weights and cluster over
>> districts to correct the standard errors):
>>
>> 1) regress avggap district_tariff_scaled time _Idistricti*
>> [aweight=district_popn], cluster(districtid)
>> In this regression I look at the structural equation, i.e., the effect of
>> scaled district tariffs on the average gender wage gap. I do not get any
>> error in this case.
>>
>> 2) regress avggap district_tariff_unscaled time _Idistricti*
>> [aweight=district_popn], cluster(districtid)
>> In this regression I look at the reduced-form relationship between
>> unscaled tariffs and the average gender wage gap. I do not get any
>> error in this case.
>>
>> 3) ivregress 2sls avggap (district_tariff_scaled
>> = district_tariff_unscaled) time _Idistricti*
>> [aweight=district_popn], cluster(districtid)
>> This is the equation that I have a problem estimating. I use the unscaled
>> tariffs as instruments for the scaled tariffs. However, Stata gives me
>> the following error:
>>
>> ivregress 2sls avggap (district_tariff_scaled
>> =district_tariff_unscaled) time _Idistricti*
>> [aweight=district_popn],cluster(districtid)
>> (sum of wgt is   0.0000e+00)
>> no observations
>> r(2000);
>>
>> Surprisingly, if I estimate the third equation without clustering over
>> the districts, Stata gives me results without any error. I tried using
>> the vce() option instead of the cluster() option, but I get the same error.
>> I do not understand why clustering over districts does not create any
>> problem in the estimation of the first two equations but returns
>> an error when I estimate the third equation. Since I am using a
>> difference-in-differences approach, it is essential that I cluster over
>> districts. I am using Stata 11.
>>
>> I would be really grateful if you could help me out with this problem. Thanks.
>>
>> Regards,
>> Arka
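
The thread does not identify the cause of the r(2000) error above. As a cross-check that does not depend on -ivregress-, the sketch below counts the usable observations and computes the just-identified 2SLS point estimate as the reduced-form coefficient divided by the first-stage coefficient. It runs on simulated data; the names y, x, and z are hypothetical stand-ins for avggap, district_tariff_scaled, and district_tariff_unscaled, and the data-generating values are assumptions made only for illustration.

```stata
* Toy two-period district panel; all values are simulated for illustration.
clear
set seed 12345
set obs 40
generate districtid = ceil(_n/2)
bysort districtid: generate time = _n - 1
generate district_popn = 1000 + 50*districtid
generate z = rnormal()              // stand-in for the unscaled tariff
generate x = 0.8*z + rnormal()      // stand-in for the scaled tariff
generate y = 0.5*x + rnormal()      // stand-in for avggap

* 1. Observations with every variable non-missing and a positive weight
count if !missing(y, x, z, time, districtid, district_popn) & district_popn > 0

* 2. With one instrument for one endogenous regressor, the 2SLS point
*    estimate equals the reduced-form coefficient on the instrument
*    divided by the first-stage coefficient on the instrument.
quietly regress y z time i.districtid [aweight=district_popn]
scalar rf = _b[z]
quietly regress x z time i.districtid [aweight=district_popn]
scalar fs = _b[z]
display "indirect least squares estimate: " rf/fs
```

This reproduces only the point estimate; it does not supply cluster-robust standard errors for the IV coefficient, so it is a diagnostic rather than a replacement for the failing command.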
>>
>> On Mon, Sep 27, 2010 at 6:18 AM, Steve Samuels <[email protected]> wrote:
>>> You are welcome, Arka. The 50% RSE criterion I've seen is a worst
>>> case; 30% would be more believable.
>>>
>>> Steve
>>>
>>> On Mon, Sep 27, 2010 at 2:14 AM, Arka Roy Chaudhuri <[email protected]> wrote:
>>>> Dear Steve,
>>>>
>>>> Thanks for all your suggestions. I have already ensured that I
>>>> have an adequate number of observations in each district-industry cell. I
>>>> will also look at the relative standard error criterion. Once again,
>>>> thanks a lot for your help.
>>>>
>>>> Regards,
>>>> Arka
>>>>
>>>> On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <[email protected]> wrote:
>>>>> Well, there will be numbers for up to 196,000 cells. Many will be
>>>>> empty because of missing data; I would hesitate to call the remainder
>>>>> "estimates" unless the standard errors are reasonable and they were
>>>>> based on more than 10-20 observations in the category.
>>>>>
>>>>> I have seen designs in which sum-of-weights estimates were worthless
>>>>> for estimating population totals, even with large sample sizes. PPS
>>>>> designs are less vulnerable to this kind of problem.
>>>>>
>>>>> Survey organizations generally have policies for suppressing
>>>>> estimates based on small sample sizes. Perhaps there is a standard
>>>>> practice in your field. I suggest that, in each district, you screen
>>>>> the industries present in the sample for a minimum number of
>>>>> individuals, say 10-20, and report proper survey estimates, with
>>>>> standard errors, and sample n's only for those. You can combine smaller
>>>>> industry groups to meet these criteria. The relative standard
>>>>> error, (SE/estimate) x 100%, is another criterion people use for
>>>>> suppressing estimates, and I've seen RSEs of 50% used as a maximum.
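
A minimal sketch of the relative standard error screen described above, run on simulated data. The survey design, the constant weight, and the variable values are assumptions made for illustration; the 50% cutoff is the maximum mentioned in the message, not a recommendation.

```stata
* Toy survey data; design and values are simulated for illustration only.
clear
set seed 2010
set obs 200
generate psu = ceil(_n/10)            // 20 primary sampling units
generate district = ceil(psu/10)      // 2 strata with 10 PSUs each
generate wt = 50                      // hypothetical constant sampling weight
generate n_agriculture = rpoisson(2)  // hypothetical household worker count
svyset psu [pw = wt], strata(district)

svy: total n_agriculture

* Relative standard error in percent: (SE / estimate) x 100
scalar rse = 100 * _se[n_agriculture] / _b[n_agriculture]
display "RSE = " %5.1f rse "%"

local max_rse 50
if rse > `max_rse' {
    display "RSE exceeds `max_rse'%: suppress this estimate"
}
else {
    display "RSE within `max_rse'%: report this estimate"
}
```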
>>>>>
>>>>> Good luck!
>>>>>
>>>>> Steve
>>>>>
>>>>> Steven J. Samuels
>>>>> [email protected]
>>>>> 18 Cantine's Island
>>>>> Saugerties NY 12477
>>>>> USA
>>>>> Voice: 845-246-0774
>>>>> Fax:   206-202-4783
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Arka Roy Chaudhuri <[email protected]>
>>>>> Date: Fri, Sep 24, 2010 at 4:03 PM
>>>>> Subject: Re: st: R: Estimating the number of workers in each industry
>>>>> in each district - flag: Stata 9/2 SE
>>>>> To: [email protected]
>>>>>
>>>>>
>>>>> Dear Steve,
>>>>>
>>>>> Thanks a lot for all your advice. The problem is that in my dataset
>>>>> I have about 490 industries and 400 districts. Both industries and
>>>>> districts come with a code identifying them. I used the following
>>>>> commands to estimate the number of workers in each industry in a
>>>>> district:
>>>>>
>>>>> /* here weight represents the inverse of the probability of the
>>>>> household being sampled */
>>>>> bysort districtid industryid: egen workers = total(weight)
>>>>> duplicates drop districtid industryid, force
>>>>> keep districtid industryid workers
>>>>> save "T:\arka\industry_district.dta"
>>>>>
>>>>>
>>>>> Is the above estimation strategy correct, leaving aside the issue of
>>>>> -svyset-ting my data? Please advise.
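
For comparison, the same sum-of-weights totals can be obtained in one step with -collapse-. The sketch below uses a six-row toy dataset with made-up weights; like the commands above, it yields point totals only and no standard errors, so it does not address the survey-design concerns raised elsewhere in the thread.

```stata
* Toy data: two districts, two industries, made-up sampling weights.
clear
input districtid industryid weight
1 10 120
1 10  80
1 20  50
2 10 200
2 20  30
2 20  70
end

* One row per district-industry cell, holding the sum of weights
collapse (sum) workers = weight, by(districtid industryid)
list, sepby(districtid)
```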
>>>>>
>>>>> Arka
>>>>>
>>>>> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <[email protected]> wrote:
>>>>>> My advice about handling household counts of workers was wrong. Do not expand.
>>>>>>
>>>>>> Say you have counts for the number of workers in the hh in three industries
>>>>>>
>>>>>> n_agriculture
>>>>>> n_service
>>>>>> n_sales
>>>>>>
>>>>>> Then you would run a separate command for each industry, for example:
>>>>>> *********************************************
>>>>>> levelsof district, local(districts)
>>>>>> foreach x of local districts {
>>>>>>     svy: total n_agriculture if district==`x'
>>>>>> }
>>>>>> ***********************************************
>>>>>> You would use this form rather than an -over()- or -subpop()- option,
>>>>>> because districts are sampling strata.
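
The loop above can be extended to several industry count variables at once. The following is a sketch on simulated data, assuming a design with two district strata, ten PSUs per stratum, and a constant weight; n_service is generated here only for illustration.

```stata
* Toy survey data; design and values are simulated for illustration only.
clear
set seed 42
set obs 200
generate psu = ceil(_n/10)            // 20 primary sampling units
generate district = ceil(psu/10)      // 2 strata with 10 PSUs each
generate wt = 50                      // hypothetical constant sampling weight
generate n_agriculture = rpoisson(2)  // hypothetical household counts
generate n_service = rpoisson(1)
svyset psu [pw = wt], strata(district)

* One -svy: total- per industry variable and district, as in the loop above
levelsof district, local(districts)
foreach v of varlist n_agriculture n_service {
    foreach x of local districts {
        quietly svy: total `v' if district == `x'
        display "`v', district `x': " %9.0fc _b[`v'] "  (SE " %9.1fc _se[`v'] ")"
    }
}
```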
>>>>>>
>>>>>> -Steve
>>>>>>
>>>>>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <[email protected]> wrote:
>>>>>>> Arka-
>>>>>>>
>>>>>>> Based on your description, you would -svyset- your data as follows:
>>>>>>>
>>>>>>> Define a variable (call it "psu" for "primary sampling unit") which is
>>>>>>> the village number (rural sector) or urban block (urban sector).
>>>>>>>
>>>>>>>
>>>>>>> then
>>>>>>> ********************************************************
>>>>>>> svyset psu [pw = your weight], strata(district)
>>>>>>> ***********************************************************
>>>>>>>
>>>>>>> If your data has one line per person, with "industry" categorized
>>>>>>>
>>>>>>> then the command for totals might be
>>>>>>>
>>>>>>> *****************************************************
>>>>>>> svy: tab district industry, count se format(%10.0fc)
>>>>>>> *****************************************************
>>>>>>>
>>>>>>> If your data has only counts of workers in each industry in each HH,
>>>>>>> then you should -expand- the data first so that it has one line for
>>>>>>> each worker in the HH, e.g.
>>>>>>>
>>>>>>> *************
>>>>>>> expand hhsize
>>>>>>> *************
>>>>>>>
>>>>>>> (but that might include children, so you will have to take some care)
>>>>>>>
>>>>>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>>>>>> you are a student, I suggest that you seek guidance from a faculty
>>>>>>> member who is experienced in surveys, if not in Stata. (I know that
>>>>>>> the Department of Statistics at UBC has a survey sampling course). I
>>>>>>> also suggest that you obtain a text to learn about sampling, such as
>>>>>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>>>>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>>>>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>>>>>
>>>>>>> Best wishes,
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> Steven J. Samuels
>>>>>>> [email protected]
>>>>>>> 18 Cantine's Island
>>>>>>> Saugerties NY 12477
>>>>>>> USA
>>>>>>> Voice: 845-246-0774
>>>>>>> Fax:   206-202-4783
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>> Thanks for the help. In my dataset all the districts in the target
>>>>>>>> population are included. The sampling design is a stratified multi-stage
>>>>>>>> design with the first stage units being villages in the rural sector
>>>>>>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>>>>>>> are households in both the sectors.
>>>>>>>>
>>>>>>>> I only have one set of weights that comes with the data. The
>>>>>>>> documentation states that the weights represent the probability that
>>>>>>>> the particular household was included in the sample. Please let me
>>>>>>>> know if I should include any other information. I am really thankful
>>>>>>>> for all the help.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Arka
>>>>>>>>
>>>>>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Arka-
>>>>>>>>>
>>>>>>>>> I can't answer without more information about the sampling design.
>>>>>>>>> Please describe the design in detail, including answers to the
>>>>>>>>> following questions.
>>>>>>>>>
>>>>>>>>> 1. Were all districts in the target population included in the sample?
>>>>>>>>> Or, were districts sampled?
>>>>>>>>>
>>>>>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>>>>>> was there adjustment to the probability weights (post-stratification,
>>>>>>>>> "raking") so that the sample results will better reflect population
>>>>>>>>> census proportions? If the weights are so adjusted, are the original
>>>>>>>>> sampling weights available to you?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Steve
>>>>>>>>>
>>>>>>>>> Steven J. Samuels
>>>>>>>>> [email protected]
>>>>>>>>> 18 Cantine's Island
>>>>>>>>> Saugerties NY 12477
>>>>>>>>> USA
>>>>>>>>> Voice: 845-246-0774
>>>>>>>>> Fax:   206-202-4783
>>>>>>>>>
>>>>>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <[email protected]> wrote:
>>>>>>>>> > Arka wrote:
>>>>>>>>> > "Now I want to estimate the number of workers
>>>>>>>>> > belonging to each industry in a particular district"
>>>>>>>>> >
>>>>>>>>> > A fairly trivial example addressing Arka's issue might be the
>>>>>>>>> > following (setting aside survey technicalities):
>>>>>>>>> >
>>>>>>>>> > ---------------------code begins------------------------------------
>>>>>>>>> > drop _all
>>>>>>>>> > set obs 100
>>>>>>>>> > g Workers=_n
>>>>>>>>> > g District="East" in 1/50
>>>>>>>>> > replace District="West" in 51/100
>>>>>>>>> > g Industry="Concrete" in 1/30
>>>>>>>>> > replace Industry="Steel" in 31/100
>>>>>>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>>>>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>>>>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>>>>>>> > ---------------------code ends------------------------------------
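
The example above flags three district-industry cells but stops short of counting them. A minimal sketch of the counting step, rebuilding the same toy data and still ignoring the survey weights and design, could use -contract-; the variable name workers is chosen here for illustration.

```stata
* Rebuild the toy data from the example above (unweighted, no survey design).
clear
set obs 100
generate str8 District = "East" in 1/50
replace District = "West" in 51/100
generate str8 Industry = "Concrete" in 1/30
replace Industry = "Steel" in 31/100

* Cross-tabulate, then reduce to one row per non-empty cell with its count
tabulate District Industry
contract District Industry, freq(workers)
list
```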
>>>>>>>>> >
>>>>>>>>> > HTH and Kind Regards,
>>>>>>>>> > Carlo
>>>>>>>>> > -----Original Message-----
>>>>>>>>> > From: [email protected]
>>>>>>>>> > [mailto:[email protected]] On behalf of Arka Roy
>>>>>>>>> > Chaudhuri
>>>>>>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>>>>>>> > To: [email protected]
>>>>>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>>>>>> > district
>>>>>>>>> >
>>>>>>>>> > Dear All,
>>>>>>>>> > I have a data set which has information at the individual
>>>>>>>>> > level. I have variables which record the district of residence of the
>>>>>>>>> > individual, the industry of employment of the individual, and other
>>>>>>>>> > demographic characteristics. The data set also comes with weights which
>>>>>>>>> > represent the probability that a particular household is included in
>>>>>>>>> > the sample. Thus all individuals belonging to a particular household
>>>>>>>>> > get the same weight. Now I want to estimate the number of workers
>>>>>>>>> > belonging to each industry in a particular district. Could anyone
>>>>>>>>> > please advise on the correct Stata code that I should write to get my
>>>>>>>>> > desired estimates? Also, I would be grateful if somebody could advise me
>>>>>>>>> > on the possible biases that might affect my estimates at the
>>>>>>>>> > industry-district level. I would really appreciate any help in this
>>>>>>>>> > regard. Thanks.
>>>>>>>>> >
>>>>>>>>> > Regards,
>>>>>>>>> > Arka
>>>>>>>>> > --
>>>>>>>>> > Arka Roy Chaudhuri
>>>>>>>>> > PhD Student
>>>>>>>>> > University of British Columbia
>>>>>>>>> > 997-1873 East Mall
>>>>>>>>> > Vancouver
>>>>>>>>> > Canada
>>>>>>>>> > Ph: +1 (604) 349-8283
>>>>>>>>> > Email: [email protected]
>>>>>>>>> >
>>>>>>>>> > *
>>>>>>>>> > *   For searches and help try:
>>>>>>>>> > *   http://www.stata.com/help.cgi?search
>>>>>>>>> > *   http://www.stata.com/support/statalist/faq
>>>>>>>>> > *   http://www.ats.ucla.edu/stat/stata/

