Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE

 From Arka Roy Chaudhuri To statalist@hsphsun2.harvard.edu Subject Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE Date Mon, 27 Sep 2010 13:16:46 -0700

```
Dear Steve,
Now I am having some problems estimating an IV regression. It would be

I have the following variables in my data set: districtid; the average
residual gender wage gap in a district (avggap); scaled district
tariffs (district_tariff_scaled); unscaled district tariffs
(district_tariff_unscaled); a set of district dummies (_Idistricti*);
a time dummy, since I have two time periods (time); and district
population (district_popn). I am interested in the effect of scaled
district tariffs on the average residual gender wage gap, using the
unscaled district tariffs as instruments for the scaled district
tariffs. I run the following three regressions (I use district
population as weights and cluster over districts to correct the
standard errors):

1) regress avggap district_tariff_scaled time _Idistricti* ///
       [aweight=district_popn], cluster(districtid)
In this regression I look at the structural equation, i.e., the effect
of scaled district tariffs on the average gender wage gap. I do not get
any error in this case.

2) regress avggap district_tariff_unscaled time _Idistricti* ///
       [aweight=district_popn], cluster(districtid)
In this regression I look at the reduced-form relationship between
unscaled tariffs and the average gender wage gap. I do not get any
error in this case.

3) ivregress 2sls avggap (district_tariff_scaled = district_tariff_unscaled) ///
       time _Idistricti* [aweight=district_popn], cluster(districtid)
This is the equation I have trouble estimating. I use the unscaled
tariffs as instruments for the scaled tariffs. However, Stata gives me
the following error:

ivregress 2sls avggap (district_tariff_scaled
=district_tariff_unscaled) time _Idistricti*
[aweight=district_popn],cluster(districtid)
(sum of wgt is   0.0000e+00)
no observations
r(2000);

Surprisingly, if I estimate the third equation without clustering over
districts, Stata gives me results without any error. I tried the vce()
option instead of the cluster() option, but I get the same error. I do
not understand why clustering over districts causes no problem when
estimating the first two equations but returns an error when estimating
the third. Since I am using a difference-in-differences approach, it is
essential that I cluster over districts. I am using Stata 11.

I will be really grateful if you could help me out with this problem. Thanks.

Regards,
Arka

On Mon, Sep 27, 2010 at 6:18 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
> You are welcome, Arka.  The 50% RSE criterion I've seen is a worst
> case; 30% would be more believable.
>
> Steve
>
> On Mon, Sep 27, 2010 at 2:14 AM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>> Dear Steve,
>>
>>     Thanks for all your suggestions. I have already ensured that I
>> have an adequate number of observations in each district-industry cell.
>> I will also look at the relative standard error criterion. Once again,
>> thanks a lot for your help.
>>
>> Regards,
>> Arka
>>
>> On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> Well, there will be numbers for up to 196,000 cells. Many will be
>>> empty because of missing data; I would hesitate to call the remainder
>>> "estimates" unless the standard errors are reasonable and they were
>>> based on more than 10-20 observations in the category.
>>>
>>> I have seen designs in which sum-of-weights estimates were worthless
>>> for estimating population totals, even with large sample sizes.  PPS
>>> designs are less vulnerable to this kind of problem.
>>>
>>> Survey organizations generally have policies for suppressing
>>> estimates based on small sample sizes. Perhaps there is a standard
>>> practice in your field. I suggest that, in each district, you screen
>>> the industries present in the sample for a minimum number of
>>> individuals, say 10-20, and report proper survey estimates, with
>>> standard errors, and sample n's only for those. You can group smaller
>>> industries to meet these criteria. The relative standard error (RSE),
>>> (SE/estimate) x 100%, is another criterion people use for
>>> suppressing estimates, and I've seen RSEs of 50% used as a maximum.
>>>
>>> Good luck!
>>>
>>> Steve
>>>
>>> Steven J. Samuels
>>> sjsamuels@gmail.com
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> USA
>>> Voice: 845-246-0774
>>> Fax:    206-202-4783
>>>
>>> ---------- Forwarded message ----------
>>> From: Arka Roy Chaudhuri <gabuisi@gmail.com>
>>> Date: Fri, Sep 24, 2010 at 4:03 PM
>>> Subject: Re: st: R: Estimating the number of workers in each industry
>>> in each district - flag: Stata 9/2 SE
>>> To: statalist@hsphsun2.harvard.edu
>>>
>>>
>>> Dear Steve,
>>>
>>>   Thanks a lot for all your advice. The problem is that in my dataset
>>> I have about 490 industries and 400 districts. Both industries and
>>> districts come with a code identifying them. I used the following
>>> commands to estimate the number of workers in each industry in a
>>> district:
>>>
>>> * weight is the inverse of the probability that the household was sampled
>>> bysort districtid industryid: egen workers = total(weight)
>>> duplicates drop districtid industryid, force
>>> keep districtid industryid workers
>>> save "T:\arka\industry_district.dta"
>>>
>>>
>>> Is the above estimation strategy correct, leaving aside the issue of -svyset-?
>>>
>>> Arka
>>>
>>> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>> My advice about handling household counts of workers was wrong. Do not expand.
>>>>
>>>> Say you have counts of the number of workers in the household in three industries:
>>>>
>>>> n_agriculture
>>>> n_service
>>>> n_sales
>>>>
>>>> Then you would run a separate command for each industry, for example:
>>>> *********************************************
>>>> levelsof district, local(districts)
>>>> foreach x of local districts {
>>>>     svy: total n_agriculture if district == `x'
>>>> }
>>>> ***********************************************
>>>> You would use this form rather than an -over()- or -subpop()- option,
>>>> because districts are sampling strata.
>>>>
>>>> -Steve
>>>>
>>>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>> Arka-
>>>>>
>>>>> Based on your description, you would -svyset- your data as follows:
>>>>>
>>>>> Define a variable (call it "psu" for "primary sampling unit") which is
>>>>> the village number (rural sector) or urban block (urban sector),
>>>>>
>>>>> then
>>>>> ********************************************************
>>>>> svyset psu [pweight = weight], strata(district)
>>>>> ***********************************************************
>>>>>
>>>>> If your data have one line per person, with "industry" as a
>>>>> categorical variable, then the command for totals might be:
>>>>>
>>>>> *****************************************************
>>>>> svy: tab district industry, count se format(%10.0fc)
>>>>> *****************************************************
>>>>>
>>>>> If your data has only counts of workers in each industry in each HH,
>>>>> then you should -expand- the data first so that it has one line for
>>>>> each worker in the HH, e.g.
>>>>>
>>>>> *************
>>>>> expand hhsize
>>>>> *************
>>>>>
>>>>> (but that might include children, so you will have to take some care)
>>>>>
>>>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>>>> you are a student, I suggest that you seek guidance from a faculty
>>>>> member who is experienced in surveys, if not in Stata. (I know that
>>>>> the Department of Statistics at UBC has a survey sampling course). I
>>>>> also suggest that you obtain a text to learn about sampling, such as
>>>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Steve
>>>>>
>>>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>>>>> Hi,
>>>>>>  Thanks for the help. In my dataset all the districts in the target
>>>>>> population are included. The sampling design is a stratified
>>>>>> multi-stage design, with the first-stage units being villages in the
>>>>>> rural sector and urban blocks in the urban sector. The ultimate stage
>>>>>> units (USUs) are households in both sectors.
>>>>>>
>>>>>>   I only have one set of weights that comes with the data. The
>>>>>> documentation states that the weights represent the probability that
>>>>>> the particular household was included in the sample.  Please let me
>>>>>> know if I should include any other information. I am really thankful
>>>>>> for all the help.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Arka
>>>>>>
>>>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>>>
>>>>>>> Arka-
>>>>>>>
>>>>>>> I have the following questions:
>>>>>>>
>>>>>>> 1. Were all districts in the target population included in the sample?
>>>>>>> Or, were districts sampled?
>>>>>>>
>>>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>>>> was there an adjustment to the probability weights (post-stratification,
>>>>>>> "raking") so that the sample results will better reflect population
>>>>>>> census proportions? If the weights are so adjusted, are the original
>>>>>>> sampling weights available to you?
>>>>>>>
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <carlo.lazzaro@tin.it> wrote:
>>>>>>> > Arka wrote:
>>>>>>> > "Now I want to estimate the number of workers
>>>>>>> > belonging to each industry in a particular district"
>>>>>>> >
>>>>>>> > A quite trivial example of Arka's issue might be the following (setting
>>>>>>> > aside survey technicalities):
>>>>>>> >
>>>>>>> > ---------------------code begins------------------------------------
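>>>>>>> > drop _all
>>>>>>> > set obs 100
>>>>>>> > * one observation per worker
>>>>>>> > g Workers=_n
>>>>>>> > * workers 1-50 live in East, 51-100 in West
>>>>>>> > g District="East" in 1/50
>>>>>>> > replace District="West" in 51/100
>>>>>>> > * workers 1-30 work in Concrete, 31-100 in Steel
>>>>>>> > g Industry="Concrete" in 1/30
>>>>>>> > replace Industry="Steel" in 31/100
>>>>>>> > * indicators for each district-industry cell (missing otherwise)
>>>>>>> > g A = 1 if District=="East" & Industry=="Steel"
>>>>>>> > g B = 1 if District=="West" & Industry=="Steel"
>>>>>>> > g C = 1 if District=="East" & Industry=="Concrete"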
>>>>>>> > ---------------------code ends------------------------------------
>>>>>>> >
>>>>>>> > HTH and Kind Regards,
>>>>>>> > Carlo
>>>>>>> > -----Original Message-----
>>>>>>> > From: owner-statalist@hsphsun2.harvard.edu
>>>>>>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Arka Roy
>>>>>>> > Chaudhuri
>>>>>>> > Sent: Wednesday, September 15, 2010 9:24
>>>>>>> > To: statalist@hsphsun2.harvard.edu
>>>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>>>> > district
>>>>>>> >
>>>>>>> > Dear All,
>>>>>>> >        I have a data set with information at the individual
>>>>>>> > level. I have variables that record the district of residence of the
>>>>>>> > individual, the industry of employment of the individual, and other
>>>>>>> > demographic characteristics. The data set also comes with weights that
>>>>>>> > represent the probability that a particular household is included in
>>>>>>> > the sample. Thus all individuals belonging to a particular household
>>>>>>> > get the same weight. Now I want to estimate the number of workers
>>>>>>> > belonging to each industry in a particular district. Could anyone
>>>>>>> > please advise on the correct Stata code that I should write to get my
>>>>>>> > desired estimates? I would also be grateful if somebody could advise me
>>>>>>> > on the possible biases that might affect my estimates at the
>>>>>>> > industry-district level. I would really appreciate any help in this
>>>>>>> > regard. Thanks
>>>>>>> >
>>>>>>> > Regards,
>>>>>>> > Arka
>>>>>>> > --
>>>>>>> > Arka Roy Chaudhuri
>>>>>>> > PhD Student
>>>>>>> > University of British Columbia
>>>>>>> > 997-1873 East Mall
>>>>>>> > Vancouver
>>>>>>> > Ph: +1 (604) 349-8283
>>>>>>> > Email: gabuisi@gmail.com
>>>>>>> >
>>>>>>> > *
>>>>>>> > *   For searches and help try:
>>>>>>> > *   http://www.stata.com/help.cgi?search
>>>>>>> > *   http://www.stata.com/support/statalist/faq
>>>>>>> > *   http://www.ats.ucla.edu/stat/stata/
```
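This is a possible cross-check for the IV question at the top of the thread. It is a sketch and may not match the real data. With one endogenous regressor and one instrument, the 2SLS coefficient equals the reduced-form coefficient (regression 2) divided by the first-stage coefficient. This holds only when both regressions use the same sample, weights and exogenous controls, and it is sometimes called indirect least squares.

The data below are simulated and hypothetical: 200 districts observed in two periods. The variable names mirror Arka's. `i.districtid` (Stata 11 factor-variable notation) stands in for the `_Idistricti*` dummies, and `vce(cluster districtid)` is the `vce()` spelling of `cluster(districtid)`. The check does not explain the r(2000) error. It only recovers the point estimate, not the clustered IV standard error.

```stata
* Hypothetical simulated panel: 200 districts x 2 periods
clear
set seed 2010
set obs 400
generate districtid = ceil(_n/2)
bysort districtid: generate time = _n - 1
generate district_popn = 1000 + 10*districtid
generate district_tariff_unscaled = runiform()
generate district_tariff_scaled = 0.5*district_tariff_unscaled + 0.1*rnormal()
generate avggap = 0.3*district_tariff_scaled + 0.05*time + 0.1*rnormal()

* First stage: endogenous regressor on the instrument and the controls
regress district_tariff_scaled district_tariff_unscaled time i.districtid ///
    [aweight=district_popn], vce(cluster districtid)
scalar b_fs = _b[district_tariff_unscaled]

* Reduced form: outcome on the instrument and the same controls
regress avggap district_tariff_unscaled time i.districtid ///
    [aweight=district_popn], vce(cluster districtid)
scalar b_rf = _b[district_tariff_unscaled]

* Indirect least squares: with one instrument, b_iv = b_rf / b_fs
scalar b_ils = scalar(b_rf)/scalar(b_fs)
display "ILS point estimate: " %9.4f scalar(b_ils)

* Compare with ivregress (default VCE) on the same data
ivregress 2sls avggap (district_tariff_scaled = district_tariff_unscaled) ///
    time i.districtid [aweight=district_popn]
display "ivregress 2SLS estimate: " %9.4f _b[district_tariff_scaled]
```

On the real data, a ratio that matches the unclustered `ivregress` coefficient suggests that the three regressions share the same estimation sample. The clustered standard error still has to come from an IV command.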
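Arka's `egen`/`duplicates drop`/`keep` sequence can be done in one step with `collapse (sum)`, shown here as a sketch. The command adds up the weights within each district-industry cell and leaves one row per cell, so the sum-of-weights point estimates are the same. The six person-level records are made up for illustration, and the variable names follow Arka's code. As Steve notes in the thread, these sums come without standard errors; `svy: total` is still needed for those.

```stata
* Hypothetical person-level records; weight = inverse selection probability
clear
input districtid industryid weight
1 10 50
1 10 50
1 20 80
2 10 40
2 20 60
2 20 60
end

* Sum the weights within each district-industry cell (one row per cell)
collapse (sum) workers=weight, by(districtid industryid)
list, noobs
```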
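Steve's screening advice could be coded roughly as follows; this is a sketch under stated assumptions. After `svyset`, the code estimates a total within each district and computes the RSE as (SE/estimate) x 100%. It then flags estimates whose RSE exceeds 30% (the more believable cutoff Steve mentions) or that rest on fewer than 20 observations. Both cutoffs come from the thread, not from any standard.

The survey is simulated and hypothetical: 4 districts as strata, 10 PSUs per district, and 5 households per PSU. The household count `n_agriculture` and the weights are made up.

```stata
* Hypothetical simulated survey data
clear
set seed 24
set obs 200
generate district = ceil(_n/50)               // 4 districts (strata)
generate psu = ceil(_n/5)                     // 10 PSUs per district
generate weight = 100 + 20*district           // made-up inverse probabilities
generate n_agriculture = floor(3*runiform())  // 0-2 agricultural workers per HH

svyset psu [pweight = weight], strata(district)

* One -svy: total- per district, since districts are sampling strata
levelsof district, local(districts)
foreach x of local districts {
    quietly svy: total n_agriculture if district == `x'
    scalar tot = _b[n_agriculture]
    scalar rse = 100*_se[n_agriculture]/scalar(tot)
    scalar nobs = e(N)
    display "district `x': total = " %9.0fc scalar(tot) ///
        ", RSE = " %5.1f scalar(rse) "%, n = " scalar(nobs) ///
        cond(scalar(rse) > 30 | scalar(nobs) < 20, "  -> suppress", "")
}
```

The `scalar()` function is used in expressions so that a scalar name cannot be mistaken for an abbreviated variable name.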