RE: st: RE: Build groups with the same first two numbers of SIC

From: "Nick Cox"
Subject: RE: st: RE: Build groups with the same first two numbers of SIC
Date: Thu, 19 Mar 2009 17:35:45 -0000

```Quite so. It wouldn't be 1 2 3 4, but that shouldn't matter much, if at all.

Nick
n.j.cox@durham.ac.uk

Michael I. Lichter wrote:

Nick's method looks good to me, but -sicgroup- can be generated
automatically by

gen sicgroup = int(sic/100)

Nick Cox wrote:

> I can't see any reason whatsoever here for your strategy of separating into smaller datasets and then putting them together.
>
> I'll guess that -sic- is numeric.
>
> You need a variable, say -sicgroup-, that you put together the hard way
>
> gen sicgroup = 2
> replace sicgroup = 1 if sic == 3674
> replace sicgroup = 3 if sic == 3861
> replace sicgroup = 4 if sic == 4213
>
> Then it's just
>
> egen mean = mean(ret), by(sicgroup date)

Hua Pan wrote:

> I have a list of firms with four-digit SIC code, permno (the firm identifier), date and return, and I wish to get the daily mean return within each group of firms that share the same first two digits of the SIC code.
>
> My dataset looks like this:
>
> sic     permno      date          ret
> …
> 3674   10012    5.Jan.2004
>        10012    6.Jan.2004
>        10012    7.Jan.2004
> 3674   10259    5.Jan.2004
>        10259    6.Jan.2004
>        10259    7.Jan.2004
> 3674   10299
>        10299
>        10299
> 3674   10302
>        10302
>        10302
> -----------------------------------------------------------------
> 3714   10667
>        10667
>        10667
> ------------------------------------------------------------------
> 3728   10145
>        10145
>        10145
> ------------------------------------------------------------------
> 3861   10163
>        10163
>        10163
> ------------------------------------------------------------------
> 4213   10379
>        10379
>        10379
> 4213   10649
>        10649
>        10649
>
>
> First I want to build several groups. Firms within each group share one characteristic: the first two digits of their SIC codes are identical. For the example above:
>         sic               permno
> Group1: 3674              10012, 10259, 10299, 10302
> Group2: 3714, 3728        10667, 10145
> Group3: 3861              10163
> Group4: 4213              10379, 10649
>
> Then I wish to get the mean daily return for each group.
>
> So I tried to split the big dataset into several sub-datasets and calculate the daily mean return for each of them, and then to put the sub-datasets back together with -append-. For the first step, I did:
>
> local n = 3600
> while `n' < 4300 {
>     use "D:\sic.dta", clear
>     keep if sic >= `n' & sic < `n' + 100
>     by date, sort: egen meanret = mean(ret)
>     save "D:\ph\sic\sic_`n'.dta"
>     local n = `n' + 100
> }
>
>
> It succeeds for 36xx and 37xx. But when `n' == 3900, all observations in the complete file "D:\sic.dta" are deleted, because none of them meets the condition sic >= 3900 & sic < 4000, so there is an error:
>
> (y observations deleted)
>
> where y is the number of observations in the complete dataset.
>
> There is a huge number of observations, so I can't do it one by one. Does anyone here have an idea how to solve this problem? Or is there an easier method to generate such groups (I have tried, but failed), so that I can get the daily mean return with:
>
> by group date, sort: egen meanret=mean(ret)
>
> Btw, I'm using Stata 10.

```
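
The approach suggested in the thread can be sketched end to end as follows. This is a minimal sketch, assuming -sic- is numeric with four digits (as Nick guesses). The -ret- values and the integer -date- are made up for illustration, since the original post shows no returns, and -groupid- is a hypothetical variable name, added only in case consecutive group numbers 1, 2, 3, ... are wanted.

```stata
* Toy data modelled on the example in the thread.
* ASSUMPTION: the ret values and the integer dates are invented for illustration.
clear
input sic permno date ret
3674 10012 1  0.010
3674 10012 2  0.020
3674 10259 1  0.030
3674 10259 2 -0.010
3714 10667 1  0.020
3714 10667 2  0.000
3728 10145 1  0.040
3728 10145 2  0.010
3861 10163 1 -0.010
3861 10163 2  0.030
4213 10379 1  0.000
4213 10379 2  0.020
end

* First two digits of a four-digit numeric SIC code:
* 3674 becomes 36, while 3714 and 3728 both become 37.
generate sicgroup = int(sic/100)

* ASSUMPTION: if sic were stored as a string, one alternative would be
* generate sicgroup = real(substr(sic, 1, 2))

* Optional: consecutive group numbers 1, 2, 3, ... (groupid is a hypothetical name)
egen groupid = group(sicgroup)

* Daily mean return within each two-digit group
bysort sicgroup date: egen meanret = mean(ret)

list sic permno date sicgroup groupid meanret, sepby(sicgroup)
```

Because the groups are built inside one dataset, two-digit ranges that contain no firms (39xx in the original example) simply never appear, so there is nothing to loop over, save, or -append-.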