The Stata listserver

st: RE: RE: unique value count in several variables


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: unique value count in several variables
Date   Sun, 19 Jun 2005 10:09:01 +0100

I do not feel confused, but I did not grasp that 
that was what you wanted. I can't see a simpler 
way than this. For the benefit of any watching, 
the -egen- function -nvals()- comes from -egenmore-
on SSC. A footnote gives code using official Stata 
only. 
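
For anyone following along without -egenmore- installed, the usual 
route is the -ssc- command. This assumes a Stata that is connected to 
the internet and able to reach SSC: 

. ssc install egenmore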

. rename psic Kpsic

. rename ssic Kssic 

. reshape long K , string i(Gvkey year subno) 
(note: j = psic ssic)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      12
Number of variables                   5   ->       5
j variable (2 values)                     ->   _j
xij variables:
                            Kpsic Kssic   ->   K
-----------------------------------------------------------------------------

. l

     +------------------------------------+
     | Gvkey   year   subno     _j      K |
     |------------------------------------|
  1. |  1223   1999       1   psic   4767 |
  2. |  1223   1999       1   ssic   4743 |
  3. |  1223   1999       2   psic   4767 |
  4. |  1223   1999       2   ssic   4763 |
  5. |  1223   1999       3   psic   4757 |
     |------------------------------------|
  6. |  1223   1999       3   ssic   4767 |
  7. |  1223   1999       4   psic   4767 |
  8. |  1223   1999       4   ssic   4753 |
  9. |  1223   1999       5   psic   4777 |
 10. |  1223   1999       5   ssic   4787 |
     |------------------------------------|
 11. |  1223   1999       6   psic   4767 |
 12. |  1223   1999       6   ssic   4743 |
     +------------------------------------+

. egen nvals = nvals(K), by(Gvkey year) 

. l

     +--------------------------------------------+
     | Gvkey   year   subno     _j      K   nvals |
     |--------------------------------------------|
  1. |  1223   1999       1   psic   4767       7 |
  2. |  1223   1999       1   ssic   4743       7 |
  3. |  1223   1999       2   psic   4767       7 |
  4. |  1223   1999       2   ssic   4763       7 |
  5. |  1223   1999       3   psic   4757       7 |
     |--------------------------------------------|
  6. |  1223   1999       3   ssic   4767       7 |
  7. |  1223   1999       4   psic   4767       7 |
  8. |  1223   1999       4   ssic   4753       7 |
  9. |  1223   1999       5   psic   4777       7 |
 10. |  1223   1999       5   ssic   4787       7 |
     |--------------------------------------------|
 11. |  1223   1999       6   psic   4767       7 |
 12. |  1223   1999       6   ssic   4743       7 |
     +--------------------------------------------+

. reshape wide
(note: j = psic ssic)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       12   ->       6
Number of variables                   6   ->       6
j variable (2 values)                _j   ->   (dropped)
xij variables:
                                      K   ->   Kpsic Kssic
-----------------------------------------------------------------------------

. renpfix K 

. l

     +--------------------------------------------+
     | Gvkey   year   subno   psic   ssic   nvals |
     |--------------------------------------------|
  1. |  1223   1999       1   4767   4743       7 |
  2. |  1223   1999       2   4767   4763       7 |
  3. |  1223   1999       3   4757   4767       7 |
  4. |  1223   1999       4   4767   4753       7 |
  5. |  1223   1999       5   4777   4787       7 |
     |--------------------------------------------|
  6. |  1223   1999       6   4767   4743       7 |
     +--------------------------------------------+

Perhaps we should add this example to the webpage you
cited, by Gary Longton and myself. 

Nick 
[email protected] 

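* Footnote: official Stata only (no -egenmore-)
rename psic Kpsic
rename ssic Kssic
reshape long K , string i(Gvkey year subno)
l
* flag the first observation of each distinct K within Gvkey year
bysort Gvkey year K : gen nvals = _n == 1
* running count of flags, then copy the group total to every observation
by Gvkey year : replace nvals = sum(nvals)
by Gvkey year : replace nvals = nvals[_N]
sort Gvkey year subno
l
reshape wide
renpfix K
l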
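
A possible variation on the footnote code, offered only as a sketch: 
the -egen- function -tag()- marks one observation per distinct 
combination of its arguments, and summing those marks within Gvkey and 
year should give the same count. It assumes a Stata version in which 
the -egen- function -total()- is available. The variable name -tag- is 
just an illustrative choice and is dropped before reshaping back, 
because it is not constant within Gvkey year subno. 

rename psic Kpsic
rename ssic Kssic
reshape long K , string i(Gvkey year subno)
* mark one observation for each distinct K within Gvkey year
egen tag = tag(Gvkey year K)
* the number of marks per Gvkey year is the number of distinct values
egen nvals = total(tag), by(Gvkey year)
drop tag
reshape wide
renpfix K
l

With the six example observations this should again give nvals = 7. 
By default -tag()- does not mark observations with missing values, so 
a missing K would not be counted as a distinct value. 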

Wanli Zhao

> Thanks, Nick. I looked into the suggestions and I think I might have
> confused you on my problem. My panel data is like this:
> Gvkey  psic  ssic  year  subno
> 1223   4767  4743  1999  1
> 1223   4767  4763  1999  2
> 1223   4757  4767  1999  3
> 1223   4767  4753  1999  4
> 1223   4777  4787  1999  5
> 1223   4767  4743  1999  6
> 
> Using command unique, I can count the distinct values of psic and ssic
> by gvkey by year. So for psic it's 3 and for ssic it's 5. What I want is
> to count the distinct values of both psic and ssic by gvkey by year. In
> this case, it's 7 (4767, 4757, 4777, 4743, 4763, 4753, 4787). How to
> generate a new variable for my purpose? Hope I'm clear now. Please help.
 
Nick Cox
 
> By "unique" here I think you mean "distinct". 
> 
> Try -groups- from SSC. Or -egen, group()- and then tabulate. 

Wanli Zhao
  
> > I have a simple question but got stuck finding a simple solution. I
> > have a panel and let's say cross-section id is gvkey and time id is
> > year. There are two variables, say, primary sic and secondary sic. My
> > aim is to count the unique values of sic in both variables by gvkey by
> > year. I know the 'by' thing is straightforward but is there a quick
> > solution to count the unique observations in both variables? I know
> > the commands such as unique, distinct and egenmore nvals. They work
> > perfectly for a single variable. Also, on the webpage there is an
> > explanation of the unique combination of two variables and how to
> > count that. I guess mine is different. Your help is appreciated.
