
st: RE: RE: RE: RE: unique value count in several variables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: RE: RE: unique value count in several variables
Date   Mon, 20 Jun 2005 00:48:06 +0100

My statement about -nvals- existing when you 
tried -reshape- was based on the output you 
quote below. 

I'm pleased that you have solved your problem. 

Nick 
n.j.cox@durham.ac.uk 

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Wanli Zhao
> Sent: 20 June 2005 00:08
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: RE: RE: unique value count in several variables
> 
> 
> Nick,
> I did not have nvals beforehand. I finally modified your "reshape"
> program, as I had done manually in EViews, and it worked. I just replaced
> the missing values with a number (I used 99) and ran your program, and
> nvals now shows the right number (of course, it counts the missing value
> as a distinct SIC code). So the only complication in my case is that the
> missing value needs to be a number.
> Thanks again.
> 
> Wanli
> 
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Sunday, June 19, 2005 5:54 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: RE: unique value count in several variables
> 
> Scott's program does not claim to subdivide by your key and year and it
> does not do so.
> 
> What you call "Nick's original program" appears to be my first code as
> modified by you. It was based on the idea that -nvals- did not exist
> beforehand, and indeed the purpose of the code is to create -nvals-. In
> your case, you appear to have used it after creating -nvals- in some
> other way. That won't work. At a minimum, you need to drop -nvals- first.
> It is possible also that complications you didn't tell us about have not
> been taken into account in modifying the code, as you are here using
> variable names not previously explained.
> 
> Naturally, people often simplify their problem for Statalist to show the
> essence of it. That's great for the people who answer the questions.
> However, the original posters then need to add back the complications in
> exactly the right way.
> 
> Otherwise put, there is nothing in this report that looks to me like a bug
> in Scott's code or mine given the original example you specified.
> 
> You are right that the second approach will be slower than the first.
> There's a lot of looping and testing -if-. 
> 
> Nick
> n.j.cox@durham.ac.uk 
> 
> Wanli Zhao
>  
> > I should report my results for anyone interested. I have a large
> > panel: about 1600 cross-sections and 11 years. Scott's program
> > generates an nvals variable with a single value, 1005 (I do not know
> > what it means), for every gvkey-year. Nick's modification seems to
> > work, but the run time is unacceptable. I interrupted the program, and
> > the values seem correct for the part that finished.
> > Nick's original "reshape" program also gave me the following error
> > message:
> > [reshape error
> > (note: j = ssic1 ssic2)
> > i (gvkey year sid) indicates the top-level grouping such as subject id.
> > j (_j) indicates the subgrouping such as time.
> > xij variable is K.
> > Thus, the following variable(s) should be constant within i:
> >       nvals
> > nvals not constant within i (gvkey year sid) for 28662 values of i:]
> > 
> > I guess the problem is that my ssic1 and ssic2 have many missing values.
> > Thanks.
> > 
> > Wanli Zhao
> > 
> > 
> > -----Original Message-----
> > From: owner-statalist@hsphsun2.harvard.edu
> > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> > Sent: Sunday, June 19, 2005 8:06 AM
> > To: statalist@hsphsun2.harvard.edu
> > Subject: st: RE: RE: RE: RE: unique value count in several variables
> > 
> > The last line of the loop should read -replace nvals = `count' if group == `i'-, with no "gen".
> > 
> > Nick
> > n.j.cox@durham.ac.uk
> > 
> > > -----Original Message-----
> > > From: owner-statalist@hsphsun2.harvard.edu
> > > [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Nick Cox
> > > Sent: 19 June 2005 12:37
> > > To: statalist@hsphsun2.harvard.edu
> > > Subject: st: RE: RE: RE: unique value count in several variables
> > > 
> > > 
> > > > I too am fond of -levelsof-. For the problem mentioned, this would
> > > > need to be embedded in a loop over groups, somewhat as follows:
> > > 
> > > gen nvals = . 
> > > egen group = group(Gvkey year)
> > > su group, meanonly
> > > qui forval i = 1/`r(max)' { 
> > > 	levelsof psic if group == `i', local(p) 
> > > 	levelsof ssic if group == `i', local(s)
> > > 	local total: list s | p
> > > 	local total:list uniq total
> > > 	local count:list sizeof total
> > > 	replace gen nvals = `count' if group == `i' 
> > > }
> > > 
> > > Nick
> > > n.j.cox@durham.ac.uk
> > > 
> > > > -----Original Message-----
> > > > From: owner-statalist@hsphsun2.harvard.edu
> > > > > [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Scott Merryman
> > > > Sent: 19 June 2005 12:30
> > > > To: statalist@hsphsun2.harvard.edu
> > > > Subject: st: RE: RE: unique value count in several variables
> > > > 
> > > > 
> > > > > In addition to Nick's suggestion of using -reshape-, another
> > > > > possibility is to use -levelsof- and the macro extended functions
> > > > > (assuming your cross sections are not too large):
> > > > 
> > > > 
> > > > . l, noobs
> > > > 
> > > >   +------------------------------------+
> > > >   | gvkey   psic   ssic   year   subno |
> > > >   |------------------------------------|
> > > >   |  1223   4767   4743   1999       1 |
> > > >   |  1223   4767   4763   1999       2 |
> > > >   |  1223   4757   4767   1999       3 |
> > > >   |  1223   4767   4753   1999       4 |
> > > >   |  1223   4777   4787   1999       5 |
> > > >   |------------------------------------|
> > > >   |  1223   4767   4743   1999       6 |
> > > >   +------------------------------------+
> > > > 
> > > > . levelsof psic, local(p)
> > > > 4757 4767 4777
> > > > 
> > > > . levelsof ssic, local(s)
> > > > 4743 4753 4763 4767 4787
> > > > 
> > > > . local total: list s | p
> > > > 
> > > > . local total:list uniq total
> > > > 
> > > > . local count:list sizeof total
> > > > 
> > > > . gen nvals = `count'
> > > > 
> > > > . l, noobs
> > > > 
> > > >   +--------------------------------------------+
> > > >   | gvkey   psic   ssic   year   subno   nvals |
> > > >   |--------------------------------------------|
> > > >   |  1223   4767   4743   1999       1       7 |
> > > >   |  1223   4767   4763   1999       2       7 |
> > > >   |  1223   4757   4767   1999       3       7 |
> > > >   |  1223   4767   4753   1999       4       7 |
> > > >   |  1223   4777   4787   1999       5       7 |
> > > >   |--------------------------------------------|
> > > >   |  1223   4767   4743   1999       6       7 |
> > > >   +--------------------------------------------+
> > > > 
> > > > 
> > > > Scott
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > > From: owner-statalist@hsphsun2.harvard.edu
> > > > > > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Wanli Zhao
> > > > > Sent: Saturday, June 18, 2005 3:17 PM
> > > > > To: statalist@hsphsun2.harvard.edu
> > > > > Subject: st: RE: unique value count in several variables
> > > > > 
> > > > > > Thanks, Nick. I looked into the suggestions, and I think I might have
> > > > > > confused you about my problem. My panel data look like this:
> > > > > Gvkey  psic  ssic  year  subno
> > > > > 1223   4767  4743  1999  1
> > > > > 1223   4767  4763  1999  2
> > > > > 1223   4757  4767  1999  3
> > > > > 1223   4767  4753  1999  4
> > > > > 1223   4777  4787  1999  5
> > > > > 1223   4767  4743  1999  6
> > > > > 
> > > > > > Using the command -unique-, I can count the distinct values of psic and
> > > > > > of ssic by gvkey by year: for psic it's 3 and for ssic it's 5. What I
> > > > > > want is to count the distinct values of psic and ssic combined, by gvkey
> > > > > > by year. In this case, it's 7 (4767, 4757, 4777, 4743, 4763, 4753, 4787).
> > > > > > How can I generate a new variable for this purpose? I hope I'm clear
> > > > > > now. Please help.
> > > > > 
> > > > > Thanks.
> > > > > Wanli Zhao
> > > > > 
> > > > 
> > > > 
> > > > *
> > > > *   For searches and help try:
> > > > *   http://www.stata.com/support/faqs/res/findit.html
> > > > *   http://www.stata.com/support/statalist/faq
> > > > *   http://www.ats.ucla.edu/stat/stata/
> > > > 
> > > 
> > > *
> > > *   For searches and help try:
> > > *   http://www.stata.com/support/faqs/res/findit.html
> > > *   http://www.stata.com/support/statalist/faq
> > > *   http://www.ats.ucla.edu/stat/stata/
> > > 
> > 
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> > 
> > 
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> > 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
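
For readers comparing the approaches in this thread, here is a minimal sketch of a -reshape- route that avoids looping over groups. Nick's original -reshape- code is not shown in this message, so this is not a reconstruction of it, only an illustration under stated assumptions. The variable names gvkey, year, psic and ssic follow Scott's listing; obsid, which, sic and tagged are hypothetical names introduced for this example. It assumes a Stata version in which -egen- has the total() function. It relies on -egen, tag()-, which by default does not tag observations with missing values, so missing codes are not counted and should not need recoding to a number such as 99.

```stata
* Sketch only: assumes variables gvkey, year, psic, ssic exist and that
* obsid, which, sic, sic1, sic2 and tagged are not already in use.

* egen cannot create a variable that already exists
* (this discards any existing nvals)
capture drop nvals

* hypothetical row identifier, unique by construction
gen long obsid = _n

* give the two code variables a common stub so that reshape can stack them
rename psic sic1
rename ssic sic2
reshape long sic, i(obsid) j(which)

* tag one observation per distinct gvkey-year-sic combination;
* observations with missing sic are not tagged
egen byte tagged = tag(gvkey year sic)

* the number of tagged observations in a gvkey-year group is the
* number of distinct nonmissing codes in that group
bysort gvkey year: egen nvals = total(tagged)

* tagged varies within obsid, so it must go before reshaping back
drop tagged
reshape wide sic, i(obsid) j(which)
rename sic1 psic
rename sic2 ssic
drop obsid
```

On the six-row example quoted above, the twelve long-form rows contain 7 distinct codes, so nvals should be 7 in every row. The -drop tagged- step matters because -reshape wide- requires every variable other than the reshaped one to be constant within i, which is the condition reported in the error message quoted above.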




© Copyright 1996–2014 StataCorp LP