Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Steve Samuels <sjsamuels@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: RE: AW: RE: AW: Identifying unique values with codebook |
Date | Wed, 16 Jun 2010 10:39:26 -0400 |
I agree. I was incorrect in stating that -inspect- and -summarize-
treat strings alike. -summarize- states that there are no observations
on a string variable. -inspect- does count the number of observations,
but then reports nonsense.

Steve

On Wed, Jun 16, 2010 at 10:31 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> -inspect- has its fans, but I am not among them.
>
> -inspect- seems to regard strings as failed numerics, and refuses to
> look inside. It would be simpler to ignore all strings.
>
> I wonder whether some kind of review of -inspect- is called for.
> Whatever -inspect- does that differs from -codebook- could be taken into
> -codebook-, perhaps. -inspect- could remain available under version
> control.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Martin Weiss
>
> A "string representation" seems to work with -codebook-, but not
> -inspect-, which claims to find 0 unique values...
>
> *************
> clear*
> inp str25 mystrvar
> loooooooooooooooooooooooooooonnggg
> verylooooooooooooooooooooonnnnnnnnngggg
> end
> ins
> codebook
> *************
>
> Nick Cox
>
> As Martin says, at root this is a precision problem.
>
> Neither -codebook- nor anything else is to blame if it is presented with
> the same values. To hold very large integers you may need to consider
> -long- as another possibility, or even a string representation.
>
> Martin Weiss
>
> As -help data_types- says: "doubles have 16 digits of accuracy." So you
> can increase the digits of your "y" up to the point where even -double-
> can do nothing for you:
>
> *************
> clear
> set obs 10
> gen byte x=_n
> codebook x
> gen double y1 = 1000000000000000 + x
> gen double y2 = 10000000000000000 + x
> gen double y3 = 100000000000000000 + x
> gen double y4 = 1000000000000000000 + x
> codebook y?
> ins y?
> *************
>
> Interestingly, -inspect- seems to differ from -codebook-'s opinion.
>
> Walter Garcia-Fontes
>
> I stumbled into a problem when identifying unique values of a numeric
> variable using "codebook": if the values are large they will be
> identified as the same value.
>
> For instance I have a variable x with the following values:
> 0, 1, 2, ... 10 (that is 10 different values)
>
> codebook x
> reports "unique values: 10"
>
> Now do
> gen y = 100000000000000000 + x
>
> codebook y
> reports "unique values = 1"
>
> Is this a feature?
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783
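
The precision problem discussed in this thread can be reproduced outside Stata. The Python sketch below is an illustration only and was not part of the original exchange. It assumes that Stata's default -float- storage type behaves like an IEEE 754 single-precision number and that -double- behaves like an IEEE 754 double, and it uses x = 1, ..., 10 as in Martin's example. The helper names as_float32 and count_unique are hypothetical.

```python
import struct


def as_float32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def count_unique(base, as_single=False):
    """Count the distinct stored values of base + x for x = 1, ..., 10."""
    stored = set()
    for x in range(1, 11):
        # The integer sum is exact; float() rounds it to a double.
        value = float(base + x)
        if as_single:
            # Assumption: this mimics storing the result in a Stata -float-.
            value = as_float32(value)
        stored.add(value)
    return len(stored)


if __name__ == "__main__":
    # Analogue of: gen y = 100000000000000000 + x (default storage type)
    print("single, 10**17:", count_unique(10**17, as_single=True))
    # Analogues of: gen double y1 ... y4
    for exponent in (15, 16, 17, 18):
        print(f"double, 10**{exponent}:", count_unique(10**exponent))
```

Under these assumptions the single-precision count is 1, in line with Walter's report, while the double-precision count for 10**15 is still 10. A double represents every integer exactly only up to 2^53 (about 9.007e15), which is why distinct values start to merge somewhere between the 10**15 and 10**16 cases.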