Statalist


st: RE: Data Management


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Data Management
Date   Fri, 21 Nov 2008 17:26:00 -0000

This synthetic example shows that the command will list precisely those
observations that differ from the previous observation. But this
includes the first, because for the first observation city[_n-1] is
city[0], which evaluates to string missing, i.e. "". More generally,
varname[0] is regarded as missing in the sense of the variable's data
type, i.e. numeric missing . or string missing "". So under by: the
first observation in each group will always be listed (unless its value
is missing).

. l

     +------------+
     |       city |
     |------------|
  1. | Durham, UK |
  2. | Durham, UK |
  3. | Durham, UK |
  4. | Durham, NC |
  5. | Durham, NC |
     |------------|
  6. | Durham, NH |
  7. | Durham, NH |
  8. | Durham, NH |
  9. | Durham, NH |
 10. | Durham, NH |
     +------------+

. list if city != city[_n-1]

     +------------+
     |       city |
     |------------|
  1. | Durham, UK |
  4. | Durham, NC |
  6. | Durham, NH |
     +------------+
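
A minimal sketch of the same point, using a made-up three-observation
dataset (the data and the extra condition _n > 1 are illustrative
assumptions, not part of the original question). It displays city[0]
between brackets and then excludes the first observation, so that only
genuine changes from one observation to the next are listed.

```stata
clear
input str12 city
"Durham, UK"
"Durham, UK"
"Durham, NC"
end

* city[0] is out of range, so it evaluates to string missing ""
display "[" city[0] "]"

* Excluding the first observation leaves only genuine changes
list if city != city[_n-1] & _n > 1
```

Under by: the same condition _n > 1 would exclude the first observation
in each group rather than the first in the dataset.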

You probably want 

bysort id (year) : gen prev = city[1]
by id : list prev city if prev != city & _n == 2 
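
As a hedged, self-contained sketch of how this might run: the toy data
below are invented for illustration, and the lower-case names id, city
and year are assumptions (the original data use ID, City and Year, and
Stata is case-sensitive). bysort id (year) is used so that the data need
not already be sorted and so that city[1] is the year 1 spelling.

```stata
clear
input id str12 city year
1 "Durham, UK" 1
1 "Durham, UK" 2
2 "Durham, NC" 1
2 "Durham NC"  2
end

* Sort within id by year so that city[1] is the year 1 spelling
bysort id (year) : gen prev = city[1]

* Only the second observation in each id is compared with the first
by id : list prev city if prev != city & _n == 2
```

Only ids whose two spellings differ should appear in the listing.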

There is no royal road to cleaning up string variables. The matter was
discussed on the list earlier this year and written up as a Tip: 

SJ-8-3  dm0039  . . . Stata tip 64: Cleaning up user-entered string variables
        . . . . . . . . . . . . . . . . . . . . . .  J. Herrin and E. Poen
        Q3/08   SJ 8(3):444--445                                 (no commands)
        tip on how to clean up user-entered string variables

Nick 
[email protected] 

Rijo John

I have a data set as follows

ID  City          Year
1    City name   1
1    City name   2


The data are supposed to have the same city name for each ID in years 1
and 2, but there are many cases where the city for year 1 is spelt
differently from that for year 2. I just want to list or edit those
cities where the city names are different for years 1 and 2 for the
same ID. When I issue the following command

bysort ID : list if  City!=City[_n-1]

it lists all observations in the data, whether or not the city is spelt
differently in years 1 and 2. That's strange to me. Can someone tell me
what I am doing wrong here?

