Statalist


RE: st: RE: Data Management


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: Data Management
Date   Fri, 21 Nov 2008 18:11:34 -0000

Your notification of your earlier mistake removes my puzzlement. 

In terms of your puzzlement: 

by id : list prev city if prev != city & _n == 2

does not specify pertinent observations under each -id- if none exist,
but the -by:- obliges -list- to give each heading. You can get more
concise output in the way you specified. 
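
A minimal sketch of one way to get that concise output, assuming toy
data with hypothetical values and a hypothetical indicator variable
-differs- (neither is from the original dataset). The idea is to do the
group-wise work under -by:- and then -list- without -by:-, so that no
per-id headings are printed:

```stata
* Toy data (hypothetical values), two observations per id
clear
input id str12 city year
1 "Durham, UK" 1
1 "Durham, UK" 2
2 "Durham, NC" 1
2 "Durhm, NC"  2
end

* First city within each id, with observations ordered by year
bysort id (year) : gen prev = city[1]

* Flag the second observation of an id when its city differs from the first
by id : gen byte differs = prev != city & _n == 2

* A plain -list- shows only the flagged rows, with no per-id headings
list id prev city if differs
```

Under -by:-, _n counts within each -id-, so the flag can be set once per
group; the final -list- then only needs the flag.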

Nick 
[email protected] 

Rijo John

That is interesting.

When I type the command as you originally wrote

by id : gen prev = city[1]
by id : list prev city if prev != city & _n == 2

the output is only green lines, with id=1, id=2, id=3 and so on under
each green line. Nothing else at all appears on the output page.

Whereas, when I gave the command

by id : gen prev = city[1]
list prev city if  city!=prev

it listed what I want and was similar to the result I would get
using the -duplicates- command.

In my previous mail I wrongly said that I only changed the "by id"
portion of your command. In fact I also omitted the "& _n == 2" portion
of your second command to get the result.
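
A minimal sketch of a related check, assuming toy data with hypothetical
values and a hypothetical indicator variable -mixed-: sort the city
names within each id and compare the first with the last. They differ
only if the id has more than one distinct spelling, so this also works
when an id has more than two observations.

```stata
* Toy data (hypothetical values)
clear
input id str12 city year
1 "Durham, UK" 1
1 "Durham, UK" 2
2 "Durham, NC" 1
2 "Durhm, NC"  2
end

* With city sorted within id, first and last differ only if the id
* has more than one distinct city name
bysort id (city) : gen byte mixed = city[1] != city[_N]

* Show every observation of the affected ids, in year order
sort id year
list id city year if mixed, sepby(id)
```

Unlike the -prev- approach, this flags all observations of an affected
id, so both spellings are listed side by side.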

Thanks,
Rijo.

On Fri, Nov 21, 2008 at 11:48 AM, Nick Cox <[email protected]> wrote:
> I am confused. You changed my code, which you should do if it is wrong.
> But I don't think it is wrong.
>
> Also, if the second command is exactly what you typed, it should -list-
> at most one observation. _n == 2 is true for the whole dataset just
> once, and the compound statement will be true either never or once.
>
> There is some misinformation somewhere.
>
> Nick
> [email protected]
>
> Rijo John
>
> Thanks Nick.
>
>  When I wrote the same command
>
> by id : gen prev = city[1]
> list prev city if prev != city & _n == 2
>
> it gave me the solution.
>
> If I use "by id" again with the second command it would not list what I
> want.
>
> Thanks,
> Rijo.
>
> On Fri, Nov 21, 2008 at 11:35 AM, Rijo John <[email protected]> wrote:
>> Hi Nick,
>>
>>  I will read more into the tip you gave. When I gave the command you
>> suggested
>>
>> by id : gen prev = city[1]
>> by id : list prev city if prev != city & _n == 2
>>
>> it just lists all the ids... one by one... Doesn't solve the problem.
>>
>> Thanks.
>> Rijo.
>>
>>
>>
>> On Fri, Nov 21, 2008 at 11:26 AM, Nick Cox <[email protected]> wrote:
>>> This synthetic example shows that the command will list precisely those
>>> observations that differ from the previous observation. But this
>>> includes the first, as city[0] evaluates to string missing, i.e. "".
>>> More generally, varname[0] is regarded as missing in the sense of the
>>> variable's data type, i.e. numeric missing . or string missing "". So
>>> the first in each group will always be listed (unless its value is
>>> missing).
>>>
>>> . l
>>>
>>>      +------------+
>>>      |       city |
>>>      |------------|
>>>   1. | Durham, UK |
>>>   2. | Durham, UK |
>>>   3. | Durham, UK |
>>>   4. | Durham, NC |
>>>   5. | Durham, NC |
>>>      |------------|
>>>   6. | Durham, NH |
>>>   7. | Durham, NH |
>>>   8. | Durham, NH |
>>>   9. | Durham, NH |
>>>  10. | Durham, NH |
>>>      +------------+
>>>
>>> . list if city != city[_n-1]
>>>
>>>     +------------+
>>>     |       city |
>>>     |------------|
>>>  1. | Durham, UK |
>>>  4. | Durham, NC |
>>>  6. | Durham, NH |
>>>     +------------+
>>>
>>> You probably want
>>>
>>> by id : gen prev = city[1]
>>> by id : list prev city if prev != city & _n == 2
>>>
>>> There is no royal road to cleaning up string variables. The matter was
>>> discussed on the list earlier this year and written up as a Tip:
>>>
>>> SJ-8-3  dm0039  . . .  Stata tip 64: Cleaning up user-entered string variables
>>>        . . . . . . . . . . . . . . . . . . . . . . . .  J. Herrin and E. Poen
>>>        Q3/08   SJ 8(3):444--445                                 (no commands)
>>>        tip on how to clean up user-entered string variables
>>>
>>> Nick
>>> [email protected]
>>>
>>> Rijo John
>>>
>>> I have a data set as follows
>>>
>>> ID  City          Year
>>> 1    City name   1
>>> 1    City name   2
>>>
>>>
>>> The data are supposed to have the same city name for each ID in years 1
>>> and 2, but on many occasions the city for year 1 is spelt differently
>>> from that for year 2. I just want to list or edit those cities where
>>> the city names differ between years 1 and 2 for the same ID. When I
>>> issue the following command
>>>
>>> bysort ID : list if  City!=City[_n-1]
>>>
>>> it lists all observations in the data whether or not the city is spelt
>>> differently in years one and two. That's strange to me. Can someone
>>> tell me what I am doing wrong here?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


