Statalist The Stata Listserver

RE: st: Re: bysort problem

From   "Nick Cox"
Subject   RE: st: Re: bysort problem
Date   Mon, 26 Feb 2007 17:58:34 -0000

This thread is extraordinarily frustrating. 
I am still not clear on what is desired or
on what is seen to be the problem. 

Nikolaos stated at one point that he wanted
to eliminate duplicates. If this means -drop-
them from the data, then -duplicates drop- 
is available in Stata, although writing your own code 
would be instructive. 
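
As a minimal sketch of both routes, assuming that duplicates are
defined on the two variables E and SE named in this thread (that
only these two variables define a duplicate is an assumption):

```stata
* how many duplicates are there on E and SE?
duplicates report E SE

* route 1: the official command
* (-force- is required when a varlist is given)
duplicates drop E SE, force

* route 2: own code, keeping the first observation in each group
* (use this instead of route 1, not after it)
bysort E SE: keep if _n == 1
```

Which observation counts as "first" within a group is arbitrary
unless the sort is extended to further variables, which matters
only if the other variables differ within groups.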

But it seems instead to mean "make them different", and
adding different small constants and then adding noise
have both been seized upon as solutions. Are 
they equally attractive or appropriate? 

At the risk of complicating an already convoluted
thread, I add further comments: 

0. If `E' and `SE' are some kind of identifier, then some
coding as unique integers is likely to be optimal (and comments
below are irrelevant). 

1. Changing the data needs to be justified. 

2. Adding different constants and adding random
noise are not reproducible without further 
constraints. The first depends on sort order
and the second on the random-number seed and time. 

3. Adding even small amounts that are all positive
changes any location parameter for any variable. 
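
Points 0, 2 and 3 can be sketched as follows. This is only an
illustration under assumptions, not a recommendation: obsno is a
hypothetical variable that uniquely identifies observations, the
seed and the scale 1e-6 are arbitrary, and the new variable names
are made up. In older releases of Stata runiform() was named uniform().

```stata
* point 0: if E and SE jointly identify something,
* code the distinct combinations as integers 1, 2, ...
egen long id = group(E SE)

* point 2: make the results reproducible
isid obsno                  // check that obsno really is unique
sort E SE obsno             // sort order is now fully determined
set seed 20070226           // arbitrary, but fixed across runs

* point 3: constants centred on zero within each group of
* duplicates, so that group means are not shifted
by E SE: gen double E_c = E + (_n - (_N + 1)/2) * 1e-6

* point 3: noise centred on zero; runiform() - 0.5 lies in [-0.5, 0.5)
gen double E_n = E + (runiform() - 0.5) * 1e-6
```

The centred constants leave each group mean unchanged apart from
rounding; the centred noise does so only on average. Neither
answers point 1.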

I can't encourage any of the solutions offered
without knowing that there is an answer to 1 and
that 2 and 3 don't (won't) matter. But if 2 and 3 
don't matter, why do all this in the first place? 

Whatever the precise problem, I am confident, 
with Austin Nichols, that _no_ looping should be required. 
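
For example, counting the duplicates of each observation needs
only -by:-, with no loop. A sketch, again assuming that E and SE
define the duplicates (ndup and ndup2 are made-up names):

```stata
* number of other observations with the same values of E and SE
bysort E SE: gen long ndup = _N - 1

* the official command gives the same count
duplicates tag E SE, gen(ndup2)
assert ndup == ndup2
```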

