The Stata listserver

st: RE: cleaning a specific data structure

From   "Nick Cox" <>
To   <>
Subject   st: RE: cleaning a specific data structure
Date   Fri, 21 Nov 2003 13:06:59 -0000

Radu Ban 

> The data is organized like this, numbers are made-up for this
> id dummy descriptor
> 13 1 <blank>
> 13 0 abc
> 13 1 <blank>
> 14 0 <blank>
> 14 0 def
> 14 0 def
> The idea is that the id variable should be unique, but for some
> reason it is not.  This means that both the dummy and descriptor
> should have the same values across the id groups. A complication
> is that for the dummy, if there's a "1" in a group all the group
> should be "1". 
> I want to reduce this to a clean version which looks like this:
> id dummy descriptor
> 13 1 abc
> 14 0 def
> For the dummy part I dealt with it like this (probably a convoluted
> bysort id: egen maxdummy = max(dummy)
> replace dummy = maxdummy
> bysort id: keep if _n == 1
> But I am a bit stuck on how to deal with the string descriptor. I
> mean I know one way of doing by splitting the data and then
> merging it back but there has to be a more efficient way.

I think you are right: you can do all you want in one place. 

The dummy can be sorted out your way, or this way: 

bysort id (dummy) : replace dummy = dummy[_N] 

as, within each -id-, any 1s get sorted to the end, so dummy[_N] 
is 1 whenever the group contains at least one 1. 
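To see this on your example data, here is a small do-file sketch. It assumes 
that <blank> in your listing means an empty string and that str10 is wide 
enough for -descriptor-; both are guesses on my part. 

```stata
* Example data from the question; <blank> entered as an empty string (assumption)
clear
input id dummy str10 descriptor
13 1 ""
13 0 "abc"
13 1 ""
14 0 ""
14 0 "def"
14 0 "def"
end

* Within each id, sort on dummy so any 1 comes last,
* then copy that last value to every observation in the group
bysort id (dummy) : replace dummy = dummy[_N]

list, sepby(id)
```

Note that this gives the same result as your -egen, max()- approach without 
creating an extra variable. 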

If I understand correctly, the descriptor can be 
sorted out similarly 

bysort id (descriptor) : replace descriptor = descriptor[_N] 

as empty strings sort before any non-empty string, so 
descriptor[_N] is non-empty whenever the group has any 
non-empty descriptor. 
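A minimal sketch of the descriptor step, followed by your own step of keeping 
one observation per -id-. Again I am assuming <blank> means an empty string 
and that str10 is wide enough; adjust both to your real data. 

```stata
* Example data; <blank> entered as an empty string (assumption)
clear
input id str10 descriptor
13 ""
13 "abc"
13 ""
14 ""
14 "def"
14 "def"
end

* Empty strings sort first, so the last value within id is non-empty if any is
bysort id (descriptor) : replace descriptor = descriptor[_N]

* Reduce to one observation per id (the step from the question)
bysort id : keep if _n == 1

list
```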

However, before you do that you should test the 
assumption that all (non-empty) descriptors are 
identical within -id-: 

gen empty = mi(descriptor) 
bysort id empty (descriptor) : assert descriptor[1] == descriptor[_N] 
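If the assertion fails, you will want to know which -id-s are at fault. 
Here is a sketch on made-up data where id 13 has two different non-empty 
descriptors; the flag variable -conflict- is just an illustrative name. 
-capture- stops the failed -assert- from halting the do-file, and -assert- 
sets a return code of 9 when it fails. 

```stata
* Made-up data: id 13 has conflicting descriptors "abc" and "abd"
clear
input id str10 descriptor
13 ""
13 "abc"
13 "abd"
14 ""
14 "def"
end

* empty = 1 for blank descriptors, 0 otherwise
gen empty = missing(descriptor)

* Run the check without stopping the do-file; _rc is 9 if it fails
capture bysort id empty (descriptor) : assert descriptor[1] == descriptor[_N]
if _rc display as error "descriptors differ within some id"

* Flag the offending observations so they can be inspected
bysort id empty (descriptor) : gen byte conflict = descriptor[1] != descriptor[_N]
list id descriptor if conflict & !empty
```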

On the last, see also


© Copyright 1996–2022 StataCorp LLC