Statalist The Stata Listserver



Re: st: Re: bysort problem


From   "Nikolaos A. Patsopoulos" <npatsop@cc.uoi.gr>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: bysort problem
Date   Mon, 26 Feb 2007 18:01:51 +0200

Sergiy Radyakin wrote:

Hi Nikolaos!

I guess this code works:
***--------------------------------------------------------------------------------------
clear
set more off

input var1 var2
.145 .14
.145 .15
.145 .15
.167 .15
1.89 .15
1.89 .16
1.89 .16
end


list

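tempvar _id
local _idN=2
while `_idN'!=1 {
    di "------------------------------"
    qui gen `_id'=0
    list
    * number the observations within each (var1, var2) group
    bysort var1 var2: replace `_id'=_n
    * shift every duplicate after the first by a multiple of 0.001
    replace var1=var1+(`_id'-1)*0.001 if `_id'>1
    * the largest within-group position is 1 once no duplicates remain
    sum `_id', detail
    local _idN=r(max)
    list
    drop `_id'
    di "=============================="
}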

***--------------------------------------------------------------------------------------



However, I do not understand why you write "The obs in 3 and 6 should have _id (__000002) 0.016 and 0.017,
respectively." You change var1, not the temporary id variable. Why do you expect _id==0.016?

Have you considered the possibility that adding 0.001 might move your observation into a different group (defined by the pair of variables var1 and var2)? Or is that exactly the desired behaviour? Imagine var2 is constant across all observations, var1 takes the values 0.001 to 1 in observations 1 to 1000, and there is one more observation with var1=0.001. This procedure will add 0.001 1000 times, moving that observation all the way to 1.001.
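
A small sketch of that scenario, scaled down to ten distinct values plus one duplicate. It is only an illustration under stated assumptions: the variable k and the locals maxdup and passes are hypothetical names, var2 is omitted because it is constant, and the values are held as integer thousandths (k = 1000*var1) so that the equality comparisons made by bysort are exact.

```stata
* Hypothetical toy data, not the real data set: k stands for 1000*var1
clear
set obs 11
generate long k = _n          // k = 1, 2, ..., 11
replace k = 1 in 11           // one extra observation that duplicates k == 1

local maxdup = 2
local passes = 0
while `maxdup' > 1 {
    * position of each observation within its group of equal k
    bysort k: generate long dup = _n
    * bump every duplicate after the first by one step per position
    replace k = k + (dup - 1) if dup > 1
    * largest within-group position; 1 means no duplicates remain
    summarize dup, meanonly
    local maxdup = r(max)
    drop dup
    local ++passes
}
display "passes needed: `passes'"
list k
```

Each pass resolves one collision and creates the next, so one observation is pushed past every existing value before the loop stops, leaving k = 1 to 11 with no duplicates.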


Regards,
Sergiy

----- Original Message -----
From: "Nikolaos A. Patsopoulos" <npatsop@cc.uoi.gr>
To: <statalist@hsphsun2.harvard.edu>
Sent: Monday, February 26, 2007 8:50 AM
Subject: st: bysort problem



I'm currently writing a program that at some point checks whether more than one observation has the same values of two variables (E and SE). If more than one exists, then SE is increased by 0.001:

tempvar _id
qui gen `_id'=0
local _idN=2
while `_idN'!=1 {
    bysort `E' `SE': replace `_id'=_n if `touse'
    count if `_id'>1 & `touse'
    replace `SE'=`SE'+(`_id'-1)*0.001 if `_id'>1
    sum `_id' if `touse', detail
    local _idN=r(max)
    list `E' `SE' `_id' if `touse'
}

When I run the above piece of code, bysort fails in the second pass:

2
(2 real changes made)

                          __000002
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              1
 5%            1              1
10%            1              1       Obs                   7
25%            1              1       Sum of Wgt.           7

50%            1                      Mean           1.285714
                        Largest       Std. Dev.        .48795
75%            2              1
90%            2              1       Variance       .2380952
95%            2              2       Skewness       .9486833
99%            2              2       Kurtosis            1.9

     +------------------------+
     | var1   var2   __000002 |
     |------------------------|
  1. | .145   .014          1 |
  2. | .145   .015          2 |
  3. | .145   .015          1 |
  4. | .167   .015          1 |
  5. | 1.89   .015          1 |
     |------------------------|
  6. | 1.89   .016          2 |
  7. | 1.89   .016          1 |
     +------------------------+
0
(0 real changes made)

                          __000002
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              1
 5%            1              1
10%            1              1       Obs                   7
25%            1              1       Sum of Wgt.           7

50%            1                      Mean                  1
                        Largest       Std. Dev.             0
75%            1              1
90%            1              1       Variance              0
95%            1              1       Skewness              .
99%            1              1       Kurtosis              .

     +------------------------+
     | var1   var2   __000002 |
     |------------------------|
  1. | .145   .014          1 |
  2. | .145   .015          1 |
  3. | .145   .015          1 |
  4. | .167   .015          1 |
  5. | 1.89   .015          1 |
     |------------------------|
  6. | 1.89   .016          1 |
  7. | 1.89   .016          1 |
     +------------------------+

The obs in 3 and 6 should have _id (__000002) 0.016 and 0.017, respectively.

What am I missing?

Another short question:

How can I label tempvars and locals?

Thanks in advance,

Nikos
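
On the labelling question, a minimal sketch, assuming the aim is simply to document what a temporary variable or a local holds. A tempvar is an ordinary variable with a temporary name, so label variable applies to it; a local macro only stores text and has nothing to attach a label to. The names id, idlabel and _idN_note are hypothetical.

```stata
clear
set obs 3

* A tempvar is a regular variable, so it can carry a variable label
tempvar id
generate long `id' = _n
label variable `id' "Position within duplicate group"

* Read the label back with the extended macro function
local idlabel : variable label `id'
display "`idlabel'"

* A local has no label; a comment or a companion local can document it
local _idN = 2
local _idN_note "largest within-group position found in the last pass"
display "_idN = `_idN' (`_idN_note')"
```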

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


The purpose of the code is to eliminate duplicate observations of the combination of var1 and var2.
In the first pass the algorithm fixes duplicates, but new ones might come up, so it should run until none are left. The correction is negligible relative to the real data (the values here are dummy data just to test the code), and the probability of new duplicates is very small but still nonzero.

var2 is changed, not var1. That was a mistake I made in my earlier e-mail.
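
A minimal sketch of one way to check whether any duplicate (var1, var2) pairs remain, using the toy data from earlier in the thread. It assumes the built-in duplicates and isid commands are acceptable inside the program; the variable name dup is hypothetical.

```stata
* Toy data copied from the example earlier in the thread
clear
input var1 var2
.145 .14
.145 .15
.145 .15
.167 .15
1.89 .15
1.89 .16
1.89 .16
end

* dup = number of other observations sharing the same (var1, var2) pair
duplicates tag var1 var2, generate(dup)
list, sepby(var1)

* isid exits with an error when the pair does not identify observations
* uniquely; capture stores the return code in _rc instead of stopping
capture isid var1 var2
display "isid return code: " _rc
```

On these data, observations 2 and 3 and observations 6 and 7 are tagged with dup equal to 1. A nonzero return code from isid means at least one duplicate pair is still present, so it could serve as the loop's stopping test.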





