The Stata listserver

st: wrong number of observations after append


From   Karyen Chu <k_chu@uclink.berkeley.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: wrong number of observations after append
Date   Mon, 02 Sep 2002 15:25:28 -0700

Hi

I have 50 individual-specific datasets (i.e., only one subject ID per dataset), all of which contain data on the same 20 variables. The original data were ASCII text files, which I read into Stata to create 50 datasets. I then ran basic descriptive statistics for each of the 50 subjects.

I then appended all 50 datasets into one very large dataset and discovered that some of the subjects now have the wrong number of observations! Some have too many observations and some have too few, although the total number of observations across all 50 subjects is correct.

Specifically:

. table subjid

----------------------
   subjid |      Freq.
----------+-----------
    28722 |     19,296  (should have 12,299 obs)
    50910 |     23,971  (should have 23,972 obs)
    54476 |     69,222
    87734 |     21,213
   119669 |     59,245  (should have 59,244 obs)
   123614 |      6,634
   127871 |        419
   130008 |     51,515
   130722 |      2,155
   162245 |     59,448
   194574 |      7,872
   209171 |      7,991  (should have 7,989 obs)
   226711 |      1,417
   228761 |      2,652
   284310 |      1,657
   323652 |      2,267  (should have 2,269 obs)
   326175 |     21,870
   328958 |     30,081
   360402 |      7,260
   370576 |     15,429
   371133 |        913
   407487 |      5,820
   413293 |      1,645
   415301 |     39,116
   417756 |      9,418
   459852 |     14,024  (should have 14,023 obs)
   462509 |      2,544
   475134 |        567
   476368 |     35,533
   484595 |      4,792
   487508 |     61,457
   507428 |     18,677
   564155 |     13,084
   577895 |     31,010
   580566 |      1,745
   598037 |      9,369
   666481 |     16,679  (should have 16,678 obs)
   677056 |     22,647
   717037 |     19,085
   751384 |     27,639
   763586 |      9,999
   788300 |     32,728
   828191 |     13,339
   836796 |     11,495  (should have 11,494 obs)
   876142 |      2,921
   917942 |      9,432
   929316 |     10,659
   943493 |     14,256
   955867 |     31,104
   968002 |        909
----------------------



My code:

use user_usage.X1.28722.dta, clear
describe

foreach subj of numlist 50910 54476 87734 119669 123614 /*
*/ 127871 130008 130722 162245 194574 209171 226711 228761 284310 323652 /*
*/ 326175 328958 360402 370576 371133 407487 413293 415301 417756 459852 /*
*/ 462509 475134 476368 484595 487508 507428 564155 577895 580566 598037 /*
*/ 666481 677056 717037 /*
*/ 751384 763586 788300 828191 836796 876142 /*
*/ 917942 929316 943493 955867 968002 {

append using user_usage.X1.`subj'.dta

capture noisily save user_usage.X1.merge.dta, replace

}

compress

sort subjid startedate start_hr start_min start_sec /*
*/ endedate end_hr end_min end_sec

save user_usage.X1.merge.dta, replace

describe

table subjid

.........


Because many of these subjects have so many observations, I'm not sure how to work out which observations were dropped from, or added to, each subject.
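
One possible way to narrow this down, sketched here and not yet run against the real files, would be to tag every observation with the ID taken from its file name before appending, and then compare that tag with the subjid stored in the data. The variable name srcid and the temporary file are hypothetical, introduced only for this check, and the numlist is shortened to three subjects for illustration.

```stata
* Sketch only: srcid is a made-up variable holding the ID from the file name.
tempfile stacked
local first 1

foreach subj of numlist 28722 50910 54476 {
    use user_usage.X1.`subj'.dta, clear

    * Record which file each observation came from
    generate long srcid = `subj'

    * Stack onto the files already processed (skipped for the first file)
    if !`first' {
        append using `stacked'
    }
    save `stacked', replace
    local first 0
}

* Observations whose stored ID disagrees with the ID in the file name
count if subjid != srcid

* Break the mismatches down by source file and stored ID;
* capture noisily keeps the do-file running if there are none
capture noisily tabulate srcid subjid if subjid != srcid
```

If the count is zero for the individual files but the combined table is still wrong, the mismatch would have to arise during the append or afterwards, which at least says where to look next.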

I am using StataSE 7.0 on a Windows 2000 machine with 384 MB of RAM.
The StataSE 7.0 executable is dated 11 Jun 2002,
and the ado-files are dated 9 Aug 2002.


Thank you for any help that you can provide.


Karyen




