Hi

I have 50 individual-specific datasets (i.e., only one subject ID per dataset), all of which contain data on the same 20 variables. The original datasets were ASCII text. I read them into Stata, creating 50 Stata datasets, and ran basic descriptive statistics for each of the 50 subjects.

I then appended all 50 datasets into one very large dataset and discovered that some of the subjects now have the wrong number of observations! Some have too many and some have too few, although the total number of observations across all 50 subjects is correct.

Specifically:

. table subjid

----------------------
   subjid |      Freq.
----------+-----------
    28722 |     19,296  (should have 12,299 obs)
    50910 |     23,971  (should have 23,972 obs)
    54476 |     69,222
    87734 |     21,213
   119669 |     59,245  (should have 59,244 obs)
   123614 |      6,634
   127871 |        419
   130008 |     51,515
   130722 |      2,155
   162245 |     59,448
   194574 |      7,872
   209171 |      7,991  (should have 7,989 obs)
   226711 |      1,417
   228761 |      2,652
   284310 |      1,657
   323652 |      2,267  (should have 2,269 obs)
   326175 |     21,870
   328958 |     30,081
   360402 |      7,260
   370576 |     15,429
   371133 |        913
   407487 |      5,820
   413293 |      1,645
   415301 |     39,116
   417756 |      9,418
   459852 |     14,024  (should have 14,023 obs)
   462509 |      2,544
   475134 |        567
   476368 |     35,533
   484595 |      4,792
   487508 |     61,457
   507428 |     18,677
   564155 |     13,084
   577895 |     31,010
   580566 |      1,745
   598037 |      9,369
   666481 |     16,679  (should have 16,678 obs)
   677056 |     22,647
   717037 |     19,085
   751384 |     27,639
   763586 |      9,999
   788300 |     32,728
   828191 |     13,339
   836796 |     11,495  (should have 11,494 obs)
   876142 |      2,921
   917942 |      9,432
   929316 |     10,659
   943493 |     14,256
   955867 |     31,104
   968002 |        909
----------------------

My code:

use user_usage.X1.28722.dta, clear
describe
foreach subj of numlist 50910 54476 87734 119669 123614 /*
*/ 127871 130008 130722 162245 194574 209171 226711 228761 284310 323652 /*
*/ 326175 328958 360402 370576 371133 407487 413293 415301 417756 459852 /*
*/ 462509 475134 476368 484595 487508 507428 564155 577895 580566 598037 /*
*/ 666481 677056 717037 /*
*/ 751384 763586 788300 828191 836796 876142 /*
*/ 917942 929316 943493 955867 968002 {
    append using user_usage.X1.`subj'.dta
    capture noisily save user_usage.X1.merge.dta, replace
}
compress
sort subjid startedate start_hr start_min start_sec /*
*/ endedate end_hr end_min end_sec
save user_usage.X1.merge.dta, replace
describe
table subjid

.........

Because many of these subjects have so many observations, I'm not sure how to check which observations were dropped or added.
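
One possible check, given only as a rough sketch and not something that has been run on these data: tag every observation with the file it came from, then compare that tag with subjid after appending. The variable name source is hypothetical (it is not in the original datasets), and the numlist is cut down to two IDs purely for illustration.

```stata
* Sketch only: -source- is a hypothetical new variable that records
* which file each observation was read from.
use user_usage.X1.28722.dta, clear
generate long source = 28722

* Shortened list for illustration; the full list of IDs would go here.
foreach subj of numlist 50910 54476 {
    append using user_usage.X1.`subj'.dta
    * Newly appended observations have source missing; fill in the file ID.
    replace source = `subj' if source == .
}

* Observations whose subjid does not match the file they came from.
count if subjid != source
tabulate source if subjid != source
```

If the count is zero, the subjid values agree with the file names and the discrepancy would have to come from somewhere else, for example from the per-subject counts used for comparison.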

I am using StataSE 7.0 on a Windows 2000 machine with 384 MB of RAM. The StataSE 7.0 executable is dated 11 Jun 2002 and the ado files are dated 9 Aug 2002.

Thank you for any help that you can provide.

Karyen

