Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: multiple observations - collapsing a string variable?


From   Chris Wallace <[email protected]>
To   [email protected]
Subject   st: multiple observations - collapsing a string variable?
Date   Fri, 10 Dec 2004 11:47:03 +0000

Short question:

I would like to be able to -collapse- some data concatenating multiple
values of a string variable, rather than taking some function of a
numeric variable.  Is this at all possible?


Longer explanation of why:

I am using a biochemical dataset where each individual has had a blood
sample analysed for multiple biochemical variables.  Each individual
may has multiple records, each identified by a different labnumber
(labno), as different variables are measured at different times.  So a
typical record might be (where x represents a non-missing and . a
missing value):

individual_id labno  potassium  sodium  hdl cholesterol
00001         78882  x          x       .   .
00001         72010  .          .       x   x

I want to collapse the data so I have a single record:

individual_id potassium  sodium  hdl cholesterol
00001         x          x       x   x

This is fine with "typical" records, as above.  But sometimes the same
test may have been performed twice (either (a) as a check that the lab
was working ok, or (b) because something seemed wrong about one
analysis), so there'll be a record like:

individual_id labno  potassium  sodium  hdl cholesterol
00002         78882  x          x       .   .
00002         72010  .          .       x   x
00002         82901  .          .       y   y

where the hdl and cholesterol levels differ.  I want to list all such
records to distinguish between cases (a) and (b).  I stored the names
of all (25) biochemical variables in the local macro allvars and did:


log using biochem-multiples, replace
qui {
  foreach v of local allvars {
  preserve
  keep individual_id `v'
  collapse (min) mn=`v' (max) mx=`v' , by(individual_id)
  noi di "!     `v'"
  noi l ind mn mx if mn!=mx
  restore
}
}
log close
view biochem-multiples.smcl

which lists all the individuals with multiple non-null, differing
records for any covar.  But I would like to also see the labno's for
these records.  I can manually scroll through the log file typing -di
if ...-, but would like to automate it.  I guess what I would love is
to be able to replace the collapse like above with

keep individual_id labno `v'
collapse (min) mn=`v' (max) mx=`v' (concatenate, punct("/")) labno=labno, by(individual_id)

so that, for individual 00002 I would have in the logfile

00002 x y 78882/72010/82901

Even better would be

00002 x y 72010/82901

(listing the labno's only for the observations that are problems).
But I have no idea how to do this.  Can anyone help - perhaps with an
alternative route altogether?

Many thanks, Chris.
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index