The Stata listserver

st: complex (convoluted?) expansion of data


From   "Clint Thompson" <Clint.Thompson@hsc.utah.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: complex (convoluted?) expansion of data
Date   Tue, 05 Oct 2004 18:06:33 -0600

Hello All ----
I am using Intercooled Stata, v.8.2.
I have a dataset that I need to expand, but some nuances in the data
preclude a simple execution of -expand-.
The data are currently a combination of "wide" and "long": a subject
who did not report any events has one record (line), whereas a subject
who did report events has one line for each event reported.  For
instance, one subject reported 600 procedures but no events, so this
subject occupies only one line, while several subjects reported
multiple procedures with multiple events and thus occupy multiple
lines, one for each event reported.  My objective is to expand the
dataset by procedures (which I can do using -expand-), but I also need
to address how the procedures are distributed within each subject.
Specifically, each subject indicated the percentage of time he/she
used a particular type of surgical machine (and machine setting) and
the percentage of time he/she used a particular surgical approach.
Ultimately, I want a "long" dataset that lists all the procedures for
each subject and reflects the percentage of time that the subject
used a particular surgical approach, machine, & machine setting.  I've
pasted a few observations from my dataset in its current form for
illustration:

     
     +------------------------------------------------------------------------+
     |     name   surger~s   div_con   machine   percent   setting   wound~er |
     |------------------------------------------------------------------------|
 48. | John Doe        210       100     leg_a       100     pul_a          . |
 49. | Jane Doe        300       100     sov_a       100   white_a          2 |
 50. | Jane Doe          .         .                   .                    . |
     +------------------------------------------------------------------------+

where 'name' is the subject's name, 'surger~s' is the number of
procedures reported by the subject, 'div_con' is the percent of time a
surgical approach was used (I have three such variables but list only
one for brevity), 'machine' is the surgical machine used, 'percent' is
the percent of time that machine was used, and 'setting' is the
corresponding machine setting.  The last variable, 'wound~er',
indicates whether the subject had any events; if so, the subject's name
is repeated, with each line reporting the relevant information for one
event (e.g. machine & setting used, surgical approach used).
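
To make the current layout concrete, here is a minimal sketch that
re-enters the three observations listed above and marks the first line
and the number of lines for each subject.  The names 'nsurg' and
'wound' are stand-ins for the abbreviated 'surger~s' and 'wound~er',
and 'obsno', 'first', and 'nlines' are hypothetical helper variables,
not variables in my actual dataset.

```stata
clear
* toy copy of observations 48-50; nsurg and wound are stand-in names
input str8 name nsurg div_con str7 machine percent str7 setting wound
"John Doe" 210 100 "leg_a" 100 "pul_a"   .
"Jane Doe" 300 100 "sov_a" 100 "white_a" 2
"Jane Doe"   .   . ""        . ""        .
end

* remember the original order so each subject's first line stays first
gen long obsno = _n

* first = 1 on a subject's first line; nlines = lines held by the subject
bysort name (obsno): gen byte first = (_n == 1)
bysort name (obsno): gen nlines = _N

sort obsno
list name nsurg wound first nlines
```

Subjects with 'nlines' greater than one are the ones whose extra lines
carry event information rather than procedure counts.
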
What I'm envisioning is a dataset listing name, machine type, machine
setting, & surgical approach in which each subject has as many lines as
procedures (surgeries), along with a variable, loosely referred to as
'event', that takes the value one if the attributes of an event match
the attributes of the procedure.
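
A rough sketch of the target layout, under strong simplifying
assumptions: the data have already been reduced to one line per
subject/machine/setting combination, and a hypothetical variable
'nevents' holds the number of events reported for that combination
(my real data do not store events this way).  'nsurg' stands in for
'surger~s'; 'nproc' and 'combo' are helper names introduced only for
illustration.

```stata
clear
* assumed input: one line per subject/machine/setting combination
input str8 name nsurg str7 machine percent str7 setting nevents
"John Doe" 210 "leg_a" 100 "pul_a"   0
"Jane Doe" 300 "sov_a" 100 "white_a" 2
end

* procedures attributed to this combination, from the reported percent
gen long nproc = round(nsurg * percent / 100, 1)

* identify each combination, then create one line per procedure;
* note that -expand- leaves a line as is when nproc is 0 or missing
gen long combo = _n
expand nproc

* flag the first nevents procedures within each combination as events
bysort combo: gen byte event = (_n <= nevents)

tab name event
```

Which procedures within a combination get flagged is arbitrary here;
only the number of events per combination of attributes is preserved.
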
I recognize that this problem (and its explanation!) is somewhat
convoluted; nevertheless, I appreciate any and all suggestions.
Much obliged,
Clint Thompson

  


