Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: RE: RE: RE: Combining multiple observations by an ID variable

From	Claude Beaty <[email protected]>
To	"<[email protected]>" <[email protected]>
Subject	Re: st: RE: RE: RE: RE: Combining multiple observations by an ID variable
Date	Wed, 13 Jun 2012 01:10:50 +0000

Thanks. That sounds like good advice. 

Claude Beaty

On Jun 12, 2012, at 9:03 PM, "Sarah Edgington" <[email protected]> wrote:

> Claude,
> One thing you haven't mentioned, I don't think, is whether you have any
> duplicate observations per person in the set that you are trying to merge on
> to the visit data.  If you have multiple visits for each ID in your master
> data set but the using dataset has only one record per ID you can simply do
> an m:1 merge and you shouldn't have any problems.  If your other file has
> multiple records per ID, then your problem is more complicated and merging
> the files as-is probably is not a very good idea at all.
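> 
> A minimal sketch of how to check this, assuming the second file is saved
> as "usingfile" and the ID variable is called IDcode (the names used
> elsewhere in this thread; substitute your own):
> 
> ```stata
> * Assumption: "usingfile" and IDcode are placeholder names from this thread.
> use "usingfile", clear
> * Show how many IDcode values appear once, twice, and so on.
> duplicates report IDcode
> * Stop with an error unless IDcode uniquely identifies the observations.
> isid IDcode
> ```
> 
> If -isid- exits without complaint, the using file has one record per ID
> and an m:1 merge is appropriate.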
> 
> Nick is right that the correct merge should not create duplicates.  There
> are a number of ways to confirm this for yourself without having to
> -reshape- the data to wide form.
> For me the best place to start is by looking carefully at the created _merge
> variable.  Are there cases that didn't match?  Did you expect that?  If not,
> that bears investigating.
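> 
> As a rough sketch, run right after the merge (IDcode is the variable name
> assumed in this thread):
> 
> ```stata
> * _merge == 1: master only; _merge == 2: using only; _merge == 3: matched.
> tabulate _merge
> * Count the visit records that found no match in the using file.
> count if _merge == 1
> * Look at a handful of the unmatched IDs to see what they have in common.
> list IDcode in 1/20 if _merge == 1
> ```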
> 
> Next, look at the overall number of observations.  First, count how many
> observations are in the master dataset in long form (that is, the data with
> ID codes and multiple visits per ID).  Then, if you do a many-to-one merge
> using your second data set you should find that [original observations] =
> [number matched] + [number in master only].  If that isn't the case,
> something is likely wrong.
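> 
> That accounting can be checked mechanically.  A minimal sketch, assuming
> the long-form visit data are saved as "masterfile" (a hypothetical name)
> and the other names follow this thread:
> 
> ```stata
> * Assumption: "masterfile", "usingfile" and IDcode are placeholder names.
> use "masterfile", clear
> * Record the number of observations before the merge.
> count
> local n_master = r(N)
> merge m:1 IDcode using "usingfile"
> * Count matched records and records found only in the master file.
> count if _merge == 3
> local n_matched = r(N)
> count if _merge == 1
> local n_master_only = r(N)
> * Stop with an error if the merge added or lost master records.
> assert `n_master' == `n_matched' + `n_master_only'
> ```
> 
> Records with _merge == 2 come only from the using file, so they are left
> out of the comparison.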
> 
> Finally, if you're still worried and want to be sure that you have the exact
> same records in your merged data as you did before the merge, try looking at
> the means of some important variables from the master file before and after
> the merge.  If your ID field is a numeric variable (though it's often best
> if it isn't) then you can look at the N and mean of that variable before and
> after the merge too.  If the distribution of variables from the master file
> remains the same before and after the merge then you have some pretty good
> evidence that you have not somehow introduced extra records.  (This assumes
> that all the data in your master file matches a record in the using file; if
> this isn't the case go back to the first step and make sure you understand
> why).
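> 
> A sketch of that comparison, assuming IDcode is numeric and using the
> placeholder names "masterfile", "usingfile" and visit from this thread:
> 
> ```stata
> * Assumption: file and variable names are placeholders; IDcode is numeric.
> use "masterfile", clear
> * N and mean of master variables before the merge.
> summarize IDcode visit
> merge m:1 IDcode using "usingfile"
> * Same summary afterwards, leaving out records found only in the using file.
> summarize IDcode visit if _merge != 2
> ```
> 
> The two -summarize- tables should report the same N and mean for each
> variable.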
> 
> I know merging sometimes seems complicated, but as long as you pay very
> close attention to the details of the output and make sure you understand
> why some IDs matched and some didn't, it's generally going to be ok.  Unless
> you're doing a many-to-many merge.  Then it's complicated and, in nearly all
> cases, the wrong approach entirely.
> 
> Hope that helps.
> 
> -Sarah
> 
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Tuesday, June 12, 2012 5:29 PM
> To: [email protected]
> Subject: Re: st: RE: RE: RE: RE: Combining multiple observations by an ID
> variable
> 
> Your original data structure strikes me as far better for the majority of
> purposes for which it might be used within Stata. Whether -reshape
> wide- is possible is thus secondary. It is almost certainly not a good idea.
> 
> Incidentally, -reshape- is a command, not a function. Also, I see no reason
> why the correct -merge- command should create extra observations as you
> imply here.
> 
> Nick
> 
> On Tue, Jun 12, 2012 at 11:31 PM, Claude Beaty <[email protected]> wrote:
> 
>> Reshape was something I considered as well. Unfortunately, every time I
>> attempt to run this code I get the error "too many macros". I have Stata 12,
>> which I believe is the most updated version. If anyone knows of a way around
>> this, please let me know.
> 
> Swanquist, Quinn Thomas
> 
>> Fair enough,
>> 
>> If you need the observations to equal the number of visits and you need to
>> keep the data from each visit, you are going to need to use the reshape wide
>> function on the master dataset before the merge. Since you said that you
>> have 70 variables for each visit, you will now have 70 * the max number of
>> visits variables. Depending on your version of Stata you may or may not be
>> able to work with that many variables.
>> 
>> You can get help with this function using:
>> 
>> help reshape
> 
> Claude Beaty
> 
>> It looks like the merger attempt was likely successful, though I'm sure
>> there are some duplicates. However, your suggested code did not help to
>> shift the data so that the total observations equal the number of ID codes
>> instead of the number of visits. I have tried reshaping, etc., but there are
>> too many macros to reshape all of the variables. Is there another way? If I
>> can arrange the data in this way, it is easier to compare with my previous
>> file and find duplicate ID codes. As it stands now, it is difficult to tell
>> if duplicate ID codes are due to successive visits or duplications created
>> by the file merger.
> 
> Swanquist, Quinn Thomas
> 
>> Do you have an identifier for visit number? (If not, you could use the date.)
>> 
>> Sort as follows:
>> 
>> sort IDcode visit
>> 
>> then merge many to one as follows:
>> 
>> merge m:1 IDcode using "usingfile"
> 
> Claude Beaty
> 
>> I have a large dataset of observations in which individuals (~40,000 
>> ID codes) were evaluated multiple times (5-10 visit numbers per 
>> individual) on over 70 variables. However, the data has been arranged 
>> so that each visit number is an observation, instead of each 
>> individual ID code as an observation. I need to merge this file with 
>> another file sorted by individual ID codes. How do I rearrange this 
>> data so that it is arranged by ID codes with consecutive follow up 
>> visits? Thanks
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 

References:
- st: Combining multiple observations by an ID variable
  - From: Claude Beaty <[email protected]>
- st: RE: Combining multiple observations by an ID variable
  - From: "Swanquist, Quinn Thomas" <[email protected]>
- st: RE: RE: Combining multiple observations by an ID variable
  - From: Claude Beaty <[email protected]>
- st: RE: RE: RE: Combining multiple observations by an ID variable
  - From: "Swanquist, Quinn Thomas" <[email protected]>
- st: RE: RE: RE: RE: Combining multiple observations by an ID variable
  - From: Claude Beaty <[email protected]>
- Re: st: RE: RE: RE: RE: Combining multiple observations by an ID variable
  - From: Nick Cox <[email protected]>
- RE: st: RE: RE: RE: RE: Combining multiple observations by an ID variable
  - From: "Sarah Edgington" <[email protected]>
