

From: Claude Beaty <cbeaty1@jhmi.edu>
To: "<statalist@hsphsun2.harvard.edu>" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: RE: RE: RE: RE: Combining multiple observations by an ID variable
Date: Wed, 13 Jun 2012 01:10:50 +0000

Thanks. That sounds like good advice.

Claude Beaty

Sent from my iPhone

On Jun 12, 2012, at 9:03 PM, "Sarah Edgington" <sedging@ucla.edu> wrote:

> Claude,
> One thing you haven't mentioned, I don't think, is whether you have any
> duplicate observations per person in the set that you are trying to merge on
> to the visit data. If you have multiple visits for each ID in your master
> data set but the using dataset has only one record per ID you can simply do
> a m:1 merge and you shouldn't have any problems. If your other file has
> multiple records per ID, then your problem is more complicated and merging
> the files as-is probably is not a very good idea at all.
>
> Nick is right that the correct merge should not create duplicates. There
> are a number of ways to confirm this for yourself without having to
> -reshape- the data to wide form.
> For me the best place to start is by looking carefully at the created _merge
> variable. Are there cases that didn't match? Did you expect that? If not,
> that bears investigating.
>
> Next, look at the overall number of observations. First, count how many
> observations are in the master dataset in long form (that is, the data with
> ID codes and multiple visits per ID). Then, if you do a many to one merge
> using your second data set you should find that [original observations] =
> [number matched] + [number in master only]. If that isn't the case,
> something is likely wrong.
>
> Finally, if you're still worried and want to be sure that you have the exact
> same records in your merged data as you did before the merge, try looking at
> the means of some important variables from the master file before and after
> the merge. If your ID field is a numeric variable (though it's often best
> if it isn't) then you can look at the N and mean of that variable before and
> after the merge too. If the distribution of variables from the master file
> remains the same before and after the merge then you have some pretty good
> evidence that you have not somehow introduced extra records. (This assumes
> that all the data in your master file matches a record in the using file; if
> this isn't the case go back to the first step and make sure you understand
> why).
>
> I know merging sometimes seems complicated, but as long as you pay very
> close attention to the details of the output and make sure you understand
> why some IDs matched and some didn't, it's generally going to be ok. Unless
> you're doing a many to many merge. Then it's complicated and, in nearly all
> cases, the wrong approach entirely.
>
> Hope that helps.
>
> -Sarah
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Tuesday, June 12, 2012 5:29 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RE: RE: RE: RE: Combining multiple observations by an ID
> variable
>
> Your original data structure strikes me as far better for the majority of
> purposes for which it might be used within Stata. Whether -reshape wide-
> is possible is thus secondary. It is almost certainly not a good idea.
>
> Incidentally, -reshape- is a command, not a function. Also, I see no reason
> why the correct -merge- command should create extra observations as you
> imply here.
>
> Nick
>
> On Tue, Jun 12, 2012 at 11:31 PM, Claude Beaty <cbeaty1@jhmi.edu> wrote:
>
>> Reshape was something I considered as well. Unfortunately, every time I
>> attempt to run this code I get the error "too many macros". I have stata
>> 12, which I believe is the most updated version. If anyone knows of a way
>> around this, please let me know.
>
> Swanquist, Quinn Thomas
>
>> Fair enough,
>>
>> If you need the observations to equal the number of visits and you need to
>> keep the data from each visit, you are going to need to use the reshape
>> wide function on the master dataset before the merge. Since you said that
>> you have 70 variables for each visit, you will now have 70 * the max
>> number of visits variables. Depending on your version of Stata you may or
>> may not be able to work with that many variables.
>>
>> You can get help with this function using:
>>
>> help reshape
>
> Claude Beaty
>
>> It looks like the merger attempt was likely successful, though I'm sure
>> there are some duplicates. However, your suggested code did not help to
>> shift the data so that the total observations equal the number of ID codes
>> instead of the number of visits. I have tried reshaping etc, but there are
>> too many macros to reshape all of the variables. Is there another way? If
>> I can arrange the data in this way, it is easier to compare with my
>> previous file and find duplicate ID codes. As it stands now, it is
>> difficult to tell if duplicate ID codes are due to successive visits or
>> duplications created by the file merger.
>
> Swanquist, Quinn Thomas
>
>> Do you have an identifier for visit number (if not you could use date).
>>
>> Sort as follows:
>>
>> sort IDcode visit
>>
>> then merge many to one as follows:
>>
>> merge m:1 IDcode using "usingfile"
>
> Claude Beaty
>
>> I have a large dataset of observations in which individuals (~40,000
>> ID codes) were evaluated multiple times (5-10 visit numbers per
>> individual) on over 70 variables. However, the data has been arranged
>> so that each visit number is an observation, instead of each
>> individual ID code as an observation. I need to merge this file with
>> another file sorted by individual ID codes. How do I rearrange this
>> data so that it is arranged by ID codes with consecutive follow up
>> visits? Thanks
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
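
The checks Sarah describes (confirm the using file has one record per ID, inspect _merge, compare observation counts, and compare means before and after) can be sketched in a short do-file. This is only an illustration on toy data, not anyone's actual files: the variable names IDcode, visit, score and site, the tempfile name person, and the data sizes are all assumptions made up for the example.

```stata
* Toy illustration only; all names and sizes below are hypothetical.
clear all
set seed 12345

* Using dataset: exactly one record per ID (IDs 1-5)
set obs 5
generate IDcode = _n
generate site = mod(_n, 2)
isid IDcode                  // stops with an error if IDcode is not unique
tempfile person
save `person'

* Master dataset in long form: 3 visits for each of IDs 1-6
* (ID 6 deliberately has no record in the using dataset)
clear
set obs 6
generate IDcode = _n
expand 3
bysort IDcode: generate visit = _n
generate score = runiform()

* Record the state of the master data before the merge
count
local n_before = r(N)
summarize score, meanonly
local mean_before = r(mean)

merge m:1 IDcode using `person'

* Check 1: which records matched, and is any mismatch expected?
tabulate _merge

* Check 2: [original observations] = [matched] + [master only]
count if inlist(_merge, 1, 3)
assert r(N) == `n_before'

* Check 3: a master variable keeps the same mean after the merge
summarize score if inlist(_merge, 1, 3), meanonly
assert reldif(r(mean), `mean_before') < 1e-12
```

Run as a whole do-file (the local macros do not survive line-by-line execution). If the using file had more than one record for some ID, -isid- would stop before the merge, which is the situation Sarah warns is "more complicated".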

