
From: Julia Gamas <jgamas@mit.edu>
To: statalist@hsphsun2.harvard.edu
Subject: st: frustrated by missing variables--collapase and merge
Date: Tue, 29 Mar 2005 10:58:06 -0500

Hi, it depends on what you want to obtain from the collapse and merge.

By merging you shouldn't be losing any variables. In fact, your dataset should get bigger. If both datasets have a variable with the same name, only one copy is kept (by default, the one in the dataset already in memory). Check that you are merging using ALL the variables relevant to the merge. For example, if you want to merge by state and city, you would write: "merge state city using yourdatabase". I've slipped up a few times and gotten nonsense when I instead wrote: "merge using yourdatabase", because Stata didn't know that I wanted it to merge by state and city. There are also several types of merge, so make sure that you're following the instructions for the type you want (you may want to pair up the lines of the two datasets one-to-one in the order they appear, or match each line on another variable such as city or state or year, for example).

About collapse: you will lose any variables that aren't included in your expression. For example, let's say you have the following variables: year var1 var2 var3, and you want to collapse your dataset by year. Then you'd write something like: "collapse (sum) var1 (mean) var2 (median) var3, by(year)". But if you forget one of the vars and do: "collapse (sum) var1 (mean) var2, by(year)", you'll lose var3.

Finally, there will be variables which, once you've collapsed, won't make sense anymore in the new dimension, because the "observations" themselves have changed. For example: say I have one line per person in a dataset, and each person is classified into a group using values 1 to 5. After a collapse the individual observations no longer exist, so the group variable can't keep each person's value, unless I've asked Stata to collapse individuals into their group categories. In that case the end result is a dataset with five observations: "collapse (sum) population, by(group)" will give me something like:

    group   population
    1       439
    2       12,000
    3       ....
    etc.
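A rough sketch of the two points above, in case it helps to see them run. The data are made up purely for illustration (the variable names follow the examples in this message, and the temporary file name is hypothetical). The merge line uses the syntax quoted above, which expects both datasets to be sorted by the key variables; as far as I know, Stata 11 and later would write it as "merge 1:1 state city using ...", so adjust for your version.

```stata
* --- collapse: a variable not named in the expression is dropped ---
* Toy data, invented for this illustration only.
clear
input year var1 var2 var3
2003 1 10 100
2003 2 20 200
2004 3 30 300
2004 4 40 400
2004 5 50 500
end

* All three variables are named, so all three survive.
preserve
collapse (sum) var1 (mean) var2 (median) var3, by(year)
describe
restore

* var3 is not named here, so it is gone after the collapse.
preserve
collapse (sum) var1 (mean) var2, by(year)
describe
restore

* --- merge: name every key variable explicitly ---
* Build a small "using" dataset and save it to a temporary file.
clear
input str2 state str10 city pop
"MA" "Boston" 100
"MA" "Salem"   20
"NY" "Albany"  30
end
sort state city
tempfile cities
save `cities'

* Build the master dataset, sort it by the same keys, then merge.
clear
input str2 state str10 city income
"MA" "Boston" 50
"MA" "Salem"  40
"NY" "Albany" 45
end
sort state city
merge state city using `cities'

* _merge records where each observation came from (3 = found in both).
tabulate _merge
list
```

Running "describe" before and after each step, and tabulating _merge after every merge, is a cheap way to see exactly which command dropped a variable or failed to match.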
These are the most common mistakes I make that get me in trouble with these commands and by which I lose variables. But if you send a bit more detail I may be able to help you a bit more.

Good luck!

Julia A. Gamas

> ------------------------------
>
> Date: Thu, 17 Mar 2005 19:24:37 -0800
> From: Zhou YU <zyu@usc.edu>
> Subject: st: frustrated by missing variables--collapase and merge
>
> Hi all,
>
> I have been trying to collapse merge a number of variables. What
> frustrated me is that there is always one or two variables missing after
> collapsing or merging. Last night I have to repeated the same procedures
> several times, which took me the whole evening to create a dataset.
> Intestingly, each time, different variables were missing.
>
> Have anyone encountered the same problem? Any solutions?
>
> Thanks a bunch!
>
> Zhou

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
