The Stata listserver

st: frustrated by missing variables--collapase and merge

From   Julia Gamas
Subject   st: frustrated by missing variables--collapase and merge
Date   Tue, 29 Mar 2005 10:58:06 -0500

It depends on what you want to obtain from the collapse and merge.  By merging
you shouldn't be losing any variables.  In fact, your dataset should get bigger.
If both datasets have a variable of the same name that is not one of the merge
variables, only one copy survives (by default the values already in memory are
kept), so the other copy will seem to disappear.  Check that you are merging
using ALL the variables relevant to the merge.  For example, if you want to
merge by state and city, you would write:
"merge state city using yourdatabase".  I've fumbled a few times and gotten
nonsense when instead I wrote:
"merge using yourdatabase", because Stata didn't know that I wanted it to merge
by state and city, and simply paired the observations in the order they
appeared.  There are also several types of merges, so you may want to make sure
that you're using the instructions for the type you want (you may want to merge
the files line by line, or merge each line by matching on another variable such
as city or state or year, for example).
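
To make the point about merge variables concrete, here is a small sketch.  It
uses the newer "merge 1:1" syntax (Stata 11 and later) rather than the older
form quoted above, and the data, the variable names income and pop, and the
tempfile name incomes are all made up for illustration.  Run it as a do-file so
the tempfile macro is defined.

```stata
* Hypothetical using dataset: one row per state-city pair
clear
input str2 state str10 city income
"NY" "Albany"  50
"NY" "Buffalo" 45
"TX" "Austin"  55
end
tempfile incomes
save `incomes'

* Hypothetical master dataset with the same keys and a different variable
clear
input str2 state str10 city pop
"NY" "Albany"  100
"NY" "Buffalo" 250
"TX" "Austin"  900
end

* Match on BOTH key variables; _merge records where each row came from
merge 1:1 state city using `incomes'
tabulate _merge
list
```

Checking "tabulate _merge" after every merge is a cheap way to notice rows
that failed to match before they turn into puzzling missing values later on.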
About collapse, you will lose any variables that aren't included in your
expression.  For example, let's say you have the following variables:
year var1 var2 var3 and you want to collapse your dataset by year; then you'd
write something like:
"collapse (sum) var1 (mean) var2 (median) var3, by(year)"
But if you forget one of the vars and do:
"collapse (sum) var1 (mean) var2, by(year)"
you'll lose var3.
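
A quick way to see this for yourself is the sketch below.  The numbers are
invented; only the variable names come from the example above.  It runs the
incomplete collapse inside preserve/restore so the original data come back,
then runs the complete one.

```stata
* Made-up data: two observations per year
clear
input year var1 var2 var3
2000 1 10  5
2000 2 20  7
2001 3 30  9
2001 4 40 11
end

* Incomplete collapse: var3 is not listed, so it is dropped
preserve
collapse (sum) var1 (mean) var2, by(year)
describe
restore

* Complete collapse: every variable to keep gets a statistic
* For year 2000: sum of var1 is 3, mean of var2 is 15, median of var3 is 6
collapse (sum) var1 (mean) var2 (median) var3, by(year)
list
```

Using preserve and restore while experimenting avoids having to rebuild the
dataset from scratch each time a collapse drops something unexpectedly.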
Finally, there will be variables which, once you've collapsed, won't make sense
anymore because the unit of observation has changed.  For example: if I have
one line per person in a dataset, and each person is classified into a group
using values 1 to 5, then after a collapse the individual observations are
gone, so the dataset can't keep each person's group value.  The exception is
when I've asked it to collapse individuals into their group categories, in
which case the end result will be a dataset with five observations:
"collapse (sum) population, by(group)"
will give me something like:
group    population
1               439
2            12,000
3               ... etc.
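
As a rough sketch of that last case, assume one line per person and a
population variable equal to 1 for each person (that coding is my assumption,
not something stated above), so that summing it counts the people in each
group.  The person and group values are invented, and only three of the five
groups appear.

```stata
* Made-up person-level data: one line per person
clear
input person group
1 1
2 1
3 2
4 3
5 3
6 3
end

* Assumed coding: each person counts once
generate population = 1

* One observation per group remains; person is dropped
collapse (sum) population, by(group)
list
```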
These are the most common mistakes I make that get me into trouble with these
commands and cause me to lose variables.  But if you send a bit more detail I
may be able to help you a bit more.
Good luck!
Julia A. Gamas
> ------------------------------
> Date: Thu, 17 Mar 2005 19:24:37 -0800
> From: Zhou YU <>
> Subject: st: frustrated by missing variables--collapase and merge
> Hi all,
> I have been trying to collapse and merge a number of variables. What 
> frustrates me is that there are always one or two variables missing after 
> collapsing or merging. Last night I had to repeat the same procedure 
> several times, and it took me the whole evening to create a dataset. 
> Interestingly, each time, different variables were missing.
> Has anyone encountered the same problem? Any solutions?
> Thanks a bunch!
> Zhou
