

From: Nick Cox <n.j.cox@durham.ac.uk>

To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>

Subject: RE: st: St: collapse by _N

Date: Wed, 20 Oct 2010 11:31:31 +0100

All good advice, and here is some more:

1. I echo Michael in noting that -collapse- can produce a count variable, so that there is no need to set up your own. Of course, you would then need to drop data based on small samples after the -collapse-.

2. Be aware of -contract-. It has precisely the role of collapsing to frequencies, and so by default produces a count variable. By implication Ric here wants mostly to -collapse- to means, but I've often seen people use -collapse- when their objective was more directly matched by -contract-.

Nick
n.j.cox@durham.ac.uk

Michael Mitchell
================

In addition to the great answers Chris and Ulrich sent, I might suggest that you include a variable that counts the number of valid observations. After having the collapsed file, you could then decide what you might want to use as a threshold for the data being too unreliable. You can see more examples about collapsing, including examples using count, at http://www.ats.ucla.edu/stat/stata/modules/collapse.htm .

Ulrich Kohler
=============

    . bysort geocode: gen n = _N
    . collapse (mean) varlist if n >= 20, by(geocode)

Chris Parker
============

You could count the observations in each geocode, then drop if there are too few observations then collapse.

    bysort geocode: gen numobs=_N
    drop if numobs < 20
    collapse varlist, by(geocode)

Eric Uslaner
============

> I have a survey data set with respondents geocoded. I want to collapse the data set to the geocode level, so the simple command would be:
>
>     collapse varlist,by(geocode)
>
> However some geocodes barely have any respondents and any collapsed data would be unreliable. Is there a straightforward way to collapse only if the number of respondents is > 20 (e.g.)?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
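Nick's two suggestions can be sketched roughly as follows. This is only an illustration, not code from the thread: it uses the auto dataset shipped with Stata as a stand-in for the survey data, with rep78 playing the role of geocode, price and mpg standing in for varlist, and a threshold of 10 chosen arbitrarily for this small dataset. The variable name n is likewise an assumption.

```stata
* Suggestion 1: let -collapse- produce the count itself, then drop
* groups that rest on too few observations.
* (auto, rep78, price, mpg and the threshold 10 are stand-ins.)
sysuse auto, clear
collapse (mean) price mpg (count) n = price, by(rep78)
drop if n < 10
list

* Suggestion 2: if only the frequencies are wanted, -contract-
* reduces the data to one observation per group with a count variable.
sysuse auto, clear
contract rep78, freq(n)
list
```

Note that (count) counts nonmissing values of the named variable, so the choice of which variable to count matters when some variables have missing values.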

**References**:

- **st: St: collapse by _N** *From:* "Eric Uslaner" <euslaner@gvpt.umd.edu>
- **Re: st: St: collapse by _N** *From:* "Michael N. Mitchell" <Michael.Norman.Mitchell@gmail.com>
