RE: st: St: collapse by _N


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: St: collapse by _N
Date   Wed, 20 Oct 2010 11:31:31 +0100

All good advice, and here is some more:

1. I echo Michael in noting that -collapse- can produce a count variable, so that there is no need to set up your own. Of course, you would then need to drop data based on small samples after the -collapse-. 

2. Be aware of -contract-. It has precisely the role of collapsing to frequencies, and so by default produces a count variable. By implication Ric here wants mostly to -collapse- to means, but I've often seen people use -collapse- when their objective was more directly matched by -contract-. 
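
A minimal sketch of both points, assuming Stata's shipped auto dataset as a stand-in for the survey data: rep78 plays the role of geocode, price and mpg play the role of varlist, and the threshold of 10 is arbitrary (the thread uses 20). None of these names come from the original question.

```stata
* Illustration only: rep78 stands in for geocode, price and mpg for varlist
sysuse auto, clear

* Point 1: ask -collapse- for a count alongside the means,
* then drop the small groups after the -collapse-
preserve
collapse (mean) price mpg (count) n = price, by(rep78)
* arbitrary threshold for this small dataset
drop if n < 10
list
restore

* Point 2: -contract- reduces the data to frequencies only;
* freq() names the count variable (the default name is _freq)
contract rep78, freq(n)
list
```

Note that (count) counts nonmissing values of the named variable, so it can differ from _N when that variable has missing values.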

Nick 
n.j.cox@durham.ac.uk 

Michael Mitchell
================

   In addition to the great answers Chris and Ulrich sent, I might suggest that you 
include a variable that counts the number of valid observations. After having the 
collapsed file, you could then decide what you might want to use as a threshold for the 
data being too unreliable. You can see more examples about collapsing, including examples 
using count, at http://www.ats.ucla.edu/stat/stata/modules/collapse.htm .

Ulrich Kohler
=============

. bysort geocode: gen n = _N
. collapse (mean) varlist if n >= 20, by(geocode)

Chris Parker
============

You could count the observations in each geocode, drop the geocodes with too few observations, and then collapse.

bysort geocode: gen numobs=_N
drop if numobs < 20
collapse varlist, by(geocode)

Eric Uslaner
============

> I have a survey data set with respondents geocoded.  I want to collapse the data set to the geocode level, so the simple command would be:
>
> collapse varlist,by(geocode)
>
> However, some geocodes have barely any respondents, and any collapsed data would be unreliable.  Is there a straightforward way to collapse only if the number of respondents is > 20 (e.g.)?


