
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Collapse command


From   Eric Booth <[email protected]>
To   "<[email protected]>" <[email protected]>
Subject   Re: st: Collapse command
Date   Fri, 4 Mar 2011 21:12:50 +0000


I am puzzled by what you mean when you say it just ends.  A few questions:

Does Stata print the "end of do-file" line after your last line?  If there is no message and you really are working with ~30G of data on a machine with at least 30G of RAM, I wonder if Stata is still trying to perform the collapse (which could take a while with that many obs).  If you don't have 30G of RAM and you opened a 30G dataset, then it could take a really long time.  Is Stata still trying to run the command?  That is, do you see the spinning wheel in the bottom right corner of the Results window (assuming, based on the filepath in your code, that you are using a Mac), or can you enter more commands in the Command window and have them run?
I suspect you know that it could take a really long time, which is why you -set virtual on- (although you should realize what -set virtual on- is (not) doing:  http://www.stata.com/statalist/archive/2007-06/msg00875.html ).

Does the data properly -collapse- (that is, if you -browse- the data, are they collapsed) and you just don't get the -list- output you expected?

Try reducing the size of your dataset (-drop- or -sample- some of your obs) and then -collapse- it to see what happens.
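
A minimal sketch of that test, assuming the file path and variable names (campus, year, y_red) from the original post.  The seed and the 1% sampling fraction are arbitrary choices for illustration, and loading only the three needed variables is an extra assumption meant to cut memory use:

```stata
* Sketch only: check whether -collapse- works on a small random subset.
* Load just the variables that -collapse- needs (path from the original post).
use campus year y_red using "/home/jpellerano/scores_2003-2010_cleaned.dta", clear

* Arbitrary seed so the random sample is reproducible.
set seed 12345

* Keep a 1% random sample of the observations.
sample 1

* Same collapse as in the original do-file.
collapse (count) y_red_count=y_red, by(campus year)

* Confirm the data are now one row per campus-year.
count
list
```

If this runs to completion, the problem is more likely the size of the full dataset than the -collapse- syntax itself.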

Try running this example and see if it works.  If so, what is different in this example from your actual dataset (besides the size)?
*******
clear

inp str11(campus) year studentid y_red
"001903001" 2001 1  99
"001903001" 2002 1  90
"001903023" 2001 101  88
"001903023" 2001 100  55
"001903002" 2001 100  199
"001903002" 2002 100  159
end

destring campus, replace 
format campus %09.0f
/* note:
assuming you're using AEIS/PEIMS data, you can replace "format campus %20.0f"
with the command above to preserve the leading zeroes in the campus ids, 
but I prefer to keep them as string variables
*/

collapse (count) y_red_count=y_red, by(campus year)

list
********

- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754

On Mar 4, 2011, at 2:33 PM, <[email protected]>
 wrote:

> Hello, I have a huge data set with student level data.
> 
> I have been trying to collapse the data set to the school level. The first time I ran it, it worked, but only that one time.
> 
> Here's the code: 
> 
> set mem 30g
> 
> set virtual on
> 
> use /home/jpellerano/scores_2003-2010_cleaned.dta
> 
> destring campus, replace
> 
> format campus %20.0f
> 
> collapse (count) y_red_count=y_red, by(campus year)
> 
> list
> 
> 
> After reaching the collapse line, Stata ends the do-file.
> 
> I'll appreciate any suggestion.
> 
> Thanks,
> 
> Jose A. Pellerano
> Texas A&M University
> Dpt of Economics
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/




