Statalist The Stata Listserver


Re: st: Re: foreach program

From   "Michael Blasnik" <>
To   <>
Subject   Re: st: Re: foreach program
Date   Fri, 08 Sep 2006 12:22:59 -0400

The code I posted should work for your problem. If you want to run it for multiple age groups, I suggest you create a categorical variable for those cohorts as well (using recode?). I have revised my code to solve the full problem (assuming you have created agecat to represent the age categories):

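gen byte groupqtrs=(gqtyped==200)  // 1 if gqtyped is 200, 0 otherwise
keep fip race sex agecat groupqtrs perwt  // perwt must be kept for the collapse
collapse (sum) perwt, by(fip race sex agecat groupqtrs)
reshape wide perwt, i(fip race sex agecat) j(groupqtrs)
* a cell with nobody in one category comes out of the reshape as missing
replace perwt0=0 if missing(perwt0)
replace perwt1=0 if missing(perwt1)
gen totpersons=perwt0+perwt1
gen ir=perwt1/totpersons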

This will give you one observation for each demographic cell: by fip, race, sex and agecat. You may then wish to do more reshapes or select certain cases for further analysis. I would guess that it will save you a lot of time to do it this way. If you are running into memory constraints holding the entire national 5% sample (or are using virtual memory), you could loop through the states: read in each state's data one at a time from the master file, run this code, and save the results in a file named after the fip code. The -keep- command in the second line may help a lot in terms of file size, so you may not need to worry about that.
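The state-by-state loop described above might look roughly like the following. This is only a sketch under several assumptions: the file names toy_master and toy_result_ are hypothetical, fip is assumed to be numeric, agecat is assumed to be stored in the master file already, and a small made-up dataset stands in for the real Census extract.

```stata
* Sketch only: toy data and hypothetical file names, not the real extract.
clear
input fip race sex agecat gqtyped perwt
1 1 1 1 200 20
1 1 1 1   0 80
1 2 1 1   0 50
2 1 1 1 200 10
2 1 1 1   0 90
end
save toy_master, replace

use fip using toy_master, clear      // read a single variable to list the states
levelsof fip, local(states)
foreach s of local states {
    * read only this state's rows and only the variables that are needed
    use fip race sex agecat gqtyped perwt if fip==`s' using toy_master, clear
    gen byte groupqtrs=(gqtyped==200)
    collapse (sum) perwt, by(fip race sex agecat groupqtrs)
    reshape wide perwt, i(fip race sex agecat) j(groupqtrs)
    * a state or cell with nobody in one category lacks that variable or value
    foreach v in perwt0 perwt1 {
        capture confirm variable `v'
        if _rc {
            gen double `v'=0
        }
        replace `v'=0 if missing(`v')
    }
    gen totpersons=perwt0+perwt1
    gen ir=perwt1/totpersons
    save toy_result_`s', replace
}

* stack the per-state results into one dataset
clear
foreach s of local states {
    append using toy_result_`s'
}
```

Reading with -use ... if ... using- means only one state's rows are ever in memory at once, which is the point of looping when the full sample does not fit.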

Michael Blasnik

----- Original Message -----
From: "Scott Cunningham" <>
To: <>
Sent: Friday, September 08, 2006 12:12 PM
Subject: Re: st: Re: foreach program

Dear Michael,

If there is a faster way to do what I'm doing, then I'd love to know it, as the code I use takes me a few days to execute because of the computer I'm using and the size of the Census longform survey. Here's a description of what I'm doing. I am calculating incarceration rates by demographic cell, which is defined at the United States state-age-race-sex-year level. I have data for 1980, 1990 and 2000. In 1980, the "group quarter" variable was defined differently from how it was defined in 1990 and 2000, so I've been running two do files - one for 1980 and one for 1990/2000, but they are essentially identical.

I have 9 different age cohorts. I only reported the code for one of them, since they are all identical calculations. The age cohorts are:

1. 15-19 year olds
2. 20-24 year olds
3. 25-29 year olds
4. 30-34 year olds
5. 35-39 year olds
6. 40-44 year olds
7. 45-54 year olds
8. 55-64 year olds
9. 65+ year olds
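
These nine cohorts could be coded into a single categorical variable such as the agecat assumed in the revised code, for instance with -recode-. The sketch below assumes the age variable is named age (as in the inrange(age,15,19) example); the ages typed in are made up for illustration.

```stata
* Sketch only: toy ages stand in for the Census data; age is an assumed name.
clear
input age
14
15
19
20
44
45
54
64
65
90
end
recode age (15/19=1) (20/24=2) (25/29=3) (30/34=4) (35/39=5) ///
           (40/44=6) (45/54=7) (55/64=8) (65/max=9) (else=.), gen(agecat)
drop if missing(agecat)   // ages under 15 fall outside every cohort
tab agecat
```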

I have 51 states (50 US states plus District of Columbia).

I have two races (black and white), two sex values, and three census years (1980, 1990 and 2000). My understanding was that to create so many separate incarceration rates and levels, I would need to reproduce the same code for each demographic cell, so I've been using -foreach- to do it. Are you saying, though, that this is not the most efficient method?


On Sep 8, 2006, at 12:02 PM, Michael Blasnik wrote:

I've been reading this thread and don't understand why you need to loop at all or generate the grouping variable. Wouldn't it make more sense to use a collapse and a reshape?

keep if inrange(age,15,19)
gen byte groupqtrs=(gqtyped==200)
collapse (sum) perwt, by(fip race sex groupqtrs)
reshape wide perwt, i(fip race sex) j(groupqtrs)
gen totpersons=perwt0+perwt1
gen ir=perwt1/totpersons

This approach seems easier and faster and gives you a dataset of results directly.
You could take the results and merge them back into the main dataset if you want, but I don't even think that is necessary.
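
Merging the cell-level results back into the person-level data could be sketched as follows. The toy data and tempfile names are made up, and the -merge m:1- syntax shown requires Stata 11 or later (older releases use a different -merge- syntax), so treat this as an illustration rather than the exact commands to run.

```stata
* Sketch only: toy person-level data; tempfile names are arbitrary.
clear
input fip race sex gqtyped perwt
1 1 1 200 20
1 1 1   0 80
1 2 1   0 50
end
tempfile main
save `main'

* build the cell-level rates as in the code above
gen byte groupqtrs=(gqtyped==200)
collapse (sum) perwt, by(fip race sex groupqtrs)
reshape wide perwt, i(fip race sex) j(groupqtrs)
replace perwt0=0 if missing(perwt0)
replace perwt1=0 if missing(perwt1)
gen totpersons=perwt0+perwt1
gen ir=perwt1/totpersons
keep fip race sex totpersons ir
tempfile rates
save `rates'

* many person records per cell, one rate per cell
use `main', clear
merge m:1 fip race sex using `rates'
assert _merge==3          // every person record should find its cell
drop _merge
```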

Michael Blasnik