Statalist The Stata Listserver



RE: st: Looping across variables in variable columns


From   "Mungai, Edward" <[email protected]>
To   <[email protected]>
Subject   RE: st: Looping across variables in variable columns
Date   Thu, 26 Apr 2007 23:44:31 +0200

Thank you to all who gave ideas on this query.

Being new to Stata, I have chosen and managed to implement solution 1B, and I have taken into consideration the other comments relating to missing values. I am motivated to learn more Stata!
 
Edward.

________________________________

From: [email protected] on behalf of n j cox
Sent: Thu 4/26/2007 1:33 PM
To: [email protected]
Subject: Re: st: Looping across variables in variable columns



I can't find -rowcount- using -findit- or Google. Recall the precept in
the Statalist FAQ:

"Say what command(s) you are using. If they are not part of official
Stata, say where they come from: the STB/SJ, SSC, or other archives."

That said, this is a nice problem. It has got enough spin (cricket
sense, not public relations!) to be interesting, yet yields to a little
knowledge of Stata.

I assume a structure like this

first  last  x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
2      7
3      4

This yields to brute force by

------------------------------- solution 1
gen xsum = 0
qui forval i = 1/`=_N' {
        local f = first[`i']
        local l = last[`i']
        forval j = `f'/`l' {
                replace xsum = xsum + x`j' in `i'
        }
}
------------------------------
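
For anyone who wants to watch solution 1 run, here is a minimal sketch on a two-row toy dataset. The -first- and -last- values are the ones from the structure above; the x values are invented purely for illustration and are not from the original question.

```stata
* Toy data: first/last follow the structure shown above;
* the x values are made up for illustration only.
clear
input first last x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
2 7 1 2 3 4 5 6 7 8 9 10
3 4 1 2 3 4 5 6 7 8 9 10
end

* Solution 1, unchanged: loop over observations, then over x`f' to x`l'
gen xsum = 0
qui forval i = 1/`=_N' {
        local f = first[`i']
        local l = last[`i']
        forval j = `f'/`l' {
                replace xsum = xsum + x`j' in `i'
        }
}

list first last xsum
```

With these invented values the first row sums x2 through x7 and the second sums x3 and x4.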

If you get paid for writing shorter code

------------------------------- solution 1A
gen xsum = 0
qui forval i = 1/`=_N' {
        forval j = `=first[`i']'/`=last[`i']' {
                replace xsum = xsum + x`j' in `i'
        }
}
------------------------------

but shorter is not always better. Even most Stata experts would,
I think, find this a little less clear.

More importantly: This kind of example is a good advertisement for
-forval-, showing how something otherwise awkward yields to loops over
observations or variables.

A more careful solution that is a sum over non-missings is

------------------------------- solution 1B
gen xsum = 0
gen count = 0
qui forval i = 1/`=_N' {
        local f = first[`i']
        local l = last[`i']
        forval j = `f'/`l' {
                replace xsum = xsum + x`j' in `i' if x`j' < .
                replace count = count + (x`j' < .) in `i'
        }
}
replace xsum = . if count == 0
------------------------------
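
To see what the missing-value handling in solution 1B buys you, here is a small sketch on invented data in which x2 is missing. The variable -xmean- is my own addition, assuming you want the mean over non-missing values as well as the sum.

```stata
* Invented data: x2 is missing in both rows.
* Row 1 sums x1..x3 (one missing); row 2 covers x2 only (all missing).
clear
input first last x1 x2 x3 x4 x5
1 3 1 . 3 4 5
2 2 1 . 3 4 5
end

* Solution 1B, unchanged
gen xsum = 0
gen count = 0
qui forval i = 1/`=_N' {
        local f = first[`i']
        local l = last[`i']
        forval j = `f'/`l' {
                replace xsum = xsum + x`j' in `i' if x`j' < .
                replace count = count + (x`j' < .) in `i'
        }
}
replace xsum = . if count == 0

* Hypothetical extension: mean over the non-missing values in range
gen xmean = xsum / count

list first last xsum count xmean
```

The row whose range contains only missing values ends up with a missing sum and mean rather than a misleading zero.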

A -reshape- solution is illustrated by

------------------------------- set up example dataset
clear
set obs 10
gen id = _n
gen first = ceil(5 * uniform())
gen last = first + ceil(5 * uniform())
forval i = 1/10 {
        gen x`i' = ceil(3 * uniform())
}
------------------------------

Note in passing the use of -ceil(<k> * uniform())- to get (equally
probable) random integers from 1 to k.
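
A quick sketch of that idiom with k = 6, assuming a recent Stata in which -runiform()- is the current name for -uniform()-. The seed and the variable name -r- are arbitrary choices of mine.

```stata
clear
set obs 1000
set seed 12345

* ceil(k * runiform()) with k = 6: integers 1, ..., 6
gen r = ceil(6 * runiform())

* every value should be a whole number between 1 and 6
assert inrange(r, 1, 6)
assert r == floor(r)

* frequencies should be roughly equal across the six values
tabulate r
```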

------------------------------ solution 2
reshape long x, i(id)
bysort id (_j) : gen xsum = sum(x * inrange(_j, first, last))
by id : replace xsum = xsum[_N]
reshape wide
------------------------------
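
As a sanity check, the two approaches can be run side by side on the example dataset and compared. This is only a sketch: the name -xsum1- for the loop result and the seed are my own choices, and I have written -runiform()- on the assumption of a recent Stata.

```stata
* Example dataset as above, with a seed so the run is reproducible
clear
set seed 2007
set obs 10
gen id = _n
gen first = ceil(5 * runiform())
gen last = first + ceil(5 * runiform())
forval i = 1/10 {
        gen x`i' = ceil(3 * runiform())
}

* Solution 1A (loop over observations), stored as xsum1
gen xsum1 = 0
qui forval i = 1/`=_N' {
        forval j = `=first[`i']'/`=last[`i']' {
                replace xsum1 = xsum1 + x`j' in `i'
        }
}

* Solution 2 (reshape long, running sum within id, back to wide)
reshape long x, i(id)
bysort id (_j) : gen xsum = sum(x * inrange(_j, first, last))
by id : replace xsum = xsum[_N]
reshape wide

* The two answers should agree for every observation
assert xsum == xsum1
```

The -reshape wide- at the end works because -first-, -last-, -xsum1- and -xsum- are all constant within -id-.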

Solution 2 is really better for most predictable purposes. Your data
sound like panel data and as such are more easily handled with a
long structure. Solution 1 is easy for sums (and by extension means),
but very difficult to generalise to most other statistics, or most other
calculations. Solution 1 is also slow.

I would -reshape- long and use that structure for any manipulations
that otherwise you would want to do row-wise.

------------------------------ solution 2A
reshape long x, i(id)
keep if inrange(_j, first, last)
drop first last
rename _j time
egen sum = total(x), by(id)
egen median = median(x), by(id)
egen tag = tag(id)
list id sum median if tag
...
* when done
keep if tag
------------------------------

Nick
[email protected]

Mungai, Edward <[email protected]>

I have seen discussions on looping across an unknown number of columns
which can be solved by -reshape-.

My question is how to sum across columns where the number of columns
varies from one observation to the next, i.e. I may need to sum columns
2 to 5 for the first observation but columns 4 to 11 for the
second observation. In all cases the summation is done across a set of
adjacent columns.

I have tried to use -rowcount- and the looping functions, but as far as
I can tell they all sum across a fixed number of columns from one
observation to the next. There is some good news, though: the column at
which to begin and the column at which to end the summation for each
observation are recorded in two adjacent columns, and these are the same
two columns for all the observations.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/







