Re: st: Selecting stubnames for -reshape- / macrolists

From	Philipp Rehm <[email protected]>
To	[email protected]
Subject	Re: st: Selecting stubnames for -reshape- / macrolists
Date	Mon, 30 Jun 2008 20:29:59 +0200

Thank you very much, Nick. Much appreciated!
This is a very nice solution. I've seen the FAQ you reference and, as you point out, it doesn't solve my problem exactly. But your program does.

Thanks again,
Philipp

Nick Cox wrote:

This problem is discussed in an FAQ:
FAQ . . . . . . . . . . . . . . . . . . . . . . . . Problems with reshape
12/03 I am having problems with the reshape command. Can you give further guidance?
http://www.stata.com/support/faqs/data/reshape3.html

However, what is there does not add much to your discussion.
================= extract from FAQ

Question

If I have many variables all occurring in pairs for two years 1997 and
1998, so that the dataset looks like A97, A98, B97, B98, and so on, is
there any easy way to reshape the data to long without typing all the
stub names?
Answer

The variable names are collectively *97 *98, so we need a way of
expanding that list of wildcard names automatically and then removing
the suffix. We can work on either *97 or *98.
-unab- is usually billed as a programmer's command, but it can be used
interactively. It unabbreviates a varlist and puts the result in a local
macro.
. unab vars : *97
Then we zap all the occurrences of the suffix "97":
. local stubs : subinstr local vars "97" "", all

In other words, each occurrence of "97" is replaced by an empty string;
that is, they are removed. See help on -macro-.
Then we can
. reshape long `stubs', options
================================
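
The FAQ recipe can be tried end to end on a small made-up dataset. This is only a sketch: the variable names A97, A98, B97, B98 follow the FAQ's example, while id and year are names assumed here for illustration.

```stata
* Toy data in the FAQ's layout (values are arbitrary)
clear
set obs 3
gen id = _n
foreach s in A B {
    gen `s'97 = _n
    gen `s'98 = _n + 10
}

* Expand the wildcard, then strip the suffix to get the stubs
unab vars : *97
local stubs : subinstr local vars "97" "", all
display "`stubs'"

* id and year are assumed names for the i() and j() variables
reshape long `stubs', i(id) j(year)
```

Note that the all option removes every occurrence of "97", so a name that contains "97" somewhere other than the suffix would be mangled as well.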

That said, it is possible to write a program more general than Paul's.
*! NJC 1.0.0 30 June 2008
* collects stubs of form prefix_ from varlist
* with names of form prefix_suffix
* call by e.g. stubs *_*
program stubs
    version 8.2
    syntax varlist
    foreach v of local varlist {
        local stub = substr("`v'", 1, strpos("`v'", "_"))
        if !`: list stub in all' {
            local all `all' `stub'
        }
    }
    c_local stubs `all'
end
If you run
. stubs *_*
on your example dataset, you'll find a local stubs containing
x_ y_
left behind.
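
For use in a do-file, the following sketch puts the pieces together: it defines the program as given above (with comments added), rebuilds the example data from the original question, and passes the resulting local to -reshape-. The -capture program drop- line is an addition so that the do-file can be rerun.

```stata
* Define -stubs- so the do-file is self-contained
capture program drop stubs
program stubs
    version 8.2
    syntax varlist
    foreach v of local varlist {
        * keep everything up to and including the first underscore
        local stub = substr("`v'", 1, strpos("`v'", "_"))
        * add the stub only if it is not already in the list
        if !`: list stub in all' {
            local all `all' `stub'
        }
    }
    * c_local defines the local macro in the caller's namespace
    c_local stubs `all'
end

* Example data from the original question
sysuse auto, clear
forvalues j = 1/5 {
    gen x_`j' = rep78 + `j'
    gen y_`j' = rep78 - `j'
}
drop y_3
gen id = _n
keep id x_* y_*

* Collect the stubs and reshape
stubs *_*
display "`stubs'"
reshape long `stubs', i(id) j(foo)
```

Because the stub is cut at the first underscore, names with more than one underscore would need a different rule.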
Nick
[email protected]
Philipp Rehm wrote:

Thanks, Paul.

You wrote:
"You will want to make sure that if your subscripts all look like _1, _2, _3... that you only use one of them. Grabbing *_* will give the same stubname many times."

Alas, that's exactly my problem. I need to grab *_*, because I don't know which subscripts are present or absent.

(Since this may sound a little strange, let me explain why this is so. I have a data-set with thousands of variables over a maximum of 40 years or so, in wide format. The problem is that not all variables are present for all years, which is why I prefer to grab everything and get the stubs from there.)

E. Paul Wileyto wrote:

I have the following implemented as an ADO. It is what I use to grab all the name stubs for reshaping. You will want to make sure that if your subscripts all look like _1, _2, _3... that you only use one of them. Grabbing *_* will give the same stubname many times.

Paul

program define namelist
***********************************************************************
***********************************************************************
*  This short program helps you define a list (global macro vlist)
*  of variable name roots to aid in scripting and especially
*  reshaping.
*
*  Feel free to use wildcards in varlist.  Be sure that you are only
*  generating the same root once.
*
*  The oldsub option describes the common subscript in the list.
*  The newsub option describes the truncated subscript to be used in
*  reshaping.
*
*  For example:  Variables rx30a_s1..rx30a_s20 all have the common
*  root rx30a_s, with a numerical subscript for the repetition.  To
*  extract a list of roots for all survey entries, use:
*
*  -namelist rx*_s1, oldsub(_s1) newsub(_s)
*
***********************************************************************
***********************************************************************
syntax varlist, [oldsub(string) newsub(string)]
global vlist "`varlist'"
if "`oldsub'" > "" & "`newsub'" > "" {
global vlist : subinstr local varlist "`oldsub'" "`newsub'" , all
}
noisily ma list vlist
end


[email protected] wrote:
My question is whether there is a clever way to identify the stubnames
for a -reshape-. Consider this example:

sysuse auto, clear
foreach j in 1 2 3 4 5 {
    gen x_`j'=rep78+`j'
    gen y_`j'=rep78-`j'
}
drop y_3
gen id=_n
keep id x_* y_*

/*
This data-set contains the following variables which I want to -reshape-:
x_1  x_2  x_3  x_4  x_5
y_1  y_2       y_4  y_5
I want to -reshape- by stubnames x_ and y_.
In my "real" data-set, I know that all variables I want to end up with
as stubnames contain an underscore (_). They all also contain numbers
after the underscore, but there is no regular pattern there.

Now, I am generating a -macrolist- with unique stubnames, but this seems
like a detour to me, especially the loop:
*/

ds *_*
foreach v in `r(varlist)' {
        local f `=regexr("`v'","[0-9]+","")'
        local n `n' `f'
        local n: list uniq n
}
di "*** `n' ****"
reshape long `n', i(id) j(foo)

Two questions:
1) Is there a better way to identify stubnames for a -reshape-?
2) Is there a more straightforward way to arrive at a unique macrolist
than the one I chose? In particular, something like the -regexr()-
function, but with the ability to remove all instances (not just the
first) of the regular expression. Something like -subinstr()-, but for
regular expressions.
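
On the second question, two possibilities can be sketched. The first assumes Stata 14 or newer, where -ustrregexra()- replaces all matches of a regular expression; that function postdates this thread. The second needs only -regexm()- and -regexr()- and repeats the replacement until nothing matches. The name x12_34 is hypothetical.

```stata
* Hypothetical variable name with digits in two places
local v "x12_34"

* Option 1 (assumes Stata 14 or newer): replace all matches at once
local f = ustrregexra("`v'", "[0-9]+", "")
display "`f'"

* Option 2 (older Stata): apply -regexr()- until nothing matches
local g "`v'"
while regexm("`g'", "[0-9]+") {
    local g = regexr("`g'", "[0-9]+", "")
}
display "`g'"
```

Both versions strip every run of digits, so they only suit names whose stubs contain no digits themselves.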
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

