
RE: st: AW: levelsof problem?


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: AW: levelsof problem?
Date   Wed, 28 Jul 2010 17:32:06 +0100

What you didn't tell us turns out to be important. (That's a fact, not a criticism.) 

Another possibility you might consider is tagging e.g. all EU variables with a characteristic. See help for -char-.  

You could do this once and for all with 

foreach v of var DE_* NL_* { 
	char def `v'[eu] eu 
}

after which you can do things like 

ds *_F, has(char eu) 

and the names of those variables that have an "eu" characteristic defined will be listed and will be accessible as r(varlist).
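A minimal end-to-end sketch of the characteristic approach, run on a cut-down copy of the toy dataset posted later in this thread. Which variables count as EU (here GE_F NL_F UK_F, following joe j's own manual selection) is an assumption, and the local name -euvars- is hypothetical. The names left in r(varlist) by -ds- are copied into a local and fed to -egen-:

```stata
* cut-down copy of the toy data from later in this thread
clear
input str2 country eu GE_F NL_F UK_F US_F
US 0 1 1 1 0
NL 1 1 0 1 1
GE 1 0 1 1 1
PT 1 1 1 1 1
end

* tag the EU variables once (which ones are "EU" is an assumption here)
foreach v of varlist GE_F NL_F UK_F {
    char define `v'[eu] eu
}

* select the tagged *_F variables; -ds- leaves their names in r(varlist)
ds *_F, has(char eu)
local euvars `r(varlist)'

* row-wise total over the tagged variables only (US_F is left out)
egen eutotal = rowtotal(`euvars')
```

Because the tags travel with the dataset once it is saved, the -foreach- loop needs to be run only once.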

With -findname- (SJ) the syntax would be 

findname *_F, charname(eu) 

and -findname- also allows you to put the list of variable names directly into a local macro. -findname- is my attempt to improve on official -ds-, not quite as hubristic as that might appear. 

Of course, the varlist will be longer than DE_* NL_*. 

Characteristics are saved with datasets, an important detail. 

You wrote:

> Apparently the following does not work:
>
> egen eutotal = rowtotal(`lev'_D)
> egen eutotal = rowtotal(`lev'_F)

It works precisely as Stata's designers intended, but what happens is just text substitution: the suffix is appended to the entire macro text, not to each word of the macro text. As Clyde Schechter underlines, appending a suffix to each word requires more work. 
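To make the distinction concrete, here is a sketch that first shows the plain substitution and then appends each suffix to every word, building one total per suffix, in the spirit of Tirthankar's loop quoted below. The toy data with _D variables and the names -vars-, -eutotal_D- and -eutotal_F- are hypothetical:

```stata
* hypothetical toy data with both _D and _F variables
clear
input str2 country eu GE_D GE_F NL_D NL_F US_D US_F
US 0 1 1 1 1 1 1
GE 1 2 1 2 1 2 1
NL 1 3 1 3 1 3 1
end

levelsof country if eu==1, local(lev) clean

* plain text substitution: only the last word gets the suffix (GE NL_D)
display "`lev'_D"

foreach s in _D _F {
    * start an empty list for this suffix
    local vars
    foreach c of local lev {
        * add the suffix to each word in turn
        local vars `vars' `c'`s'
    }
    egen eutotal`s' = rowtotal(`vars')
}
```

Looping over the suffixes means the _D and _F totals come from the same list of countries, so the two cannot drift apart.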

Nick 
n.j.cox@durham.ac.uk 

joe j

On second thought, I should create a variable:

gen countryF =country+"_F"

and then run the following.
 *******
 levelsof countryF if eu==1, local(lev) clean
 egen eutotal = rowtotal(`lev')
 *******
Similarly for variables with _D suffix.
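One caveat when building names from the data: if an EU country has no matching variable (in the toy dataset PT has eu==1 but there is no PT_F), -rowtotal()- fails because PT_F does not exist. A sketch that keeps only the wanted names that really are variables, by intersecting with the output of -unab- using the macro list operator &; the local names -wanted-, -all- and -eu- are hypothetical:

```stata
* cut-down copy of the toy data: PT is in the EU but has no PT_F variable
clear
input str2 country eu GE_F NL_F UK_F US_F
US 0 1 1 1 0
NL 1 1 0 1 1
GE 1 0 1 1 1
PT 1 1 1 1 1
end

gen countryF = country + "_F"

* wanted names built from the data: GE_F NL_F PT_F
levelsof countryF if eu==1, local(wanted) clean

* names that actually exist (countryF does not match *_F)
unab all : *_F

* keep only wanted names that are existing variables: GE_F NL_F
local eu : list wanted & all
egen eutotal = rowtotal(`eu')
```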

Thanks again for all your suggestions.

joe j 

> Nick, Tirtankar, many many thanks.
>
> Nick's following suggestion would have worked for me
> *******
> levelsof country if eu==1, local(lev) clean
> egen eutotal = rowtotal(`lev')
> *******
> However, _Fs are not the only variables based on country names; there
> are others with  _D suffix and some with no suffix. Apparently the
> following does not work:
>
> *******
> egen eutotal = rowtotal(`lev'_D)
> egen eutotal = rowtotal(`lev'_F)
> *******
> -unab- is a good suggestion, and would be useful at some point.
> However, in addition to US_F there are many countries that I want to
> keep out! So for now I'd have to stick with Tirtankar's tips.
>
> I am sorry about rowtotal/rsum et al mix up. I have an older Stata
> version at home, so I keep switching between the old and new commands
> in my do file:)
>
> Joe.
>
> On Tue, Jul 27, 2010 at 7:26 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>> Correct on the first point, but that's the default. I know Kit Baum hates it, but my impression is that most users don't change it by -set varabbrev off-.
>>
>> I don't understand your second point. If it's that the solution may need modification in so far as the real problem of Joe J may differ from the toy problem, then naturally I agree.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Tirthankar Chakravarty
>>
>> Not sure, but I think this:
>>
>> levelsof country if eu==1, local(lev) clean
>> egen eutotal = rowtotal(`lev')
>>
>> will work only if you set -varabbrev- on. The -unab- tip is a good one
>> and I thought about it, but the "US_F" variable could be a moving
>> target (or not).
>>
>> 2010/7/27 Nick Cox <n.j.cox@durham.ac.uk>:
>>
>>> This problem seems to me simpler than is being implied.
>>>
>>> The direct problem is that Joe J needs a varlist to feed to -egen-'s -rowtotal()- function.
>>>
>>> His starting point could be the wildcard *_F which catches all the variable names ending in _F. The difficulty is that this includes the US_F variable which for Joe J is a step too far. (At this point I merely hint at the possibility of numerous obvious political jokes without actually making any of them.)
>>>
>>> The command -unab-, although usually billed as a programmer's command, is useful here. It does just one thing: it unabbreviates (meaning expands) a varlist to all its implied names, so that
>>>
>>> unab all : *_F
>>>
>>> unpacks all the names of the variables ending in _F and puts the result in a local macro. To remove US_F from the list we can turn to macro manipulation
>>>
>>> local US US_F
>>> local eu : list all - US
>>>
>>> which gives us a macro -eu- containing the desired names.
>>>
>>> Some people might want to emphasise that the varlist expansion is also done by other commands: see e.g. help on -describe, varlist-, -ds-, or -findname- (SJ). But any of those does much more than this one thing, so it is most straightforward to stick to -unab-.
>>>
>>> It also happens that the names of the countries concerned are held as values of Joe J's string variable -country-. The only real problem here is that the list result returned by -levelsof- is complicated by double quote delimiters, but as Tirthankar shows -- and the help file clearly explains -- an option -clean- gets rid of those.
>>>
>>> For Joe J's example dataset
>>>
>>> levelsof country if eu==1, local(lev) clean
>>> egen eutotal = rowtotal(`lev')
>>>
>>> should have worked so far as I can see. There is no need, for the example dataset, to spell out the _F suffix, although Tirthankar's code shows how to do it if needed.
>>>
>>> Confusion on names: Joe J mixed references to
>>>
>>> 1. -egen, rsum()- and -egen, rowtotal()-.
>>> 2. -levels- and -levelsof-.
>>>
>>> In both cases (just a coincidence, this) the second name has been the preferred name since Stata 9.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> joe j
>>>
>>> Thanks a lot, Tirthankar!
>>>
>>> Tirthankar Chakravarty
>>>
>>>> Then this (cumbersome) script should do what you want:
>>>> *********************************************
>>>> clear
>>>> input str2 country      eu      GE_F NL_F  UK_F US_F
>>>> US      0       1       1       1       0
>>>> US      0       1       1       1       0
>>>> NL      1       1       0       1       1
>>>> IN      0       1       1       1       1
>>>> GE      1       0       1       1       1
>>>> GE      1       0       1       1       1
>>>> US      0       1       1       1       0
>>>> US      0       1       1       1       0
>>>> US      0       1       1       1       0
>>>> PT      1       1       1       1       1
>>>> end
>>>> g PT_F = 2
>>>> levelsof country if eu==1, local(lev) clean
>>>> local lev2
>>>> foreach x of local lev {
>>>>        local lev2 " `lev2' `x'_F "
>>>> }
>>>> egen eutotal = rowtotal(`lev2')
>>>> *********************************************
>>>
>>> joe j
>>>
>>>>> Thanks, Martin. This is not quite what I wanted; the following command
>>>>>  is good enough.
>>>>> egen eutotal=rowtotal(GE_F NL_F  UK_F)
>>>>>
>>>>> The *_F variables need to be selected based on whether they belong to
>>>>> eu or not (GE_F NL_F  UK_F are selected, but not US_F) (The values of
>>>>> *_F variables are not based on whether eu=1 or otherwise).  But there
>>>>> are many groupings, like eu, and a lot of countries, so I was looking
>>>>> for an easy method to select. But it seems to me that manual selection
>>>>> is the only choice.
>>>
>>> Martin Weiss
>>>
>>>>>> You could of course -replace- to the values you want based on the -if-
>>>>>> qualifier after the fact:
>>>>>>
>>>>>>
>>>>>> *************
>>>>>> egen eutotal=rowtotal(GE_F NL_F  UK_F)
>>>>>> replace eutotal=. if !eu
>>>>>> *************
>>>>>>
>>>>>>
>>>>>> The reason that your second approach does not work is that Stata expects a
>>>>>> -varlist- while you feed it
>>>>>>
>>>>>> `"GE"' `"NL"' `"PT"'_F
>>>>>>
>>>>>> which it cannot process. Type -ma di- to see the contents of your -macro-s.
>>>
>>> joe j
>>>
>>>>>> From a data set roughly like the following
>>>>>> clear
>>>>>> input str2 country      eu      GE_F NL_F  UK_F US_F
>>>>>> US      0       1       1       1       0
>>>>>> US      0       1       1       1       0
>>>>>> NL      1       1       0       1       1
>>>>>> IN      0       1       1       1       1
>>>>>> GE      1       0       1       1       1
>>>>>> GE      1       0       1       1       1
>>>>>> US      0       1       1       1       0
>>>>>> US      0       1       1       1       0
>>>>>> US      0       1       1       1       0
>>>>>> PT      1       1       1       1       1
>>>>>> end
>>>>>>
>>>>>> I want to calculate the  row sum of all *_F variables pertaining to eu
>>>>>> countries (all excluding US_F):
>>>>>> egen eutotal=rowtotal(GE_F NL_F  UK_F)
>>>>>>
>>>>>> However, I would prefer to follow some rules in selecting the variables,
>>>>>> like
>>>>>>
>>>>>> levels country if eu==1, local(lev)
>>>>>> egen eutotal=rsum(`lev'_F)
>>>>>>
>>>>>> This doesn't work, however. Any pointers would be appreciated.
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>


© Copyright 1996–2014 StataCorp LP