Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org, is already up and running.


Re: st: Ordering a varlist according to "parent" and "child" variables


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Ordering a varlist according to "parent" and "child" variables
Date   Tue, 20 Dec 2011 16:57:02 +0000

My point was, and remains, that it is trickier than you thought.

I wouldn't want to undermine anyone's fun with a programming problem,
but I wonder if it would be quicker to think of a naming convention
such that -order- automatically produces the right order?

Nick

On Tue, Dec 20, 2011 at 3:52 PM, A Loumiotis
<antonis.loumiotis@gmail.com> wrote:

> @ Nick
> Thanks for your help.  I thought this was tricky, which is why I
> asked for help!!
>
> @Matt
> Thanks for the code!!! I have to get familiar with Mata.  Running the
> code on the last example that I gave, though, does not produce the
> ordering that I would like.  I would like the new ordering to keep, as
> far as possible, the original ordering of the varlist (v1-v11) as well
> as the ordering of the child variables within each parent variable.
>
> Working on this problem today, I developed the following code for this task.
> If there are more than three child generations, I will have to manually
> add iterations to the code. Perhaps I can use -while- to further
> automate this task, but I'm not sure how.
>
> Antonis
>
> local all v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
> local cofv4 v1 v2 v6
> local cofv5 v3
> local cofv6 v8 v11
> local cofv7 v4 v10
>
> foreach var of local all {
>        local rvars:list all - var
>        local i=0
>        foreach rvar of local rvars {
>                if `:list var in cof`rvar'' local i=`i'+1
>        }
>        if `i'==0 local nparlist "`nparlist' `var'"
> }
> di "`nparlist'"
>
> foreach var of local nparlist {
>        local eocl
>        if `:list sizeof cof`var''>=1 {
>                forvalues i=1/`:list sizeof cof`var'' { // loop for 1st gen
>                        local varg`i': word `i' of `cof`var''
>                        local eocl`i' "`varg`i''"
>                        if `:list sizeof cof`varg`i'''>=1 {
>                                forvalues j=1/`:list sizeof cof`varg`i''' { // loop for 2nd gen.
>                                        local var2g`j': word `j' of `cof`varg`i'''
>                                        local eocl`i'`j' "`var2g`j''"
>                                        if `:list sizeof cof`var2g`j'''>=1 {
>                                                forvalues k=1/`:list sizeof cof`var2g`j''' { // loop for 3rd gen
>                                                        local var3g`k': word `k' of `cof`var2g`j'''
>                                                        local eocl`i'`j'`k' "`var3g`k''"
>                                                        if `:list sizeof cof`var3g`k'''>=1 {
>                                                                di as error "The family of `var' has more than three generations.  Consider adding more iteration(s)"
>                                                        }
>                                                        local eocl`i'`j' "`eocl`i'`j'' `eocl`i'`j'`k''"
>                                                }
>                                        }
>                                        local eocl`i' "`eocl`i'' `eocl`i'`j''"
>                                }
>                        }
>                        local eocl "`eocl' `eocl`i''"
>                }
>                local expslist "`expslist' `var' `eocl'"
>        }
>        else local expslist "`expslist' `var'"
> }
>
> On Tue, Dec 20, 2011 at 4:48 AM, Matthew White
> <mwhite@poverty-action.org> wrote:
>> Hi Antonis,
>>
>> The following code isn't pretty, but it might do the job. I'm sure
>> there are more elegant ways (maybe using more Mata). Strategy: enter
>> all the relations as the dataset in memory; then create a local `tree'
>> that describes the branches of the relationship trees; then load the
>> dataset whose variables you want to sort and reorder them using
>> `tree'.
>>
>> Best,
>> Matt
>>
>> clear
>> input str32 parent str32 child
>> var9
>> var4 var1
>> var4 var2
>> var4 var6
>> var5 var3
>> var6 var8
>> var6 var11
>> var7 var4
>> var7 var10
>> end
>>
>> capture program drop gettree
>> program gettree, rclass
>>        syntax anything(name=p), parent(varname) child(varname) [branch(str)]
>>
>>        local branch `branch' `p'
>>        quietly levelsof `child' if `parent' == "`p'"
>>        if `"`r(levels)'"' == "" return local tree `""`branch'""'
>>        else {
>>                foreach level in `r(levels)' {
>>                        if `:list level in branch' {
>>                                local branch `branch' `level'
>>                                local dibranch : subinstr local branch " " "@"
>>                                local dibranch : subinstr local dibranch " " ", which is the parent of ", all
>>                                local dibranch : subinstr local dibranch "@" " is the parent of "
>>                                display as err "circular structure: `dibranch'."
>>                                exit 198
>>                        }
>>                        gettree `level', parent(`parent') child(`child') branch(`branch')
>>                        local rtree `"`rtree' `r(tree)'"'
>>                }
>>                return local tree `"`rtree'"'
>>        }
>> end
>>
>> levelsof parent, local(parents)
>> levelsof child, local(children)
>> foreach parent of local parents {
>>        if !`:list parent in children' {
>>                gettree `parent', parent(parent) child(child)
>>                local tree `"`tree' `r(tree)'"'
>>        }
>> }
>>
>> clear
>> forvalues i = 1/11 {
>>        generate var`i' = 0
>> }
>>
>> mata:
>> tokens = tokens(st_local("tree"))'
>> st_local("tree", invtokens(J(1, rows(tokens), `"""') + collate(tokens,
>> order((strlen(tokens) - strlen(subinstr(tokens, " ", "", .)),
>> (1::rows(tokens))), (-1, -2)))' + J(1, rows(tokens), `"""')))
>> end
>>
>> foreach branch of local tree {
>>        gettoken progenitor progeny : branch
>>        if "`progeny'" == "" order `progenitor'
>>        else order `progeny', after(`progenitor')
>> }
>>
>> On Mon, Dec 19, 2011 at 2:02 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>> Evidently you want code that will recognise not just parents and children, but several generations too. In your simple example I count at least four generations, as var7 has child var4 which has child var6 which has child var8.
>>>
>>> No doubt this is standard stuff for some tree- or network-processing software, but I'm not sure that it's possible to do what you want with a few lines of Stata (or even Mata).
>>>
>>> I tried recasting your macros as
>>>
>>> local all v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
>>> local c1 v4 v1 v2 v6
>>> local c2 v5 v3
>>> local c3 v6 v8 v11
>>> local c4 v7 v4 v10
>>>
>>> which made the problem a bit clearer to me. One naïve algorithm is then
>>>
>>> 1. Parse each -c- macro into parent (first) and children (others)
>>> 2. Zap the children.
>>> 3. Put each -c- macro in the place of the parent.
>>>
>>> And this is, I think, equivalent to what Matthew White was doing.
>>>
>>> The problem is that this will only do what you want if you process the macros in the right order.
>>>
>>> The failure of one simple algorithm clearly does not rule out the existence of another, but I guess that this is trickier than you think.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> A Loumiotis
>>>
>>> Thanks a lot Matt for your help.
>>> It works for the particular example that I gave but if I modify the
>>> example a bit I think it will not work.  Consider the following:
>>>
>>> var4 has three "child" variables var1 var2 var6
>>> var5 has one "child" variable var3
>>> var6 has two "child" variables var8 var11
>>> var7 has two "child" variables var4 var10
>>>
>>> The ordered varlist should be as follows:
>>> v5 v3 v7 v4 v1 v2 v6 v8 v11 v10 v9
>>>
>>> The method you suggested will produce the following:
>>> v1 v2 v6 v8 v11 v5 v3 v7 v4 v10 v9
>>>
>>> Here is the output
>>> . forvalues i=1/11 {
>>>  2.         gen v`i'=.
>>>  3. }
>>> . local childofv4 v1 v2 v6
>>> . local childofv5 v3
>>> . local childofv6 v8 v11
>>> . local childofv7 v4 v10
>>> . ds
>>> v1   v2   v3   v4   v5   v6   v7   v8   v9   v10  v11
>>> . foreach var in `r(varlist)' {
>>>  2.         if "`childof`var''" != "" order `childof`var'', after(`var')
>>>  3. }
>>> . ds
>>> v1   v2   v6   v8   v11  v5   v3   v7   v4   v10  v9
>>>
>>>
>>>
>>> On Mon, Dec 19, 2011 at 6:28 PM, Matthew White
>>> <mwhite@poverty-action.org> wrote:
>>>
>>>> I think your setup so far seems fine, though note that the names of
>>>> -local-s can be at most 31 characters, so since you're using 7 for
>>>> "childof", you'll have trouble with variables whose names are more
>>>> than 24 characters. How about a loop like this:
>>>>
>>>> ds
>>>> foreach var in `r(varlist)' {
>>>>    if "`childof`var''" != "" order `childof`var'', after(`var')
>>>> }
>>>>
>>>> On Mon, Dec 19, 2011 at 11:15 AM, A Loumiotis
>>>> <antonis.loumiotis@gmail.com> wrote:
>>>
>>>>> I'm working with survey data where the variables are related in the
>>>>> following way.  There are "parent" and "child" variables where each
>>>>> "child" variable can have at most one "parent" variable.  What I would
>>>>> like to do is to find a general way to reorder the initial varlist so
>>>>> that the "child" variables follow right after their "parent"
>>>>> variables.
>>>>>
>>>>> Consider the following simplified example of a survey dataset with 11
>>>>> variables var1-var11.
>>>>>
>>>>> var1 has three "child" variables var2-var4
>>>>> var3 has two "child" variables var5-var6
>>>>> var5 has two "child" variables var8-var9
>>>>> var7 has one "child" variable var11
>>>>>
>>>>> What I would like to do is to find a (general) way to reorder the
>>>>> initial varlist var1-var11 as follows:
>>>>>
>>>>> var1 var2 var3 var5 var8 var9 var6 var4 var7 var11 var10
>>>>>
>>>>> What I have done up to now is to create a local for each variable that
>>>>> contains its child variables, if any.  For the simplified example these
>>>>> locals are defined as follows:
>>>>>
>>>>> local childofvar1 var2 var3 var4
>>>>> local childofvar2
>>>>> local childofvar3 var5 var6
>>>>> local childofvar4
>>>>> local childofvar5 var8 var9
>>>>> local childofvar6
>>>>> local childofvar7 var11
>>>>> local childofvar8
>>>>> local childofvar9
>>>>> local childofvar10
>>>>> local childofvar11
>>>>>
>>>>> I've tried to use loops to generate the ordered varlist, but without
>>>>> success.  Any help will be greatly appreciated!!!
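
The question raised in the thread about using -while- to avoid hard-coding the number of generations can be approached with an explicit stack rather than nested loops. The following is only a minimal sketch, not code posted by anyone in this thread. It reuses the `all' and `cof*' locals from Antonis's second example; the locals `children', `stack', `ordered' and `missed' are names introduced here purely for illustration. It assumes each child has at most one parent, as stated in the original question.

```stata
* Sketch: depth-first ordering with an explicit stack.
* Hypothetical helper locals: children, stack, ordered, missed.
clear
forvalues i = 1/11 {
    generate v`i' = .
}

local all v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
local cofv4 v1 v2 v6
local cofv5 v3
local cofv6 v8 v11
local cofv7 v4 v10

* Every variable named in some cof* local is somebody's child.
local children
foreach var of local all {
    local children `children' `cof`var''
}

* Roots have no parent; list subtraction keeps the order of `all'.
local stack : list all - children

local ordered
while "`stack'" != "" {
    * Pop the first name off the stack.
    gettoken var stack : stack
    if `:list var in ordered' {
        display as error "`var' is listed as a child more than once"
        exit 198
    }
    local ordered `ordered' `var'
    * Push its children on the front so they are visited next.
    local stack `cof`var'' `stack'
}

* Anything never reached sits in a cycle with no root above it.
local missed : list all - ordered
if "`missed'" != "" display as error "not reachable from any root: `missed'"

display "`ordered'"
order `ordered'
```

For the second example in the thread, tracing this by hand gives v5 v3 v7 v4 v1 v2 v6 v8 v11 v10 v9, which is the ordering Antonis asked for. Because the stack grows and shrinks as needed, no extra code is required when a family has more than three generations.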

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

