Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?


From   John Bensin <johnalexbensin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?
Date   Fri, 21 Jun 2013 08:34:26 -0500

I have several datasets that fit this general format (variable names
are in the first row):

        varlabels             category     diff_category     category3
        "Some label"        0.79221      0.67874         0.70605
        "Another label"     0.95949      0.75774         0.031833
        "Additional label"  0.65574      0.74313         0.27692
        "Fourth label"      0.035712     0.39223         0.046171
        "Fifth label"       0.84913      0.65548         0.097132
        "Final label"       0.93399      0.17119         0.82346

My goal is to use Stata 12 to get them into this format:

           _var1      _var2      _var3      _var4      _var5
_var6        cat
        0.792210   0.959490   0.655740   0.035712   0.849130
0.933990        "category"
        0.678740   0.757740   0.743130   0.392230   0.655480
0.171190        "diff category"
        0.706050   0.031833   0.276920   0.046171   0.097132
0.823460        "category3"

where _var1 is labelled "Some label", _var2 is labelled "Another
label", etc. The datasets are
different and numerous enough that labelling the variables
individually isn't possible. The names
of the category variables aren't predictable either.

My current solution is to improvise with Mata, xpose, and destring/tostring


        clear
        input str50 varlabels category diff_category category3
        "Some label"        -18.2059     0.67874     105869.3
        "Another label"     0.95949      598.165     0.031833
        "Additional label"  0.65574      0.74313     0.27692
        "Fourth label"        194          0.39223     0.046171
        "Fifth label"       0.84913      394.069     -5968387
        "Final label"       0.93399      0.17119     0.82346
        end

        putmata varlabels, replace

        mata
            varlabels = varlabels'
        end

        xpose, clear varname
        tostring v*, force usedisplayformat replace
        getmata (v*)=varlabels, update force

        foreach v of varlist v* {
            local name = `v'[1]
            lab var `v' "`name'"
        }

        drop if _varname == "varlabels"
        destring, replace


Although this works, I question if there is a better way because I see
a few problems with this:

        1. The combination of tostring/destring may cause a loss of
precision, according to Stata's
           warnings.

        2. The --getmata-- call forces the vector of labels into the
first observation, which seems clunky.
           Is it safe to always rely on this working or do I risk
overwriting data?

I also pursued a solution using the --sxpose-- command from the SSC:


        clear
        input str50 varlabels category diff_category category3
        "Some label"        0.79221      0.67874     0.70605
        "Another label"     0.95949      0.75774     0.031833
        "Additional label"  0.65574      0.74313     0.27692
        "Fourth label"        0.035712     0.39223     0.046171
        "Fifth label"       0.84913      0.65548     0.097132
        "Final label"       0.93399      0.17119     0.82346
        end

        sxpose, clear force

        foreach v of varlist _var* {
            local name = `v'[1]
            lab var `v' "`name'"
        }


This costs me the names of the category variables, which I need to
preserve; it yields data that look like
this:

    _var1             _var2               _var3               _var4
           _var5            _var6
    Some label        Another label      Additional label     Fourth
label      Fifth label        Final label
    0.792209983        0.959490001        0.655740023        0.035712
         0.849129975        0.933990002
    0.678740025        0.757740021        0.743130028
0.392230004        0.655480027        0.171189994
    0.706049979        0.031833001        0.276919991
0.046170998        0.097131997        0.823459983


Any suggestions? I'm happy to continue with my Mata/xpose solution if
it won't cost me data, but if there's a
better solution I'll use it.

John Bensin
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index