Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Is there a better way to transpose a dataset and automatically label variables based on a string variable? |
Date | Fri, 21 Jun 2013 15:08:17 +0100 |
I am the author of the -sxpose- (SSC) referred to; it is as stated designed to transpose a string variable dataset. What you did comes with a warning, so I readily agree that it is not a good solution. With your dataset, I can save the string values, -drop- that variable, -xpose- and then copy back. . l +---------------------------------------------------+ | varlabels category diff_c~y catego~3 | |---------------------------------------------------| 1. | Some label .79221 .67874 .70605 | 2. | Another label .95949 .75774 .031833 | 3. | Additional label .65574 .74313 .27692 | 4. | Fourth label .035712 .39223 .046171 | 5. | Fifth label .84913 .65548 .097132 | |---------------------------------------------------| 6. | Final label .93399 .17119 .82346 | +---------------------------------------------------+ . forval i = 1/6 { 2. local label`i' = varlabels[`i'] 3. } . drop varlabels . xpose, clear . l +--------------------------------------------------------+ | v1 v2 v3 v4 v5 v6 | |--------------------------------------------------------| 1. | .79221 .95949 .65574 .035712 .84913 .93399 | 2. | .67874 .75774 .74313 .39223 .65548 .17119 | 3. | .70605 .031833 .27692 .046171 .097132 .82346 | +--------------------------------------------------------+ . forval i = 1/6 { 2. label var v`i' "`label`i''" 3. } This could be programmed up for more general situations. . Nick njcoxstata@gmail.com On 21 June 2013 14:40, John Bensin <johnalexbensin@gmail.com> wrote: > It looks like my email client mangled my first post, and I think the > formatting is relevant for the data sections, so I'll try this again. > > I have several datasets that fit this general format (variable names > are in the first row): > > > varlabels category diff_category category3 > "Some label" 0.79221 0.67874 0.70605 > "Another label" 0.95949 0.75774 0.031833 > "Additional label" 0.65574 0.74313 0.27692 > "Fourth label" 0.035712 0.39223 0.046171 > "Fifth label" 0.84913 0.65548 0.097132 > "Final label" 0.93399 0.17119 0.82346 > > > My goal is to use Stata 12 to get them into this format: > > _var1 _var2 _var3 _var4 _var5 _var6 cat > 0.792210 0.959490 0.655740 0.035712 0.849130 0.933990 > "category" > 0.678740 0.757740 0.743130 0.392230 0.655480 0.171190 > "diff category" > 0.706050 0.031833 0.276920 0.046171 0.097132 0.823460 > "category3" > > where _var1 is labelled "Some label", _var2 is labelled "Another > label", etc. The datasets are different and numerous enough that > labelling the variables individually isn't possible. The names of the > category variables aren't predictable either. > > My current solution is to improvise with Mata, xpose, and destring/tostring > > > clear > input str50 varlabels category diff_category category3 > "Some label" -18.2059 0.67874 105869.3 > "Another label" 0.95949 598.165 0.031833 > "Additional label" 0.65574 0.74313 0.27692 > "Fourth label" 194 0.39223 0.046171 > "Fifth label" 0.84913 394.069 -5968387 > "Final label" 0.93399 0.17119 0.82346 > end > > putmata varlabels, replace > > mata > varlabels = varlabels' > end > > xpose, clear varname > tostring v*, force usedisplayformat replace > getmata (v*)=varlabels, update force > > foreach v of varlist v* { > local name = `v'[1] > lab var `v' "`name'" > } > > drop if _varname == "varlabels" > destring, replace > > > > Although this works, I question if there is a better way because I see > a few problems with this: > > 1. The combination of tostring/destring may cause a loss of precision, > according to Stata's warnings. > > 2. The --getmata-- call forces the vector of labels into the first > observation, which seems clunky. Is it safe to always rely on this > working or do I risk overwriting data? > > I also pursued a solution using the --sxpose-- command from the SSC: > > > clear > input str50 varlabels category diff_category category3 > "Some label" 0.79221 0.67874 0.70605 > "Another label" 0.95949 0.75774 0.031833 > "Additional label" 0.65574 0.74313 0.27692 > "Fourth label" 0.035712 0.39223 0.046171 > "Fifth label" 0.84913 0.65548 0.097132 > "Final label" 0.93399 0.17119 0.82346 > end > > sxpose, clear force > > foreach v of varlist _var* { > local name = `v'[1] > lab var `v' "`name'" > } > > > This costs me the names of the category variables, which I need to > preserve, as well as some accuracy. It yields data that look like > this: > > _var1 _var2 _var3 _var4 > _var5 _var6 > Some label Another label Additional label Fourth label > Fifth label Final label > 0.792209983 0.959490001 0.655740023 0.035712 > 0.849129975 0.933990002 > 0.678740025 0.757740021 0.743130028 0.392230004 > 0.655480027 0.171189994 > 0.706049979 0.031833001 0.276919991 0.046170998 > 0.097131997 0.823459983 > > > Any suggestions? I'm happy to continue with my Mata/xpose solution if > it won't cost me data, but if there's a better solution I'll use it. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/