Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
John Bensin <johnalexbensin@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Is there a better way to transpose a dataset and automatically label variables based on a string variable? |

Date |
Fri, 21 Jun 2013 08:40:26 -0500 |

It looks like my email client mangled my first post, and I think the formatting is relevant for the data sections, so I'll try this again. I have several datasets that fit this general format (variable names are in the first row): varlabels category diff_category category3 "Some label" 0.79221 0.67874 0.70605 "Another label" 0.95949 0.75774 0.031833 "Additional label" 0.65574 0.74313 0.27692 "Fourth label" 0.035712 0.39223 0.046171 "Fifth label" 0.84913 0.65548 0.097132 "Final label" 0.93399 0.17119 0.82346 My goal is to use Stata 12 to get them into this format: _var1 _var2 _var3 _var4 _var5 _var6 cat 0.792210 0.959490 0.655740 0.035712 0.849130 0.933990 "category" 0.678740 0.757740 0.743130 0.392230 0.655480 0.171190 "diff category" 0.706050 0.031833 0.276920 0.046171 0.097132 0.823460 "category3" where _var1 is labelled "Some label", _var2 is labelled "Another label", etc. The datasets are different and numerous enough that labelling the variables individually isn't possible. The names of the category variables aren't predictable either. My current solution is to improvise with Mata, xpose, and destring/tostring clear input str50 varlabels category diff_category category3 "Some label" -18.2059 0.67874 105869.3 "Another label" 0.95949 598.165 0.031833 "Additional label" 0.65574 0.74313 0.27692 "Fourth label" 194 0.39223 0.046171 "Fifth label" 0.84913 394.069 -5968387 "Final label" 0.93399 0.17119 0.82346 end putmata varlabels, replace mata varlabels = varlabels' end xpose, clear varname tostring v*, force usedisplayformat replace getmata (v*)=varlabels, update force foreach v of varlist v* { local name = `v'[1] lab var `v' "`name'" } drop if _varname == "varlabels" destring, replace Although this works, I question if there is a better way because I see a few problems with this: 1. The combination of tostring/destring may cause a loss of precision, according to Stata's warnings. 2. The --getmata-- call forces the vector of labels into the first observation, which seems clunky. Is it safe to always rely on this working or do I risk overwriting data? I also pursued a solution using the --sxpose-- command from the SSC: clear input str50 varlabels category diff_category category3 "Some label" 0.79221 0.67874 0.70605 "Another label" 0.95949 0.75774 0.031833 "Additional label" 0.65574 0.74313 0.27692 "Fourth label" 0.035712 0.39223 0.046171 "Fifth label" 0.84913 0.65548 0.097132 "Final label" 0.93399 0.17119 0.82346 end sxpose, clear force foreach v of varlist _var* { local name = `v'[1] lab var `v' "`name'" } This costs me the names of the category variables, which I need to preserve, as well as some accuracy. It yields data that look like this: _var1 _var2 _var3 _var4 _var5 _var6 Some label Another label Additional label Fourth label Fifth label Final label 0.792209983 0.959490001 0.655740023 0.035712 0.849129975 0.933990002 0.678740025 0.757740021 0.743130028 0.392230004 0.655480027 0.171189994 0.706049979 0.031833001 0.276919991 0.046170998 0.097131997 0.823459983 Any suggestions? I'm happy to continue with my Mata/xpose solution if it won't cost me data, but if there's a better solution I'll use it. John Bensin On Fri, Jun 21, 2013 at 8:34 AM, John Bensin <johnalexbensin@gmail.com> wrote: > I have several datasets that fit this general format (variable names > are in the first row): > > varlabels category diff_category category3 > "Some label" 0.79221 0.67874 0.70605 > "Another label" 0.95949 0.75774 0.031833 > "Additional label" 0.65574 0.74313 0.27692 > "Fourth label" 0.035712 0.39223 0.046171 > "Fifth label" 0.84913 0.65548 0.097132 > "Final label" 0.93399 0.17119 0.82346 > > My goal is to use Stata 12 to get them into this format: > > _var1 _var2 _var3 _var4 _var5 > _var6 cat > 0.792210 0.959490 0.655740 0.035712 0.849130 > 0.933990 "category" > 0.678740 0.757740 0.743130 0.392230 0.655480 > 0.171190 "diff category" > 0.706050 0.031833 0.276920 0.046171 0.097132 > 0.823460 "category3" > > where _var1 is labelled "Some label", _var2 is labelled "Another > label", etc. The datasets are > different and numerous enough that labelling the variables > individually isn't possible. The names > of the category variables aren't predictable either. > > My current solution is to improvise with Mata, xpose, and destring/tostring > > > clear > input str50 varlabels category diff_category category3 > "Some label" -18.2059 0.67874 105869.3 > "Another label" 0.95949 598.165 0.031833 > "Additional label" 0.65574 0.74313 0.27692 > "Fourth label" 194 0.39223 0.046171 > "Fifth label" 0.84913 394.069 -5968387 > "Final label" 0.93399 0.17119 0.82346 > end > > putmata varlabels, replace > > mata > varlabels = varlabels' > end > > xpose, clear varname > tostring v*, force usedisplayformat replace > getmata (v*)=varlabels, update force > > foreach v of varlist v* { > local name = `v'[1] > lab var `v' "`name'" > } > > drop if _varname == "varlabels" > destring, replace > > > Although this works, I question if there is a better way because I see > a few problems with this: > > 1. The combination of tostring/destring may cause a loss of > precision, according to Stata's > warnings. > > 2. The --getmata-- call forces the vector of labels into the > first observation, which seems clunky. > Is it safe to always rely on this working or do I risk > overwriting data? > > I also pursued a solution using the --sxpose-- command from the SSC: > > > clear > input str50 varlabels category diff_category category3 > "Some label" 0.79221 0.67874 0.70605 > "Another label" 0.95949 0.75774 0.031833 > "Additional label" 0.65574 0.74313 0.27692 > "Fourth label" 0.035712 0.39223 0.046171 > "Fifth label" 0.84913 0.65548 0.097132 > "Final label" 0.93399 0.17119 0.82346 > end > > sxpose, clear force > > foreach v of varlist _var* { > local name = `v'[1] > lab var `v' "`name'" > } > > > This costs me the names of the category variables, which I need to > preserve; it yields data that look like > this: > > _var1 _var2 _var3 _var4 > _var5 _var6 > Some label Another label Additional label Fourth > label Fifth label Final label > 0.792209983 0.959490001 0.655740023 0.035712 > 0.849129975 0.933990002 > 0.678740025 0.757740021 0.743130028 > 0.392230004 0.655480027 0.171189994 > 0.706049979 0.031833001 0.276919991 > 0.046170998 0.097131997 0.823459983 > > > Any suggestions? I'm happy to continue with my Mata/xpose solution if > it won't cost me data, but if there's a > better solution I'll use it. > > John Bensin > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/faqs/resources/statalist-faq/ > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?***From:*Nick Cox <njcoxstata@gmail.com>

**References**:**st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?***From:*John Bensin <johnalexbensin@gmail.com>

- Prev by Date:
**Re: st: metareg of failure rates or proportions** - Next by Date:
**Re: st: How to drop observations according to date?** - Previous by thread:
**st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?** - Next by thread:
**Re: st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?** - Index(es):