
From: Roger Newson <roger.newson@kcl.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Copy string variable as value label
Date: Mon, 25 Apr 2005 16:51:39 +0100

-sencode- can indeed solve Friedrich's problem (using the -gsort()- option to encode in an arbitrary order). The current version of -sencode- (downloadable from SSC) uses file manipulation only in 2 places:

1. There is an initial -preserve- and a final -restore, not-, in case the user presses -Break- in the middle of executing -sencode-.

2. In order for -sencode- to work if the -label()- option is given as an existing label, -sencode- uses -label save- to save the existing label to a temporary file, and then uses -file- to read that temporary file and find the highest integer with an existing label, so that any additional string values encoded are allocated integers even higher. I couldn't find a better way, at least in Stata 7 or 8, to obtain the highest labelled integer for an existing label.
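The second point can be sketched roughly as follows. This is a minimal illustration of the general idea and is not the actual -sencode- source: the label name mylab and its contents are hypothetical, and the sketch assumes that each line written by -label save- has the form "label define name value text, modify", so that the fourth word of each line is the labelled value. Labels whose text contains macro characters or unbalanced quotes would need more careful parsing.

```stata
// Hypothetical example label; in practice this would be the existing label
capture label drop mylab
label define mylab 1 "b" 2 "a" 7 "c"

// Save the label definitions to a temporary file
tempfile labfile
tempname fh
label save mylab using `"`labfile'"', replace

// Read the file back line by line, tracking the largest integer value seen
local maxval = .
file open `fh' using `"`labfile'"', read text
file read `fh' line
while r(eof) == 0 {
    // assumption: word 4 of each saved line is the labelled value
    local val : word 4 of `line'
    capture confirm integer number `val'
    if _rc == 0 {
        // max() ignores missing arguments, so the first value replaces "."
        local maxval = max(`maxval', `val')
    }
    file read `fh' line
}
file close `fh'

display "highest labelled integer in mylab: `maxval'"
```

New string values could then be allocated integers starting at `maxval' + 1. Later Stata releases also appear to leave r(max) behind after -label list-, which may make the temporary file unnecessary; it is worth checking -help label- for the version in use.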

Roger

At 15:46 25/04/2005, Nick Cox wrote (in reply to Friedrich Huebler):

This problem, or at least a relative of it, can be attacked, I think, using Roger Newson's -sencode-. His solution includes a certain amount of file manipulation. In my version of the problem when I looked at it two years ago I didn't find any need for that, but I haven't looked closely enough to work out what aspects of the problem Roger solves that I don't or indeed vice versa.

There doesn't seem to be a help file for my resulting program, but the code is a bit more general than yours.

program seqencode, sortpreserve
*! NJC 1.0.0 1 May 2003
    version 8
    syntax varname(string) [if] [in], Generate(str) [ Label(str) Unique ]

    local limit = cond(c(flavor) == "Small", 1000, 65536)

    quietly {
        marksample touse, strok
        count if `touse'
        if r(N) == 0 error 2000

        // variable is new?
        confirm new variable `generate'

        // label is new?
        if "`label'" == "" local label "`generate'"
        capture label list `label'
        if _rc != 111 {
            di as err "label `label' already defined"
            exit 110
        }

        if "`unique'" != "" {
            // each value `touse' mapped to its own -label-
            replace `touse' = -`touse'
            sort `touse' `_sortindex'

            // define labels
            count if `touse'
            if `r(N)' > `limit' error 134
            forval i = 1 / `r(N)' {
                label def `label' `i' ///
                `"`= `varlist'[`i']'"', modify
            }

            gen long `generate' = _n if `touse'
        }
        else {
            // get first occurrences
            tempvar first
            bysort `touse' `varlist' (`_sortindex') : ///
            gen byte `first' = -(_n == 1 & `touse')
            sort `first' `_sortindex'

            // define labels
            count if `first'
            if `r(N)' > `limit' error 134
            forval i = 1 / `r(N)' {
                label def `label' `i' ///
                `"`= `varlist'[`i']'"', modify
            }

            // copy values from first occurrences
            gen long `generate' = _n if `touse'
            bysort `touse' `varlist' (`generate'): ///
            replace `generate' = `generate'[1]
        }

        compress `generate'

        // assign labels
        label val `generate' `label'
        label var `generate' `"`: variable label `varlist''"'
    }
end

Nick
n.j.cox@durham.ac.uk

Friedrich Huebler

> When a string variable is converted to a numeric variable with
> -encode-, the numeric values follow the sort order of the string
> variable. I would like to -encode- a string variable based on the
> sort order of another variable. My original data is like this:
>
> var  mean
> a    1.5
> b    1.2
> b    1.2
> b    1.2
> c    1.8
> c    1.8
>
> I would like to create the variable "newvar" like this, using the
> sort order of the variable "mean":
>
> var  mean  newvar  (label for newvar)
> b    1.2   1       b
> b    1.2   1       b
> b    1.2   1       b
> a    1.5   2       a
> c    1.8   3       c
> c    1.8   3       c
>
> My solution is shown below. Creating "newvar" itself is simple but
> there must be a better way to assign the labels.
>
> sort mean
> egen newvar = group(mean)
> lab def newvar 1 "temp"
> levels(newvar), local(levels)
> foreach l of local levels {
>     gen temp = ""
>     replace temp = var if newvar==`l'
>     levels(temp), local(templabel)
>     lab def newvar `l' `templabel', modify
>     drop temp
> }
> lab val newvar newvar
>
> How can this code be improved? Thank you for your suggestions.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
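For reference, Friedrich's example might be handled with -sencode- along the following lines. This is a hedged sketch, not a tested recipe: it assumes -sencode- has been installed from SSC and that its -generate()-, -label()- and -gsort()- options behave as described in its help file, with each distinct string value receiving a single code ordered by its first appearance in the -gsort()- ordering. The data are typed in from the example tables above.

```stata
// Assumption: -sencode- is installed, e.g. via:  ssc install sencode
clear
input str1 var mean
a 1.5
b 1.2
b 1.2
b 1.2
c 1.8
c 1.8
end

// Encode var in ascending order of mean, creating value label newvar
sencode var, generate(newvar) label(newvar) gsort(mean)

// Inspect the result: codes with and without their labels
sort newvar
list var mean newvar, nolabel
label list newvar
```

If the options work as assumed, the listing should match the target table in the quoted question, with b, a and c labelled in order of increasing mean.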

--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
Division of Asthma, Allergy and Lung Biology
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
  or 020 7848 6605 International +44 20 7848 6605
Email: roger.newson@kcl.ac.uk
Website: http://phs.kcl.ac.uk/rogernewson/

Opinions expressed are those of the author, not the institution.

Follow-Ups: st: Re: Copy string variable as value label (From: Friedrich Huebler <huebler@rocketmail.com>)

References: st: RE: Copy string variable as value label (From: "Nick Cox" <n.j.cox@durham.ac.uk>)

