Statalist The Stata Listserver



RE: st: RE: strings handling


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: strings handling
Date   Wed, 8 Feb 2006 10:55:28 -0000

gen ngrp1 = wordcount(subinstr(grp1, ";", " ", .))

and similarly for grp2, grp3 and grp4.
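A minimal sketch of the same idea looped over all four variables. It assumes the variables are named grp1 to grp4 and uses made-up data copied from the original question. The names ngrp1 to ngrp4 are illustrative only. The outer subinstr() also turns commas into blanks, because grp4 in the question uses commas. Since wordcount() counts blank-separated words, doubled or trailing separators should not inflate the count.

```stata
* Hypothetical example data mirroring the question (grp4 uses commas)
clear
input str3 id str10 grp1 str10 grp2 str10 grp3 str15 grp4
"001" "2;3;4" "10;99;2" "01" "11,2,25,2,3"
end

forvalues i = 1/4 {
    * turn both ";" and "," into blanks, then count blank-separated words
    gen ngrp`i' = wordcount(subinstr(subinstr(grp`i', ";", " ", .), ",", " ", .))
}
list id ngrp*
```

With this example row the counts should come out as 3, 3, 1 and 5, matching the output asked for in the question.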

Nick 
[email protected] 

Philip Ryan
 
> Another possible solution makes use of the -noccur()- function written by
> Nick Winter for the -egenmore- set of user-written add-ons to -egen- (see
> findit egenmore).
> 
> Here I assume you want the number of elements in each variable, not the
> number of separators (semi-colons). In general the number of elements will
> be one more than the number of separators:
> 
> forvalues i = 1/4 {
>     egen k`i' = noccur(grp`i'), string(";")
>     replace k`i' = k`i' + 1
> }
> 
> You would need to be careful that the separators are indeed all semi-colons
> (and not a mixture of semi-colons and commas, as in your message) and that
> there are no additional semi-colons, for example doubled or trailing ones.
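If you want the number of separators itself, or prefer not to install -egenmore-, a common idiom is to compare the length of the string before and after deleting every semi-colon. Below is a minimal sketch for grp1 only, using illustrative names nsemi1 and nelem1 and made-up values. The third value illustrates the caveat above: a doubled semi-colon makes "separators + 1" overcount the elements.

```stata
* Hypothetical example values; the third one contains a doubled separator
clear
input str12 grp1
"2;3;4"
"01"
"2;;3"
end

* number of ";" = characters removed when every ";" is deleted
gen nsemi1 = length(grp1) - length(subinstr(grp1, ";", "", .))
* elements = separators + 1, which overcounts if ";" is doubled or trailing
gen nelem1 = nsemi1 + 1
list
```

Here "2;;3" gives nelem1 = 3 although it holds only two elements. By contrast, -wordcount()- applied after replacing the semi-colons with blanks would return 2 for that value.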
 
Scott Merryman

> >In your example, isn't the number of semicolons 2, 2, 0, 0?
> >
> >Or, do you mean something like this?
> >
> >forv i = 1/4 {
> >    qui gen gr`i' = .
> >}
> >levelsof id, local(levels)
> >foreach l of loca levels {
> >    local i = 1
> >    foreach v of varlist grp* {
> >      qui split `v' if id == `l', p(;) gen(_split)
> >      qui replace gr`i' = `=r(nvars)' if id == `l'
> >      drop _split*
> >      local ++i
> >    }
> >}
> >
> >
> >For example:
> >
> >
> >. l, noobs
> >
> >   +----------------------------------------------+
> >   | id    grp1         grp2   grp3          grp4 |
> >   |----------------------------------------------|
> >   |  1   2;3;4      10;99;2     01   11;2;25;2;3 |
> >   |  2     2;3   10;99;2;44     01     11;2;25;2 |
> >   +----------------------------------------------+
> >
> >. forv i = 1/4 {
> >   2. qui gen gr`i' = .
> >   3. }
> >
> >. levelsof id, local(levels)
> >1 2
> >
> >. foreach l of loca levels {
> >   2. local i = 1
> >   3. foreach v of varlist grp* {
> >   4.         qui split `v' if id == `l', p(;) gen(_split)
> >   5.         qui replace gr`i' = `=r(nvars)' if id == `l'
> >   6.         drop _split*
> >   7.         local ++i
> >   8.         }
> >   9.         }
> >
> >. l,noobs
> >
> >   +----------------------------------------------------------------------+
> >   | id    grp1         grp2   grp3          grp4   gr1   gr2   gr3   gr4 |
> >   |----------------------------------------------------------------------|
> >   |  1   2;3;4      10;99;2     01   11;2;25;2;3     3     3     1     5 |
> >   |  2     2;3   10;99;2;44     01     11;2;25;2     2     4     1     4 |
> >   +----------------------------------------------------------------------+

Alexander Nervedi

> > > I have data which has been entered awkwardly.
> > >
> > > Instead of using a separate variable for each item, all items of a
> > > category are entered together in one variable.
> > >
> > > ID      Grp1     Grp2       Grp3   Grp4
> > > 001     2;3;4    10;99;2    01     11,2,25,2,3
> > >
> > >
> > > I'd like to convert this to a dataset that looks like
> > >
> > > ID      Grp1     Grp2       Grp3   Grp4
> > > 001     3        3          1      5
> > >
> > > i.e. the count of the number of semi-colons within each variable. I am
> > > sure there is a neat way of doing this but I am missing it, so I thought
> > > I'd write in and ask for your help.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


