Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: Stata analog to Mata's -strdup()- or better approach?

From	Nick Cox
To	[email protected]
Subject	Re: st: RE: Stata analog to Mata's -strdup()- or better approach?
Date	Fri, 11 Mar 2011 21:20:10 +0000

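In the loop quoted below, the line

 replace longest = length(X) if strpos(estring, X)

should be

 replace longest = `i' if strpos(estring, X)

as there is no need to evaluate -length(X)- repeatedly when it is
already known: on pass `i' of the loop, X holds exactly `i' "X"s.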
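
For reference, here is a minimal sketch of the whole loop with that correction applied, run on the four example patients from the original question. The 12-month toy data and the loop bound of 12 are assumptions made for illustration; the real data cover 15 years, hence 1/180 in the quoted code.

```stata
* Sketch only: toy data copied from the example in the original question
clear
input byte patid str12 estring
1 "XXXXX-------"
2 "--XXX---XXXX"
3 "-XXXXXX-----"
4 "-XXX-XXX-XXX"
end

gen X = ""
gen longest = 0
gen where = 0

* 12 months in the toy data; use 1/180 for the full 15 years
quietly forvalues i = 1/12 {
    * X grows by one "X" on each pass, so its length is `i'
    replace X = X + "X"
    * a run of `i' Xs exists: record its length and first position
    replace longest = `i' if strpos(estring, X)
    replace where = strpos(estring, X) if strpos(estring, X)
}

list patid estring longest where
```

For patid 4 this records only the first of the three equally long runs, which is the remaining part of the question.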


On Fri, Mar 11, 2011 at 6:51 PM, Nick Cox wrote:
> There is at least one analogue, but I don't think you need it for this.
>
> gen X50 = `"`: di _dup(50) "X" '"'
>
> See -help extended fcn- and look for "display directive".
>
> My -egen- function (not option) -repeat()- isn't aimed at this problem.
>
> More interestingly, you could try something like this:
>
> gen X = ""
> gen longest = 0
> gen where = 0
>
> qui forval i = 1/180 {
>        replace X = X + "X"
>        replace longest = length(X) if strpos(estring, X)
>        replace where = strpos(estring, X) if strpos(estring, X)
> }
>
> -if strpos()- is just a contraction of -if strpos() > 0-.
>
> So, the pattern you search for is just so many "X"s. If you find a string of "X"s longer than you found previously, you update.
>
> Warning: Code not tested.
>
> I don't think this is all of your problem, but I don't think you need much, if any, machinery beyond this.
>
> Nick
>
> Rebecca Pope wrote:
>
> Does anyone know if there is a Stata analog to Mata's -strdup()-? I'm
> not committed to the approach below, so if anyone knows of a better
> way to accomplish what I need I'm open to all suggestions. I apologize
> in advance for the length of this e-mail, but I've tried to ensure
> sufficient detail.
>
> By way of background, I have data on patients' eligibility for health
> insurance benefits over a period of 15 years. The data is stored such
> that a "-" is in a position of the string for a month that the patient
> was not eligible and an "X" if they were. If a patient was eligible in
> Jan of 1995, they have an "X" in position one. Position 13 corresponds
> to Jan 1996, etc. Therefore, the data looks something like the
> following for a period of 1 year. Note, all 15 years are stored in the
> same variable (estring), but I've truncated it for illustration
> purposes.
>
> patid     estring
> 1          XXXXX-------
> 2          --XXX---XXXX
> 3          -XXXXXX-----
> 4          -XXX-XXX-XXX
>
> I need to find first the longest period of continuous eligibility
> (i.e. the longest set of Xs) and when that period occurred.
>
> I've found the longest period of continuous eligibility by the following:
> /* begin code */
> tempvar wc elig
>
> generate `elig' = trim(itrim(subinstr(estring,"-"," ",.)))
> generate int `wc' = wordcount(`elig')
> quietly summarize `wc'
> local wmax = r(max)
> di `wmax'
>
> generate eligstr = word(`elig',1)
> compress
>
> forvalues i = 2/`wmax' {
>       replace eligstr = word(`elig',`i') ///
>               if length(word(`elig',`i')) > length(eligstr)
> }
>
> /* end code */
>
> I then go back and find when that occurs by the following:
> - generate int estart1 = strpos(estring,eligstr) -
>
> In general, this is sufficient; however, for patients like patid==4
> above, I wouldn't know about other instances of the same eligibility
> length. I would like to generate additional variables estart2 through
> estart`wmax' that contain the starting positions of all other sets of
> Xs that match eligstr.
>
> I thought about replacing the first set of Xs with some non-X character using
> - subinstr() - but the problem is that I need to preserve the position
> and the number of Xs can vary, so I couldn't code something like
> - subinstr(estring,eligstr,"---",1) -.
> In my mind, the solution to this would be something like the following:
> - subinstr(estring,eligstr,repeat("-",length(eligstr)),1) -
> such that Stata would generate the appropriate number of "-" characters
> to replace the Xs, thereby maintaining the position of the next set of Xs.
> However, -repeat- as used above is not a Stata function as far as I
> can tell. There is a -repeat- option in Nick Cox's -egenmore- package,
> but as near as I can tell it won't work for my purposes. The closest
> thing I've found is a Mata function -strdup()- or more precisely the
> ability to code "-"*n where n would have to be defined previously as
> the length of eligstr.
>
> I'm willing to work out how to write the Mata code, but I thought that
> first I'd check with the List to see if there was a relatively simple
> solution like some sort of repeat function.
>
> I am using Stata 11/MP.
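
One possible way to get estart2 onward without any repeat function, offered as a sketch rather than a tested solution: because eligstr consists only of "X"s, -subinstr(eligstr, "X", "-", .)- already yields a run of "-" of the same length. The variable names elig, wc, blank and work are made up for this illustration (the quoted code uses tempvars), and the toy data are the four example patients from the question.

```stata
* Sketch only: toy data copied from the example in the question
clear
input byte patid str12 estring
1 "XXXXX-------"
2 "--XXX---XXXX"
3 "-XXXXXX-----"
4 "-XXX-XXX-XXX"
end

* longest run of Xs, found as in the quoted code
gen elig = trim(itrim(subinstr(estring, "-", " ", .)))
gen int wc = wordcount(elig)
quietly summarize wc
local wmax = r(max)
gen eligstr = word(elig, 1)
forvalues i = 2/`wmax' {
    quietly replace eligstr = word(elig, `i') ///
        if length(word(elig, `i')) > length(eligstr)
}

* same-length run of "-" built from eligstr itself
gen blank = subinstr(eligstr, "X", "-", .)

* work on a copy so that estring is left untouched
gen work = estring
forvalues j = 1/`wmax' {
    * start of the next run matching eligstr; 0 means none left
    gen int estart`j' = strpos(work, eligstr)
    * blank out that run so the next pass finds the following one
    quietly replace work = subinstr(work, eligstr, blank, 1) if estart`j' > 0
}

list patid estring eligstr estart*
```

Because each matched run is overwritten with the same number of characters, the positions of later runs are preserved. For patid 4 the sketch gives start positions 2, 6 and 10; a value of 0 marks that no further run of that length exists.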
