Statalist The Stata Listserver



RE: st: RE: puzzling "missing" before and after "gen newvar = mdy(a, b, c)"


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: puzzling "missing" before and after "gen newvar = mdy(a, b, c)"
Date   Thu, 11 Jan 2007 23:03:29 -0000

No. You don't -tostring- first. That's why my code was as it 
was. It is intended as a complete alternative to your code. 
You just need my two lines and none of your original. 

More slowly: 

1. -datedx- is numeric. 

2. We want to drop observations in which the fifth and sixth 
or seventh and eighth digits of -datedx- are 99. 

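3. One way to do that is to look at 
-substr(string(datedx), 5, 2)- and  
-substr(string(datedx), 7, 2)- and check for "99". 
That is, you do the conversion to string on the fly with 
the function -string()- and do not change the variable at all. 
Note that the arguments of -substr()- after the string are the 
starting position and the length, not the start and end positions. 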
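
As a minimal sketch of that step, here is the on-the-fly conversion 
applied to four made-up values (the toy data are an assumption for 
illustration only). -string()- with no format argument can switch to 
scientific notation for numbers with eight or more digits, so the 
sketch passes an explicit "%8.0f" format; that format is an addition 
here, not part of the two lines posted earlier. The /// line 
continuation assumes the code is run from a do-file. 

```stata
* Toy data: yyyymmdd dates held as numbers, with 99 marking an unknown month or day
clear
input double datedx
19900315
19919915
19921299
19930101
end

* string() converts on the fly; datedx itself stays numeric.
* substr(s, start, length): take 2 characters from positions 5 and 7.
drop if substr(string(datedx, "%8.0f"), 5, 2) == "99" | ///
        substr(string(datedx, "%8.0f"), 7, 2) == "99"

list
```

Only the two observations with a valid month and day should remain, 
and -describe- should still show -datedx- as double. 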

-tostring- and -destring- are useful commands in their way, 
but you do not need to touch them here. You want to map 
a numeric variable to a numeric variable and you do not 
need to go 

numeric -> string -> numeric 

which is like going from San Francisco to Los Angeles
to San Francisco. -todate- does something similar inside --
I've forgotten what, but I don't need to remember either -- 
just so you don't need to. 
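
For what it is worth, a numeric-to-numeric mapping can also be 
sketched with integer arithmetic and -mdy()- alone. This is an 
illustration on made-up values and is not a description of what 
-todate- does internally; the names dxyr, dxmo and dxda are 
hypothetical. 

```stata
* Toy data: yyyymmdd dates held as numbers, with 99 marking an unknown month or day
clear
input double datedx
19900315
19919915
19921299
19930101
end

* Peel off year, month and day with integer arithmetic; no strings involved
gen int  dxyr = floor(datedx / 10000)
gen byte dxmo = mod(floor(datedx / 100), 100)
gen byte dxda = mod(datedx, 100)

* mdy() returns missing whenever month or day is out of range (such as 99)
gen long datedx1 = mdy(dxmo, dxda, dxyr)
format datedx1 %d

list datedx datedx1
```

The observations coded 99 end up with a missing -datedx1-, so they 
can be inspected or dropped afterwards with -missing(datedx1)-. 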

Nick 
[email protected] 

Michael McCulloch
 
> Very elegant! However, it misses the "99"'s. Example:
> 
> . tostring datedx, replace
>          datedx was double now str8
> . drop if substr(string(datedx),5,2) == "99" | substr(string(datedx),7,2) == "99"
>          type mismatch
> 
> 
> 
> 
> At 02:25 PM 1/11/2007, you wrote:
> >Alternative:
> >
> >drop if substr(string(datedx),5,2) == "99" | substr(string(datedx),7,2) == "99"
> >todate datedx, pattern(yyyymmdd) format(%d) gen(datedx1)
> >
> >Nick
> >[email protected]
> >
> >Michael McCulloch
> >
> > > Thanks Nick; this had just occurred to me.
> > > I was able to ferret out where the out-of-range days & months were:
> > >
> > > BEFORE
> > > . tostring datedx, replace
> > >          datedx was double now str8
> > > . generate str4 dxyr1= substr(datedx,1,4)
> > > . generate str2 dxmo1 = substr(datedx,5,6)
> > > . generate str2 dxda1 = substr(datedx,7,8)
> > > . destring dx*, replace
> > >          dxyr1 has all characters numeric; replaced as int
> > >          dxmo1 has all characters numeric; replaced as byte
> > >          dxda1 has all characters numeric; replaced as byte
> > > . gen datedx1 = mdy(dxmo1, dxda1, dxyr1)
> > > (667 missing values generated)
> > >
> > > AFTER
> > > . tostring datedx, replace
> > >          datedx was double now str8
> > > . generate str4 dxyr1= substr(datedx,1,4)
> > > . generate str2 dxmo1 = substr(datedx,5,6)
> > > . generate str2 dxda1 = substr(datedx,7,8)
> > > . destring dx*, replace
> > > . drop if dxmo==99
> > >          (233 observations deleted)
> > > . drop if dxda==99
> > >          (434 observations deleted)
> > > . gen datedx1 = mdy(dxmo1, dxda1, dxyr1)
> > > *** no more missing; thanks for your help!***
> >
> > > At 02:05 PM 1/11/2007, you wrote:
> > > >I am glad you are making progress, but if you have any "99"
> > > >for month and day, i.e. missing values encoded as such,
> > > >then -todate- has no way to work out the correct values.
> > > >So, your dates should still be missing in the same place.
> > > > >It's just that -todate- doesn't squawk at you about them. At this point,
> > > >I am not sure that is a feature.
> > > >
> > > >Nick
> > > >[email protected]
> > > >
> > > >Michael McCulloch
> > > >
> > > > > Thanks Nick; using -todate-, now the missing values are no
> > > > > longer generated.
> > > > >
> > > > > . todate datedx , gen(datedx2) pattern(yyyymmdd)
> > > > > . format datedx2 %d
> > > > >
> > > > > > Out of curiosity, I looked back at my previous method, and
> > > > > > found that there were day and month == 99.
> > > > >
> > > > > . summarize dx??1
> > > > >      Variable |       Obs        Mean    Std. Dev.       Min
> > > > >       Max
> > > > >
> > > 
> -------------+--------------------------------------------------------
> > > > >         dxyr1 |     26806    1990.195    1.958295       1987
> > > > >      1993
> > > > >         dxmo1 |     26806    7.269828    9.244575          1
> > > > >        99
> > > > >         dxda1 |     26806    17.66037    15.60529          1
> > > > >        99
> > > > >
> > > > > >summarize dx??1
> > > > > >
> > > > > >to see which of the days, months, years variables are
> > > > > >out of range.
> > > > > >
> > > > > >Anyway, why are you doing it this way? Only a few days ago
> > > > > > >a thread you started thrashed out the fact that -todate- from SSC
> > > > > > >should be able to do it in one line.
> > > > > >
> > > > > >todate datedx, pattern(yyyymmdd) format(%d) gen(datedx1)
> > > > > >Michael McCulloch
> > > > > >
> > > > > > > > I've converted a string date (yyyymmdd) to Stata format, using:
> > > > > > >
> > > > > > > . * convert datedx to Stata format
> > > > > > > . tostring datedx, replace
> > > > > > >          datedx was double now str8
> > > > > > > . generate str4 dxyr1= substr(datedx,1,4)
> > > > > > > . generate str2 dxmo1 = substr(datedx,5,6)
> > > > > > > . generate str2 dxda1 = substr(datedx,7,8)
> > > > > > > . destring dx*, replace
> > > > > > >          dxyr1 has all characters numeric; replaced as int
> > > > > > >          dxmo1 has all characters numeric; 
> replaced as byte
> > > > > > >          dxda1 has all characters numeric; 
> replaced as byte
> > > > > > > . gen datedx1 = mdy(dxmo1, dxda1, dxyr1)
> > > > > > >          (387 missing values generated)
> > > > > > > . format datedx1 %d
> > > > > > >
> > > > > > > > However, search for missing values before & after my commands
> > > > > > > > yields nothing:
> > > > > > > . list datedx if missing(datedx) in 1/10
> > > > > > >
> > > > > > > Where might the missing 387 values have originated?



