A fairly crude hack follows. It does the conversion two ways and checks
that the results agree. A slightly simpler solution is given further down.
/* Begin */
clear
input str10 date
197104
197504
196504
196904
8804
8404
6304
8304
end
// first way: reduce every value to yymm
g date1=trim(date)
g newvar=substr(date1, 3,.) if length(date1)>4
replace newvar=date1 if length(date1)<5
// or: pad the four-character values to yyyymm by prefixing "19"
// (the + operator joins strings, so no extra package is needed)
g date2 = "19"
g str10 newvar2 = date2+date1 if length(date1)<5
replace newvar2=date1 if length(date1)>4
drop date1 date2
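ssc install todate, replace
// convert both versions; cend(2000) puts two-digit years in the
// 100 years ending in 2000
todate newvar2, gen(edu1st_1) pattern(yyyymm)
todate newvar, gen(edu1st_2) pattern(yymm) cend(2000)
// the two routes must give identical dates
assert edu1st_1==edu1st_2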
drop newvar*
list edu1st*
/* End */
A simpler option is to run NJ Cox's -todate- twice, once for each
pattern, and then combine the two output variables. This still needs
-length()- to tell the two patterns apart.
/* Simple option */
clear
input str10 date
197104
197504
196504
196904
8804
8404
6304
8304
end
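// needs -todate- from SSC (ssc install todate), as above
todate date if length(date)>4, gen(edu1st_1) pattern(yyyymm)
todate date if length(date)<5, gen(edu1st_2) pattern(yymm) cend(2000)
// fill the gaps left by the first call with the yymm results
replace edu1st_1=edu1st_2 if missing(edu1st_1)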
drop edu1st_2
rename edu1st_1 edu1st
list edu1st
/* End */
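If you would rather not install anything, here is a sketch that uses
only built-in functions and works on the numeric variable directly, so
no -tostring- is needed. It assumes that every two-digit year belongs
to the 1900s (the same assumption as the "19" prefix and cend(2000)
above) and that the variable is still numeric, as in your original
post. The names year and month are just illustrative temporaries.
/* Built-in sketch */
clear
input long edu_start_date_1
197104
197504
196504
196904
8804
8404
6304
8304
end
// split yyyymm (or yymm) into its year and month parts
g year = floor(edu_start_date_1/100)
g month = mod(edu_start_date_1, 100)
// assumption: two-digit years are 19xx
replace year = year + 1900 if year < 100
// ym() builds a Stata monthly date; missing input stays missing
g edu1st = ym(year, month)
format edu1st %tm
drop year month
list
/* End */
Check the 19xx assumption against your data before relying on this.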
T
On Sun, May 3, 2009 at 2:47 PM, Ekaterina Hertog
<[email protected]> wrote:
> Dear all,
> I have got a variable containing the month and year an individual started his or her education. Only Stata thinks the values in this variable are numbers and I want to turn them into dates.
> If all the numbers followed the same pattern that will not be a problem.
>
> for example I could do it like this:
> tostring edu_start_date_1, gen(stredu1st)
> gen edu1st = date(stredu1st, "YM")
>
> My problem is that while most dates in my dataset come in the yyyymm pattern:
> e.g.
>      +----------+
>      | stredu~t |
>      |----------|
>   1. |        . |
>   2. |   197104 |
>   3. |   197504 |
>   4. |   196504 |
>   5. |   196904 |
>      |----------|
>
> several contain only yymm
> e.g.
>
>         +----------+
>         | edu_st~1 |
>         |----------|
>  12338. |     8804 |
>  13265. |     8404 |
>  13666. |     6304 |
>  13831. |     8304 |
>         +----------+
>
> So when I run
>
> gen edu1st = date(stredu1st, "YM")
>
> all the yymm values in stredu1st are turned into missing values in edu1st.
>
> I could of course edit the values containing only yymm into yyyymm pattern manually, but this feels imprecise and prone to error and I would like to automate the process if at all possible.
> Is there a way to make the date command recognise alternating patterns?
> I would be very grateful for any advice,
> Sincerely yours,
> Ekaterina
>
> --
> Ekaterina Hertog (nee Korobtseva)
> Nissan Institute of Japanese Studies
> 27 Winchester Road, Oxford
> OX2 6NA
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).