Re: st: problem using -clock- with military time
Steve Nakoneshny <email@example.com>
Sat, 2 Jun 2012 13:28:59 -0600
Thanks for the insight, Nick. After my initial post, I started to explore -subinstr()-, but then ran into a (temporary) roadblock: how to make it replace only the 3-digit times rather than all values.
It's always nice to see multiple potential solutions, especially as well explained as this one.
Sent via carrier pigeon
On Jun 2, 2012, at 2:31 AM, "Nick Cox" <firstname.lastname@example.org> wrote:
> There are many other ways of tackling this problem. Here are a few
> more comments. Others should be able to suggest yet more.
> The question was posed as one of inserting a "0" after the space
> whenever the second part of the date is too short, i.e. three digits
> not four.
> That means we should focus on identifying the space and inserting the
> "0", which in Stata just means changing " " to " 0", as there isn't an
> "insert in string" function. (There isn't a "delete from string"
> function, either: both can be just special cases of -subinstr()-.)
> The assumption is that there should be precisely one space.
> replace Arrived = trim(itrim(Arrived))
> does our best to make that so. -trim()- removes any leading or
> trailing spaces, while -itrim()- reduces all multiple internal spaces
> to single spaces. That -itrim()- didn't appear in the previous
> posting. I feel comfortable with making any such changes as they can't
> affect the meaning of a date string. Those concerned with absolute
> data integrity should work with a copy of the original variable.
> We should check that there is precisely one space. After what we have
> just done, and in any case, that would mean that there are precisely
> two words. In Stata, words are whatever are separated by spaces
> (except that " " and `" "' bind tighter than spaces separate), so
> "frog toad" are two words, and so are "123 456" and "2011/04/06 1630".
> Stata has a -wordcount()- function, so
> assert wordcount(Arrived) == 2
> asserts that that is so, and you will get an error message if it
> isn't. (The principle is, very much, "No news is good news", but if
> there is bad news, there are fixes needed.) Many Stata beginners would
> do something like this here
> gen nwords = wordcount(Arrived)
> tab nwords
> but for problems like this you don't need a new variable and you can
> insist Stata does the checking. (Conversely, there are more open-ended
> problems in which looking at the patterns shown by the table is
> exactly the right thing to do.) As there can be only two words
> replace Arrived = subinstr(Arrived, " ", " 0", 1) if length(word(Arrived, 2)) == 3
> is an alternative to what was posted previously.
> Another way to think about it is that it appears that there are two
> kinds of date, long and short, so we could work with
> -length(Arrived)-, which should be 15 or 14. For problems like this, I
> tend to copy and paste examples and feed them to -display-, as in
> . di length("2011/04/06 1630")
> because Stata is better at counting than I am. So -if length(Arrived)
> == 14- identifies short dates that need fixing.
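> A minimal sketch of that length-based variant, assuming Arrived has
> already been cleaned with -trim(itrim())- as above, so that every
> value holds exactly one space and is either 14 or 15 characters long:
>
> ```stata
> * Assumption: values are already trimmed, so short dates are exactly 14 characters
> replace Arrived = subinstr(Arrived, " ", " 0", 1) if length(Arrived) == 14
> ```
>
> Checking the length is terser than counting the digits in the second
> word, but it depends on the date part always having the same width.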
> On Sat, Jun 2, 2012 at 12:00 AM, Nick Cox <email@example.com> wrote:
>> input str15 ArrivedOnPCU
>> "2011/04/06 1630"
>> "2010/07/18 700"
>> "2011/09/06 400"
>> "2011/06/23 130"
>> end
>> replace Arrived = trim(Arrived)
>> replace Arrived = subinstr(Arrived, " ", " 0", 1) if length(word(Arrived, -1)) == 3
>> This example boosts my prejudice that few parts of Stata are so
>> unfairly overlooked as the basic string functions. See also
>> Cox, N.J. 2011. Speaking Stata: Fun and fluency with functions. The
>> Stata Journal 11(3): 460-471
>> for a review.
>> Abstract. Functions are the unsung heroes of Stata. This column is a
>> tour of functions that might easily be missed or underestimated, with
>> a potpourri of tips, tricks, and examples for a wide range of basic [...]
>> On Fri, Jun 1, 2012 at 11:39 PM, Steve Nakoneshny <firstname.lastname@example.org> wrote:
>>> I have been provided with a dataset containing date and time variables in string format. I wish to convert these to SIF type using the -clock()- function; however, I have run into a small problem because the times are formatted as military time (sadly without the leading zero). The code -gen double pcutime = clock(ArrivedOnPCU, "YMDhm")- executes imperfectly.
>>> After formatting pcutime to %tc, I can see that some of the times translate imperfectly:
>>> ArrivedOnPCU       pcutime
>>> 2011/04/06 1630    06apr2011 16:30:00
>>> 2010/07/18 700     .
>>> 2011/09/06 400     .
>>> 2011/06/23 130     23jun2011 13:00:00
>>> If I manually edit the second obs to read "2010/07/18 0700" and run -replace pcutime = clock(ArrivedOnPCU, "YMDhm")-, pcutime displays 18jul2010 07:00:00. It is pretty obvious to me that I'm choosing the wrong mask in the clock function, given both the missing values in pcutime and the incorrect times (i.e. 0130 translating to 13:00).
>>> I've tried various permutations of hm/HM/HHMM/hhmm to try to adjust, but to no avail. Can anybody suggest a better mask for me to use? Or perhaps some relatively simple means of inserting a leading "0" into the time portion of the string prior to using -clock-?
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
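
Putting the pieces of the thread together, the whole fix might look something like the sketch below. It uses the four example values from the original post and the "YMDhm" mask that the poster reported as working once the leading zero was present. The working copy named arrived is a hypothetical name introduced here so that the original variable is left untouched.

```stata
clear
input str15 ArrivedOnPCU
"2011/04/06 1630"
"2010/07/18 700"
"2011/09/06 400"
"2011/06/23 130"
end

* Work on a copy (hypothetical name) so the original strings stay intact
generate str15 arrived = trim(itrim(ArrivedOnPCU))

* Each value should be exactly two words: a date and a time
assert wordcount(arrived) == 2

* Insert a leading zero only where the time part has three digits
replace arrived = subinstr(arrived, " ", " 0", 1) if length(word(arrived, 2)) == 3

* Convert to a Stata datetime; it must be stored as a double
generate double pcutime = clock(arrived, "YMDhm")
format pcutime %tc
list ArrivedOnPCU pcutime
```

If times with fewer than three digits can occur (say "45" for 00:45), the -if- condition would need extending; that case does not arise in the example data.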