Re: st: Fwd: Converting weekly data of form yyyyww


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Fwd: Converting weekly data of form yyyyww
Date   Wed, 1 Aug 2012 17:14:02 -0500

I am now delivering on my intention to write more about this.

I suggest that the key to analysing this kind of weekly data is to
treat it as daily data with a period of 7 days. Stata's weekly dates
are no use here as it's clear that the definition of week in the data
allows 53 weeks in some years. Stata's idea of weeks does not
encompass that, as in Stata week 52 in each year is stretched to 8 or
9 days, depending on whether a year is a leap year.
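
To see the stretch concretely, here is a minimal sketch using Stata's own
date functions; the years 2009 and 2012 are chosen only for illustration.
-dofw()- returns the first day of a Stata week, so counting from there to
December 31 gives the length of week 52.

```stata
* length in days of Stata's week 52 in a non-leap year and a leap year
display mdy(12, 31, 2009) - dofw(yw(2009, 52)) + 1
display mdy(12, 31, 2012) - dofw(yw(2012, 52)) + 1
```

The first line should show 8 and the second 9.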

Ruth doesn't spell out the rules used in her data source, but 53 weeks
in 2009 are consistent with there being 53 Thursdays in 2009. Whether
Thursday is the beginning or the end of the week is ambiguous, but
also immaterial to the code to follow.
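
The count of Thursdays can be checked directly. This is only a sketch: it
loops over every daily date in 2009 and counts those for which -dow()-
returns 4; the local macro n is a name made up for the illustration.

```stata
* count the Thursdays (dow() == 4) among the daily dates of 2009
local n = 0
forvalues d = `=mdy(1, 1, 2009)'/`=mdy(12, 31, 2009)' {
    if dow(`d') == 4 local ++n
}
display `n'
```

This should display 53, because January 1, 2009 was itself a Thursday.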

The calculation pivots on finding the day of the week of January 1 in
each year, from which it is a small step to finding the first Thursday
in each year and another small step to identifying all Thursdays.  I
include a trap for incorrect data. The nub of the code is the -dow()-
function, which returns 4 for Thursdays.

I'd be delighted to hear of neater ways of approaching this, so long
as they don't involve -merge- (personal taste supervening there).
More seriously, any other solution that is as short or shorter, or as
clear or clearer, is naturally of interest.

. inp str06 dates

         dates
  1. 201002
  2. 200953
  3. 201102
  4. 200935
  5. end

. gen week = real(substr(dates, -2, 2))

. gen year = real(substr(dates, 1, 4))

. gen jan1 = dow(mdy(1,1,year))

. local wanted = 4

. gen weeklydates = cond(jan1 <= `wanted', 1 - jan1 + `wanted', 8 - jan1 + `wanted')

. replace weeklydates = weeklydates + (week - 1) * 7
(4 real changes made)

. replace weeklydates = . if weeklydates > doy(mdy(12, 31, year))
(0 real changes made)

. gen date = mdy(1,1, year) + weeklydates - 1

. format date %td

. tsset date, delta(7)
        time variable:  date, 27aug2009 to 13jan2011, but with gaps
                delta:  7 days

.
. l

     +----------------------------------------------------+
     |  dates   week   year   jan1   weekly~s        date |
     |----------------------------------------------------|
  1. | 200935     35   2009      4        239   27aug2009 |
  2. | 200953     53   2009      4        365   31dec2009 |
  3. | 201002      2   2010      5         14   14jan2010 |
  4. | 201102      2   2011      6         13   13jan2011 |
     +----------------------------------------------------+
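
As a cross-check on the listing, the same arithmetic can be replayed for
each example value without touching the dataset. This is a sketch, not
part of the session above: the local macro names are invented for the
illustration, and 4 is hard-coded as the wanted day of the week.

```stata
* recompute each date with local macros and show its day of week
foreach d in 201002 200953 201102 200935 {
    local year = real(substr("`d'", 1, 4))
    local week = real(substr("`d'", 5, 2))
    local jan1 = dow(mdy(1, 1, `year'))
    * day of year of the first Thursday, as in the cond() call above
    local first = cond(`jan1' <= 4, 1 - `jan1' + 4, 8 - `jan1' + 4)
    local date = mdy(1, 1, `year') + `first' + (`week' - 1) * 7 - 1
    display "`d'  " %td `date' "  dow = " dow(`date')
}
```

Each line should reproduce the corresponding date in the listing and
report a day of week of 4.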


On Wed, Jul 25, 2012 at 10:15 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> -encode- is indifferent to the numeric or other meaning of the strings it
> maps to numeric. That's a reason why -destring- is needed.
>
> On your main problem, it seems that most weekly data come with one of two
> definitions, that weeks start or finish on particular days of the week. This
> is the key to successful data processing of weekly data.  I wrote a Stata
> Journal Tip on week assumptions in 2010. Google for the reference or for
> Statalist posts that cite it. I intend to write more on this, but probably
> not until after the Stata Conference in San Diego starting tomorrow.

On 25 Jul 2012, at 15:31, Ruth Gilgenbach <rgrune@gmail.com> wrote:

>> Thanks for your quick reply.  This was helpful--I didn't realize that's
>> what -encode- does.
>>
>> I'm sure there's a more elegant solution, but something like:
>>
>> ******************
>> gen year = substr(dates,1,4)
>> destring year, replace
>> gen week = substr(dates,5,2)
>> destring week, replace
>> gen delivery =yw(year,week)
>> format %tw delivery
>> *******************
>> works, except for the week 53 problem.

On Wed, Jul 25, 2012 at 8:14 AM, Nick Cox <njcoxstata@gmail.com> wrote:

>>> Don't -encode-. Use -destring-, or use -real()- directly, which avoids
>>> creation of new variables. For example, -encode- would map "2009",
>>> "2010", "2011" to 1, 2, 3, not at all what you want.
>>>
>>> Mapping anybody else's "weeks" onto Stata's weeks is still going to be a
>>> problem, however.

On 25 Jul 2012, at 13:55, Ruth Gilgenbach <rgrune@gmail.com> wrote:

>>>> I have a data set with a variable "dates" whose values are of the form
>>>> yyyyww. I am trying to convert them to a usable format and am having no
>>>> luck. I am running Stata 12.
>>>>
>>>> For example, if I use the following
>>>> ************************
>>>> inp str06 dates
>>>> 201002
>>>> 200953
>>>> 201102
>>>> 200935
>>>> end
>>>>
>>>> gen delivery = weekly(dates,"YW")
>>>> *************************
>>>>
>>>> I get the result: (4 missing values generated) rather than the correct
>>>> result.
>>>>
>>>> I know that the 53-week year in 2009 is also going to pose a problem,
>>>> but this problem persists even in the absence of week 53.
>>>>
>>>> I have also attempted to split the string into components and build
>>>> the dates from these, but I receive the same problem, that missing
>>>> values are generated rather than the values themselves:
>>>>
>>>> ******************
>>>> inp str06 dates
>>>> 201002
>>>> 200953
>>>> 201102
>>>> 200935
>>>> end
>>>>
>>>> gen year =substr(dates,1,4)
>>>> encode year, generate(nyear)
>>>> gen week = substr(dates,5,2)
>>>> encode week, generate(nweek)
>>>>
>>>> gen delivery2 = yw(nyear,nweek)
>>>> *********************************