
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: expanding data set by variable


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: expanding data set by variable
Date   Wed, 9 May 2012 08:51:14 +0100

1. The help for -expand- looks clear enough to me, but that's not a
test of much. You could write to StataCorp explaining what you find
unclear.

2. I think a structure in which each observation is a person-day (or a
person-day-activity) is going to make your calculations easiest, which
is why I suggest it.

3. What

bysort ID : replace mydate = mydate + _n - 1

does can be worked out step by step. Within blocks of -ID- the
observation number _n runs 1, 2, 3, ..., so _n - 1 runs 0, 1, 2, ...;
adding that to the start date gives an increasing sequence of daily
dates. On -by:- and _n within -by:- see a tutorial at

SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/02   SJ 2(1):86--102                                  (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N

Or see something similar within section 7 of

FAQ     . . . . . . . . . . . . . . . . . . . . . . . Replacing missing values
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        2/03    How can I replace missing values with previous or
                following nonmissing values?
                http://www.stata.com/support/faqs/data/missing.html
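
A minimal sketch of points 2 and 3 on made-up data may help. The
variable names ID, type, start and end follow the thread; the variable
-user-, the toy values, and the final two steps (an equal split of each
person-day followed by -collapse-) are assumptions added for
illustration, not something spelled out above.

```stata
* Toy data: two overlapping requests for one user (values are invented)
clear
input ID user str1 type long start long end
1 1 "A" 20120501 20120503
2 1 "B" 20120502 20120504
end

* Convert yyyymmdd numbers to Stata daily dates
gen mydate  = date(string(start, "%12.0f"), "YMD")
gen mydate2 = date(string(end,   "%12.0f"), "YMD")
format mydate mydate2 %td

* One observation per day of each request
gen length = mydate2 - mydate + 1
expand length

* Within each ID, _n runs 1, 2, 3, ..., so _n - 1 adds 0, 1, 2, ... days
bysort ID : replace mydate = mydate + _n - 1

* Assumed rule: split each person-day equally among the requests
* active for that user on that day (_N counts them)
bysort user mydate : gen share = 1 / _N

* Days credited to each activity type for each user
collapse (sum) days = share, by(user type)
list
```

Here the two requests overlap on 2 and 3 May, so those days are split
half and half, and each type ends up credited with 2 days: 4 calendar
days in total, with nothing counted twice.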

On Wed, May 9, 2012 at 1:18 AM, KOTa <[email protected]> wrote:
> thanks for quick response, Nick
>
>> 1. What the date variables are (string, numeric, numeric with a date format)?
>
> sorry, i just didn't think to mention this, because it can easily be
> converted among the formats you listed (and i actually have dates
> in all 3 of them)
>
>> 2. Why you think the second data structure is going to be a good one?
>
> what i am trying to do is to count the time spent on each "type" of
> activity, which i already figured out how to do (with help from
> the statalist).
> but the problem is that if activities overlap in days for the same
> person and i don't account for this, i over-count them both.
> so what i tried to do is to split time equally among requests (ID)
> that happened at the same time (for the same user). i managed to do
> this for requests (ID) that start at the same time, but could not find
> a way to do this if they start at different times (and there can be
> overlap between more than 2 requests). the approach i thought to take
> is to recode the data so each observation would be split into
> non-overlapping periods.
>
>> If this were my data, I would get a different structure this way:
>> ...
>> gen mydate = date(string(start, "%12.0f"), "YMD")
>> gen mydate2 = date(string(end, "%12.0f"), "YMD")
>> format mydate %td
>> gen length = mydate2 - mydate + 1
>
> that is how i started
>
>> expand length
>
> that is what i wanted to do, but i could not find in the help or
> examples whether "expand" can be used this way
>
>> bysort ID : replace mydate = mydate + _n - 1
>
> 1. i forgot to mention that the count has to be by activity type. so,
> correct me if i am wrong, the bysort should then be "bysort ID type ..."
> 2. i didn't understand the logic of replace mydate = mydate + _n - 1
>

