# RE: st: data manipulation prob.

From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: data manipulation prob.
Date: Thu, 7 Jun 2012 18:41:45 +0100

```
I am going to guess that there is a panel structure too, hidden from this example. Consider

bysort id (date) : gen sumhits = sum(hits)
by id : egen when_halfway = min(date / (sumhits >= (sumhits[_N] / 2)))
by id : gen time_halfway = when_halfway - date[1]

For more on the trick in the second line, see

SJ-11-2 dm0055  . . . . . . . . . . . . . .  Speaking Stata: Compared with ...
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
Q2/11   SJ 11(2):305--314                                (no commands)
reviews techniques for relating values to values in other
observations

With no panel structure, this could be

sort date
gen sumhits = sum(hits)
su date if sumhits >= (sumhits[_N] / 2)
di r(min) - date[1]

The underlying principle is tautological: the first date on which something is true is just the minimum date satisfying that condition.

Nick
[email protected]

tashi lama

You guessed right. I could have made my example dataset a little more random. Yes, my real dataset could be really random. I have an idea, though; I just can't think of enough Stata to do it.

     +---------------------------+
     |               date   hits |
     |---------------------------|
  1. | 10mar2011 01:07:18      2 |
  2. | 10mar2011 01:09:48      3 |
  3. | 10mar2011 01:54:00      1 |
  4. | 10mar2011 02:03:37      8 |
  5. | 10mar2011 02:11:00      9 |
     |---------------------------|
  6. | 10mar2011 02:26:00      5 |
  7. | 10mar2011 02:46:00     12 |
  8. | 10mar2011 02:47:00     34 |
  9. | 10mar2011 02:51:09     14 |
 10. | 10mar2011 02:51:24     80 |
     +---------------------------+

gen runhits=sum(hits)

list

     +-------------------------------------+
     |               date   hits   runhits |
     |-------------------------------------|
  1. | 10mar2011 01:07:18      2         2 |
  2. | 10mar2011 01:09:48      3         5 |
  3. | 10mar2011 01:54:00      1         6 |
  4. | 10mar2011 02:03:37      8        14 |
  5. | 10mar2011 02:11:00      9        23 |
     |-------------------------------------|
  6. | 10mar2011 02:26:00      5        28 |
  7. | 10mar2011 02:46:00     12        40 |
  8. | 10mar2011 02:47:00     34        74 |
  9. | 10mar2011 02:51:09     14        88 |
 10. | 10mar2011 02:51:24     80       168 |
     +-------------------------------------+

gen x=(runhits>ceil(runhits[_N]/2))

list

     +-----------------------------------------+
     |               date   hits   runhits   x |
     |-----------------------------------------|
  1. | 10mar2011 01:07:18      2         2   0 |
  2. | 10mar2011 01:09:48      3         5   0 |
  3. | 10mar2011 01:54:00      1         6   0 |
  4. | 10mar2011 02:03:37      8        14   0 |
  5. | 10mar2011 02:11:00      9        23   0 |
     |-----------------------------------------|
  6. | 10mar2011 02:26:00      5        28   0 |
  7. | 10mar2011 02:46:00     12        40   0 |
  8. | 10mar2011 02:47:00     34        74   0 |
  9. | 10mar2011 02:51:09     14        88   1 |
 10. | 10mar2011 02:51:24     80       168   1 |
     +-----------------------------------------+

Now, I could do something like

di date[n] - date[1]

where n is the observation number at which x = 1 for the first time. Alternatively, we could generate another variable, "indicator", which would contain only a single 1. In any case, I need a mechanism to get the observation number at which x first equals 1. Hope this helps...

```
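The mechanism asked for at the end of the quoted message, getting the observation number at which the running total first passes halfway, can be sketched on the ten listed observations. This is only a sketch under stated assumptions: the names `sdate`, `obsno` and `elapsed_min` are made up for illustration, the timestamps are retyped from the listing, and the `>=` comparison follows the code in the reply rather than the `> ceil()` comparison in the question.

```stata
clear
* Timestamps retyped from the listing; sdate is a hypothetical holding variable
input str18 sdate hits
"10mar2011 01:07:18"   2
"10mar2011 01:09:48"   3
"10mar2011 01:54:00"   1
"10mar2011 02:03:37"   8
"10mar2011 02:11:00"   9
"10mar2011 02:26:00"   5
"10mar2011 02:46:00"  12
"10mar2011 02:47:00"  34
"10mar2011 02:51:09"  14
"10mar2011 02:51:24"  80
end

* %tc datetimes count milliseconds and should be stored as doubles
gen double date = clock(sdate, "DMYhms")
format date %tc

sort date
gen runhits = sum(hits)

* Record the observation number (hypothetical variable obsno); the smallest
* one satisfying the condition is the first time the halfway mark is reached
gen long obsno = _n
summarize obsno if runhits >= runhits[_N] / 2, meanonly
local first = r(min)

* Difference of two %tc values is in milliseconds; convert to minutes
scalar elapsed_min = (date[`first'] - date[1]) / 60000

display "first observation at or past halfway: `first'"
display "elapsed minutes: " elapsed_min
```

On these data half of the 168 hits is 84, which is first reached at observation 9 (running total 88), so both comparisons agree here. Summarizing `obsno` rather than `date` is the same "minimum satisfying the condition" idea as in the reply, applied to the observation number instead of the date.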