

From: Nick Cox <n.j.cox@durham.ac.uk>

To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>

Subject: RE: st: data manipulation prob.

Date: Thu, 7 Jun 2012 18:41:45 +0100

I am going to guess that there is a panel structure too, hidden from this example. Consider

    bysort id (date) : gen sumhits = sum(hits)
    by id : egen when_halfway = min(date / (sumhits >= (sumhits[_N] / 2)))
    by id : gen time_halfway = when_halfway - date[1]

For more on the trick in the second line, see

    SJ-11-2 dm0055. Speaking Stata: Compared with ... N. J. Cox.
    Q2/11. SJ 11(2):305--314. (no commands)
    Reviews techniques for relating values to values in other observations.

With no panel structure, this could be

    sort date
    gen sumhits = sum(hits)
    su date if sumhits >= (sumhits[_N] / 2)
    di r(min) - date[1]

The underlying principle is tautological: the first date on which something is true is just the minimum date satisfying that condition.

Nick
n.j.cox@durham.ac.uk

tashi lama

You guessed that right. I could have made my example dataset a little more random. Yes, my dataset could be really random. I have an idea though; I just don't know enough Stata to do it.

         +---------------------------+
         |               date   hits |
         |---------------------------|
      1. | 10mar2011 01:07:18      2 |
      2. | 10mar2011 01:09:48      3 |
      3. | 10mar2011 01:54:00      1 |
      4. | 10mar2011 02:03:37      8 |
      5. | 10mar2011 02:11:00      9 |
         |---------------------------|
      6. | 10mar2011 02:26:00      5 |
      7. | 10mar2011 02:46:00     12 |
      8. | 10mar2011 02:47:00     34 |
      9. | 10mar2011 02:51:09     14 |
     10. | 10mar2011 02:51:24     80 |
         +---------------------------+

    gen runhits=sum(hits)
    list date hits runhits

         +-------------------------------------+
         |               date   hits   runhits |
         |-------------------------------------|
      1. | 10mar2011 01:07:18      2         2 |
      2. | 10mar2011 01:09:48      3         5 |
      3. | 10mar2011 01:54:00      1         6 |
      4. | 10mar2011 02:03:37      8        14 |
      5. | 10mar2011 02:11:00      9        23 |
         |-------------------------------------|
      6. | 10mar2011 02:26:00      5        28 |
      7. | 10mar2011 02:46:00     12        40 |
      8. | 10mar2011 02:47:00     34        74 |
      9. | 10mar2011 02:51:09     14        88 |
     10. | 10mar2011 02:51:24     80       168 |
         +-------------------------------------+

    gen x=(runhits>ceil(runhits[_N]/2))
    list date hits runhits x

         +-----------------------------------------+
         |               date   hits   runhits   x |
         |-----------------------------------------|
      1. | 10mar2011 01:07:18      2         2   0 |
      2. | 10mar2011 01:09:48      3         5   0 |
      3. | 10mar2011 01:54:00      1         6   0 |
      4. | 10mar2011 02:03:37      8        14   0 |
      5. | 10mar2011 02:11:00      9        23   0 |
         |-----------------------------------------|
      6. | 10mar2011 02:26:00      5        28   0 |
      7. | 10mar2011 02:46:00     12        40   0 |
      8. | 10mar2011 02:47:00     34        74   0 |
      9. | 10mar2011 02:51:09     14        88   1 |
     10. | 10mar2011 02:51:24     80       168   1 |
         +-----------------------------------------+

Now, I could do something like

    di date[n]-date[1]

where n is the observation number at which x=1 for the first time, although we could generate another variable "indicator" which would have only a single "1". In any case, I need a mechanism to get the observation number when x=1. Hope this helps...

Nick Cox

> On the last question first: the usual Stata way is to add observations
> at the end and then -sort-, although you could also -append- to a
> one-observation dataset.
>
> If -hits- is always 1, then
>
>     sort date
>     gen obs = _n
>     su obs, meanonly
>     di date[ceil(r(mean))] - date[1]
>
> I guess you will now tell us that the real data are more complicated.

On Wed, Jun 6, 2012 at 10:24 PM, tashi lama <ltashi32@hotmail.com> wrote:

> >      +---------------------------+
> >      |               date   hits |
> >      |---------------------------|
> >   1. | 10mar2011 01:07:18      1 |
> >   2. | 10mar2011 01:09:48      1 |
> >   3. | 10mar2011 01:54:00      1 |
> >   4. | 10mar2011 02:03:37      1 |
> >   5. | 10mar2011 02:11:00      1 |
> >      |---------------------------|
> >   6. | 10mar2011 02:26:00      1 |
> >   7. | 10mar2011 02:46:00      1 |
> >   8. | 10mar2011 02:47:00      1 |
> >   9. | 10mar2011 02:51:09      1 |
> >  10. | 10mar2011 02:51:24      1 |
> >      +---------------------------+
> >
> > I need to find the time taken to get half of the total hits
> >
> >     summ hits
> >
> >     gen runsum=sum(hits)
> >
> >      +---------------------------------+
> >      |               date   hits     x |
> >      |---------------------------------|
> >   1. | 10mar2011 01:07:18      1     1 |
> >   2. | 10mar2011 01:09:48      1     2 |
> >   3. | 10mar2011 01:54:00      1     3 |
> >   4. | 10mar2011 02:03:37      1     4 |
> >   5. | 10mar2011 02:11:00      1     5 |
> >      |---------------------------------|
> >   6. | 10mar2011 02:26:00      1     6 |
> >   7. | 10mar2011 02:46:00      1     7 |
> >   8. | 10mar2011 02:47:00      1     8 |
> >   9. | 10mar2011 02:51:09      1     9 |
> >  10. | 10mar2011 02:51:24      1    10 |
> >      +---------------------------------+
> >
> > Now, the problem I am having is that I will be comparing r(sum) with var "x" but I need to compute in var "date". So, if r(sum)/2 is 5 then I know to subtract date[5]-date[1]. Any idea?
> >
> > Also, is it possible to add one date observation on top in the date column programmatically? I need to add 07mar2011 03:00:00 in the date column, and because this date comes before the other obs in the dataset, I need to make it my first obs.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
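
As a check on the no-panel recipe in Nick's reply, here is a minimal sketch that applies it to the ten observations with varying hits posted in the thread. The names `datestr`, `obs`, `when_halfway`, `elapsed_ms` and `halfway_obs` are illustrative and not from the thread. The sketch also assumes the dates are read from strings and held as double-precision `%tc` datetimes (milliseconds), which the posts do not show.

```stata
clear
input str20 datestr hits
"10mar2011 01:07:18"  2
"10mar2011 01:09:48"  3
"10mar2011 01:54:00"  1
"10mar2011 02:03:37"  8
"10mar2011 02:11:00"  9
"10mar2011 02:26:00"  5
"10mar2011 02:46:00" 12
"10mar2011 02:47:00" 34
"10mar2011 02:51:09" 14
"10mar2011 02:51:24" 80
end

* assumption: datetimes are stored as doubles in milliseconds (%tc)
gen double date = clock(datestr, "DMYhms")
format date %tc

* running total of hits in date order
sort date
gen sumhits = sum(hits)

* the first date on which the running total reaches half the grand total
* is the minimum date satisfying that condition
su date if sumhits >= (sumhits[_N] / 2), meanonly
scalar when_halfway = r(min)
scalar elapsed_ms = when_halfway - date[1]

* the same idea gives the observation number asked about in the thread
gen long obs = _n
su obs if sumhits >= (sumhits[_N] / 2), meanonly
scalar halfway_obs = r(min)

di "halfway reached at " %tc scalar(when_halfway)
di "observation number: " scalar(halfway_obs)
di "elapsed minutes:    " scalar(elapsed_ms) / 60000
```

With these data the grand total is 168, so the condition first holds at observation 9, where the running total is 88, and the elapsed time is 6,231,000 ms, or 103.85 minutes. Results are copied into scalars because each call to `summarize` overwrites `r(min)`.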

**Follow-Ups**:
- **RE: st: data manipulation prob.** *From:* tashi lama <ltashi32@hotmail.com>

**References**:
- **st: data manipulation prob.** *From:* tashi lama <ltashi32@hotmail.com>
- **Re: st: data manipulation prob.** *From:* Nick Cox <njcoxstata@gmail.com>
- **RE: st: data manipulation prob.** *From:* tashi lama <ltashi32@hotmail.com>
