
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: data manipulation prob.


From   tashi lama <[email protected]>
To   <[email protected]>
Subject   RE: st: data manipulation prob.
Date   Thu, 7 Jun 2012 18:15:21 +0000

Awesome... thanks a ton.
----------------------------------------
> From: [email protected]
> To: [email protected]
> Date: Thu, 7 Jun 2012 18:41:45 +0100
> Subject: RE: st: data manipulation prob.
>
> I am going to guess that there is a panel structure too, hidden from this example. Consider
>
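> bysort id (date) : gen sumhits = sum(hits)
> * store each panel's total first: egen should not be given explicit subscripts such as [_N]
> by id : gen total = sumhits[_N]
> * date is a datetime, so hold the results as doubles to avoid losing precision
> by id : egen double when_halfway = min(date / (sumhits >= (total / 2)))
> by id : gen double time_halfway = when_halfway - date[1]
>
> For more on the trick in the -egen- line (dividing by a true-or-false condition), see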
>
> SJ-11-2 dm0055 . . . . . . . . . . . . . . Speaking Stata: Compared with ...
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> Q2/11 SJ 11(2):305--314 (no commands)
> reviews techniques for relating values to values in other
> observations
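
As an editorial illustration of the division trick (not part of the original reply), here is a minimal sketch on made-up data. The variable names id, t, total, candidate, and first_t are hypothetical. The panel total is stored in its own variable first, because the documentation for -egen- advises against explicit subscripts such as [_N] inside its functions.

```stata
* Hypothetical two-panel toy data; all names here are illustrative only
clear
input id t hits
1 1 2
1 2 3
1 3 5
2 1 4
2 2 1
2 3 1
end

bysort id (t) : gen sumhits = sum(hits)
by id : gen total = sumhits[_N]

* t / 1 is t when the condition is true; t / 0 is missing when it is false
gen candidate = t / (sumhits >= total / 2)

* egen's min() ignores missing values, so this picks the first qualifying t
by id : egen first_t = min(candidate)

list, sepby(id)
```

In this toy example panel 1 reaches half of its total at t = 2 and panel 2 at t = 1.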
>
> With no panel structure, this could be
>
> sort date
> gen sumhits = sum(hits)
> su date if sumhits >= (sumhits[_N] / 2)
> di r(min) - date[1]
>
> The underlying principle is tautological: the first date on which something is true is just the minimum date satisfying that condition.
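
As a worked check of the no-panel recipe (an editorial sketch, not part of the original reply), the code below uses the ten observations posted further down this thread. The string variable sdate is a hypothetical helper used only to read the timestamps; the datetime is stored as a double in milliseconds, so the difference is divided by 1000 to show seconds.

```stata
clear
input str18 sdate hits
"10mar2011 01:07:18"  2
"10mar2011 01:09:48"  3
"10mar2011 01:54:00"  1
"10mar2011 02:03:37"  8
"10mar2011 02:11:00"  9
"10mar2011 02:26:00"  5
"10mar2011 02:46:00" 12
"10mar2011 02:47:00" 34
"10mar2011 02:51:09" 14
"10mar2011 02:51:24" 80
end

* convert the strings to a %tc datetime (milliseconds since 01jan1960)
gen double date = clock(sdate, "DMYhms")
format date %tc

sort date
gen sumhits = sum(hits)

* the earliest date on which the running sum has reached half of the total
su date if sumhits >= sumhits[_N] / 2, meanonly
di (r(min) - date[1]) / 1000 " seconds"
```

The total is 168 hits, so the threshold of 84 is first reached at 02:51:09, which is 6231 seconds after the first observation.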
>
> Nick
> [email protected]
>
> tashi lama
>
> You guessed right. I should have chosen a less regular example; yes, my real data can be quite irregular. I have an idea, though I don't know enough Stata to implement it.
>
>
>
>      +---------------------------+
>      |               date   hits |
>      |---------------------------|
>   1. | 10mar2011 01:07:18      2 |
>   2. | 10mar2011 01:09:48      3 |
>   3. | 10mar2011 01:54:00      1 |
>   4. | 10mar2011 02:03:37      8 |
>   5. | 10mar2011 02:11:00      9 |
>      |---------------------------|
>   6. | 10mar2011 02:26:00      5 |
>   7. | 10mar2011 02:46:00     12 |
>   8. | 10mar2011 02:47:00     34 |
>   9. | 10mar2011 02:51:09     14 |
>  10. | 10mar2011 02:51:24     80 |
>      +---------------------------+
>
>
> gen runhits=sum(hits)
>
> list
>
>      +-------------------------------------+
>      |               date   hits   runhits |
>      |-------------------------------------|
>   1. | 10mar2011 01:07:18      2         2 |
>   2. | 10mar2011 01:09:48      3         5 |
>   3. | 10mar2011 01:54:00      1         6 |
>   4. | 10mar2011 02:03:37      8        14 |
>   5. | 10mar2011 02:11:00      9        23 |
>      |-------------------------------------|
>   6. | 10mar2011 02:26:00      5        28 |
>   7. | 10mar2011 02:46:00     12        40 |
>   8. | 10mar2011 02:47:00     34        74 |
>   9. | 10mar2011 02:51:09     14        88 |
>  10. | 10mar2011 02:51:24     80       168 |
>      +-------------------------------------+
>
>
> gen x=(runhits>ceil(runhits[_N]/2))
>
> list
>
>      +-----------------------------------------+
>      |               date   hits   runhits   x |
>      |-----------------------------------------|
>   1. | 10mar2011 01:07:18      2         2   0 |
>   2. | 10mar2011 01:09:48      3         5   0 |
>   3. | 10mar2011 01:54:00      1         6   0 |
>   4. | 10mar2011 02:03:37      8        14   0 |
>   5. | 10mar2011 02:11:00      9        23   0 |
>      |-----------------------------------------|
>   6. | 10mar2011 02:26:00      5        28   0 |
>   7. | 10mar2011 02:46:00     12        40   0 |
>   8. | 10mar2011 02:47:00     34        74   0 |
>   9. | 10mar2011 02:51:09     14        88   1 |
>  10. | 10mar2011 02:51:24     80       168   1 |
>      +-----------------------------------------+
>
>
> Now, I could do something like
>
> di date[n]-date[1]
>
> where n is the observation number at which x first equals 1. We could also generate another variable, "indicator", that contains only a single 1. Either way, I need a mechanism to get the observation number where x first equals 1. Hope this helps...
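
One way to get that observation number, in the same spirit as the minimum-date principle in the reply above, is to store _n in a variable and take its minimum over the observations where x is 1. This is an editorial sketch on made-up values of x; the variable name obs is hypothetical.

```stata
* Hypothetical indicator values; in the thread x marks runhits above half the total
clear
input x
0
0
1
1
end

* record each observation's position, then find the smallest position with x == 1
gen long obs = _n
su obs if x == 1, meanonly
di r(min)
```

Here r(min) is 3, so date[r(min)] - date[1] would be the elapsed time if a date variable were present.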
>
> Nick Cox
>
> > On the last question first: the usual Stata way is to add observations
> > at the end and then -sort-, although you could also -append- to a
> > one-observation dataset.
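
The first approach described above (add an observation at the end, then -sort-) might be sketched as follows. This is an editorial illustration on two rows of the posted data; the string variable sdate is a hypothetical helper for reading the timestamps.

```stata
clear
input str18 sdate hits
"10mar2011 01:07:18" 1
"10mar2011 01:09:48" 1
end
gen double date = clock(sdate, "DMYhms")
format date %tc
drop sdate

* add one empty observation at the end and fill in its date
set obs `=_N + 1'
replace date = clock("07mar2011 03:00:00", "DMYhms") in L

* sorting on date moves the new, earlier observation to the top
sort date
list
```

The new observation has missing hits unless a value is assigned to it as well.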
> >
> > If -hits- is always 1, then
> >
> > sort date
> > gen obs = _n
> > su obs, meanonly
> > di date[ceil(r(mean))] - date[1]
> >
> > I guess you will now tell us that the real data are more complicated.
>
> On Wed, Jun 6, 2012 at 10:24 PM, tashi lama <[email protected]> wrote:
>
> > >      +---------------------------+
> > >      |               date   hits |
> > >      |---------------------------|
> > >   1. | 10mar2011 01:07:18      1 |
> > >   2. | 10mar2011 01:09:48      1 |
> > >   3. | 10mar2011 01:54:00      1 |
> > >   4. | 10mar2011 02:03:37      1 |
> > >   5. | 10mar2011 02:11:00      1 |
> > >      |---------------------------|
> > >   6. | 10mar2011 02:26:00      1 |
> > >   7. | 10mar2011 02:46:00      1 |
> > >   8. | 10mar2011 02:47:00      1 |
> > >   9. | 10mar2011 02:51:09      1 |
> > >  10. | 10mar2011 02:51:24      1 |
> > >      +---------------------------+
> > >
> > > I need to find the time taken to get half of the total hits
> > >
> > > summ hits
> > >
> > > gen x=sum(hits)
> > >
> > >      +---------------------------------+
> > >      |               date   hits     x |
> > >      |---------------------------------|
> > >   1. | 10mar2011 01:07:18      1     1 |
> > >   2. | 10mar2011 01:09:48      1     2 |
> > >   3. | 10mar2011 01:54:00      1     3 |
> > >   4. | 10mar2011 02:03:37      1     4 |
> > >   5. | 10mar2011 02:11:00      1     5 |
> > >      |---------------------------------|
> > >   6. | 10mar2011 02:26:00      1     6 |
> > >   7. | 10mar2011 02:46:00      1     7 |
> > >   8. | 10mar2011 02:47:00      1     8 |
> > >   9. | 10mar2011 02:51:09      1     9 |
> > >  10. | 10mar2011 02:51:24      1    10 |
> > >      +---------------------------------+
> > >
> > > Now, my problem is that I will be comparing r(sum)/2 with the values in variable "x", but I need to do the computation on variable "date". So, if r(sum)/2 is 5, then I know to compute date[5]-date[1]. Any ideas?
> > >
> > > Also, is it possible to add one date observation at the top of the date column programmatically? I need to add 07mar2011 03:00:00 to the date column, and because this date is earlier than every other observation in the dataset, it needs to become my first observation.
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/ 		 	   		  

