
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: How to define shortest possible period with 95% of observations


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: RE: How to define shortest possible period with 95% of observations
Date   Wed, 12 May 2010 15:33:41 -0400

What you are asking is not only contrary to Nick's recommendation
(and, now, mine), it is a mistake. You are assuming that the peak day
for fires occurs exactly in the middle of a "fire year". Of course
the peak days will differ from year to year, and you will wind up
with overlapping periods.
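
On the narrower question of getting the day of the maximum into a local
macro, here is a minimal sketch on made-up data. The variable names
(sdate, fdate, nfires, fireyear, peakdate) and the 1 July start of the
fire year are assumptions for illustration only, not anything taken from
the real data. The key point is that r(max) is a value of the fire count,
not an observation number, so it cannot be used as a subscript.

```stata
* toy data: one observation per day with a fire count (all values invented)
clear
input str10 sdate nfires
"2007-12-28"  6
"2008-03-15" 40
"2008-04-02" 25
"2008-12-30"  9
"2009-03-20" 33
"2009-04-18" 51
end
generate fdate = date(sdate, "YMD")
format fdate %td

* fire year with a fixed start (1 July is an assumption, chosen only so
* that a March-April peak falls mid-year): July-December dates are
* assigned to the following fire year
generate fireyear = year(fdate) + (month(fdate) >= 7)

* date of the maximum for one fire year, placed in a local macro:
* first get the maximum count, then look up the date that attains it
summarize nfires if fireyear == 2008, meanonly
local maxfires = r(max)
summarize fdate if fireyear == 2008 & nfires == `maxfires', meanonly
local peakdate = r(min)    // earliest date if the maximum is tied
display "fire year 2008 peak: " %td `peakdate'

* peak date for every fire year at once, using by: and subscripts
bysort fireyear (nfires fdate): generate peakdate = fdate[_N]
format peakdate %td
egen byte first = tag(fireyear)
list fireyear peakdate if first, noobs
```

In this toy data the peak falls on 15 March in fire year 2008 and on
18 April in fire year 2009, so 365-day windows centred on each peak would
not line up from one year to the next, whereas the fixed-start fire years
do not overlap.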

Steve



On Wed, May 12, 2010 at 3:13 PM, Nick Cox <[email protected]> wrote:
> What you outline is not what I was recommending. It's an awkward
> half-way house.
>
> But in terms of your new question, see
>
> SJ-6-4  dm0025  . . . . . . . .  Stata tip 36: Which observations? Erratum
>         . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q4/06   SJ 6(4):596                              (no commands)
>         correction of example code for Stata tip 36
>
> SJ-6-3  dm0025  . . . . . . . . . . .  Stata tip 36: Which observations?
>         . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q3/06   SJ 6(3):430--432                         (no commands)
>         tip for identifying which observations satisfy some
>         specified condition
>
> Nick
> [email protected]
>
> Daniel Mueller
>
> Thanks.
>
> In the code below I did my best to think in fire years by defining the
> peak fire day (these are always unique) as the middle of the year. Then
> I'd like to place the maximum into a local macro (where I fail miserably
> in line 2, as Steve rightly pointed out):
>
> qui su no_fire_day
> loc yearstart = Day[r(max)] - 183
> loc yearend = `yearstart' + 365
>
> My simple question is how I can place the Day where no_fire_day ==
> r(max) into a local macro?
>
> (I ignore leap years for the sake of simplicity..)
>
>
> Nick Cox wrote on 5/12/2010 10:44 PM:
>> Without looking at this in detail, it seems to me that you might benefit
>> from thinking in terms of fire years, rather than calendar years,
>> starting on some day other than January 1.  After all, all sorts of
>> different sciences, not to mention religions, have years that don't
>> coincide with conventional Western calendar years: fiscal years, water
>> years, academic years, etc., etc.
>>
>> Several pertinent -egen- functions are included in -egenmore- on SSC.
>>
>> In other words, define the time scale in terms of those fire years; then
>> Robert's code will probably not need any complicated adjustments.
>>
>> Nick
>> [email protected]
>>
>> Daniel Mueller
>>
>> Robert, this works like a charm!!! Thanks a bunch for this neat code. Also
>> thanks to Nick for pointing me to -shorth- which I will certainly
>> explore in more detail after having sifted through the extensive
>> reference list.
>>
>> Using Robert's code I can seamlessly loop over the nine years of data and
>> generate the shortest fire season per year with 95% of obs. The results
>> suggested an additional complication: for some subsets the shortest
>> possible period likely starts a couple of days before Jan 1st, at the
>> end of the preceding year.
>>
>> I tweaked Robert's code a little to loop over years and defined the
>> middle of a year as the peak fire day. The code runs through, yet sets
>> the start of the fire season for some subsets to Jan 1st, while my
>> educated guess is that it should be somewhere around mid to end of
>> December. Something went wrong, but I can't spot the glitch in the code
>> below. Can someone please help?
>>
>> Thanks a lot in advance and best regards,
>> Daniel
>>
>>
>> *** start
>> forv y = `yearfirst'/`yearlast' {
>>
>> * keep previous year
>>    if `y' != `yearfirst' {
>>     keep if Year == `y' | Year == (`y'-1)
>>    }
>>    bys Day: g no_fire_day = _N
>>    qui su no_fire_day
>>
>> * define year to start 183 days before peak fire day
>>    loc yearstart = Day[r(max)] - 183
>>    loc yearend = `yearstart' + 365
>>    keep if Day > `yearstart' & Day < `yearend' // or with egen->rotate?
>>    bys Day: keep if _n == _N
>>    g nobs = _n
>>
>> * the target is a continuous run that includes 95% of all fires
>>    sum no_fire_day, meanonly
>>    scalar target = .95 * r(sum)
>>
>>    scalar shortlen = .
>>    gen arun = .
>>    gen bestrun = .
>>
>>    * at each pass, create a run that starts at nobs == `i'
>>    * and identify the nobs where the number of fires >= 95%
>>    local more 1
>>    local i = 0
>>    while `more' {
>>     local i = `i' + 1
>>     qui replace arun = sum(no_fire_day * (nobs >= `i'))
>>     sum nobs if arun >= target, meanonly
>>     if r(N) == 0 local more 0
>>     else if (Day[r(min)] - Day[`i']) < shortlen {
>>      scalar shortlen = Day[r(min)] - Day[`i']
>>      qui replace bestrun = arun
>>      qui replace bestrun = . if nobs > r(min) | nobs < `i'
>>     }
>>    }
>>    qui drop if bestrun == .
>>    drop bestrun arun
>>    save fires_`y', replace
>> }
>> *** end
>>
>>
>>
>>
>>
>> Robert Picard wrote on 5/11/2010 3:28 AM:
>>> Here is how I would approach this problem. I would do each year
>>> separately; it could be done all at once but it would complicate the
>>> code unnecessarily. If the fire data is one observation per fire, I
>>> would -collapse- it to one observation per day. Each observation would
>>> contain the number of fires that day. The following code will identify
>>> the first instance of the shortest run of days that includes 95% of
>>> fires for the year.
>>>
>>> Note that the following code will work, even if there are days without
>>> fires (and thus no observation for that day).
>>>
>>> *--------------------------- begin example -----------------------
>>> version 11
>>>
>>> * daily fire counts; with some days without fires
>>> clear all
>>> set seed 123
>>> set obs 365
>>> gen day = _n
>>> drop if uniform() < .1
>>> gen nobs = _n
>>> gen nfires = round(uniform() * 10)
>>>
>>> * the target is a continuous run that includes 95% of all fires
>>> sum nfires, meanonly
>>> scalar target = .95 * r(sum)
>>> dis target
>>>
>>> scalar shortlen = .
>>> gen arun = .
>>> gen bestrun = .
>>>
>>> * at each pass, create a run that starts at nobs == `i'
>>> * and identify the nobs where the number of fires >= 95%
>>> local more 1
>>> local i 0
>>> while `more' {
>>>      local i = `i' + 1
>>>      qui replace arun = sum(nfires * (nobs >= `i'))
>>>      sum nobs if arun >= target, meanonly
>>>      if r(N) == 0 local more 0
>>>      else if (day[r(min)] - day[`i']) < shortlen {
>>>              scalar shortlen = day[r(min)] - day[`i']
>>>              qui replace bestrun = arun
>>>              qui replace bestrun = . if nobs > r(min) | nobs < `i'
>>>      }
>>> }
>>>
>>> *--------------------- end example --------------------------
>>>
>>>
>>> Hope this helps,
>>>
>>> Robert
>>>
>>> On Mon, May 10, 2010 at 6:19 AM, Nick Cox <[email protected]> wrote:
>>>> I don't think any trick is possible unless you know in advance the
>>>> precise distribution, e.g. that it is Gaussian, or exponential, or
>>>> whatever, which here is not the case.
>>>>
>>>> So, you need to look at all the possibilities from the interval starting
>>>> at the minimum to the interval starting at the 5% point of the fire
>>>> number distribution in each year.
>>>>
>>>> However, this may all be achievable using -shorth- (SSC). Look at the
>>>> -proportion()- option, but you would need to -expand- first to get a
>>>> separate observation for each fire. If that's not practicable, look
>>>> inside the code of -shorth- to get ideas on how to proceed. Note that no
>>>> looping is necessary: the whole problem will reduce to use of -by:- and
>>>> subscripts.
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>> Daniel Mueller
>>>>
>>>> I have a strongly unbalanced panel with 100,000 observations (=fire
>>>> occurrences per day) that contain between none (no fire) and 3,000 fires
>>>> per day for 8 years. The fire events peak in March and April with about
>>>> 85-90% of the yearly total.
>>>>
>>>> My question is how I can define the shortest possible continuous period
>>>> of days for each year that contains 95% of all yearly fires. The length
>>>> and width of the periods may slightly differ across the years due to
>>>> climate and other parameters.
>>>>
>>>> I am sure there is a neat trick in Stata for this, yet I have not
>>>> spotted it. Any suggestions would be appreciated.
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>
>



-- 
Steven Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783


