Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.

# RE: st: RE: How to define shortest possible period with 95% ofobservations

 From "Nick Cox" To Subject RE: st: RE: How to define shortest possible period with 95% ofobservations Date Wed, 12 May 2010 20:58:37 +0100

```Well, graphics for one. If January is in the middle of the fire season,
you shouldn't want it to be the origin.

And summary statistics for another. If fire years really are more
natural than calendar years, you should prefer them as the time
framework for summaries.

Don't sniff at half a second of computer time. Some times it takes a
full day of revising a program to achieve such a saving.

Nick
n.j.cox@durham.ac.uk

Daniel Mueller

Well, we have thought about that. It is surely not a problem as the fire

season in our target area is relatively short (~3 month) and the 95% are

in a much shorter interval. Therefore, it does not matter where the max
is - we will capture the relevant period. We have other good reasons to
use the max for this definition.

But I bow in and use -egen- with the function
-dayofyear(daily_date_variable)- and map the day to the fire season
accordingly. I guess that was Nick's suggestion. It saves two lines of
code and maybe half a second of computing time. Are there other

Steve Samuels wrote on 5/13/2010 2:33 AM:
> What you are asking is not only contrary to Nick's recommendation
> (and, now, mine), it is a mistake. You are assuming that the peak day
> for fires occurs exactly in the middle of a "fire year".  Of course
> the  peak days will differ from year to year, and you will wind up
> with overlapping periods.

> On Wed, May 12, 2010 at 3:13 PM, Nick Cox<n.j.cox@durham.ac.uk>
wrote:
>> What you outline is not what I was recommending. It's an awkward
>> half-way house.
>>
>> But in terms of your new question, see
>>
>> SJ-6-4  dm0025  . . . . . . . . . .  Stata tip 36: Which
observations?
>> Erratum
>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
N.
>> J. Cox
>>         Q4/06   SJ 6(4):596                              (no
commands)
>>         correction of example code for Stata tip 36
>>
>> SJ-6-3  dm0025  . . . . . . . . . . . . . .  Stata tip 36: Which
>> observations?
>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
N.
>> J. Cox
>>         Q3/06   SJ 6(3):430--432                                 (no
>> commands)
>>         tip for identifying which observations satisfy some
>>         specified condition
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Daniel Mueller
>>
>> Thanks.
>>
>> In the code below I did my best to think in fire years by defining
the
>> peak fire day (these are always unique) as the middle of the year.
Then
>> I'd like to place the maximum into a local macro (where I fail
miserably
>>
>> in line 2 as Steve rightly pointed out):
>>
>> qui su no_fire_day
>> loc yearstart = Day[r(max)] - 183
>> loc yearend = `yearstart' + 365
>>
>> My simple question is how I can place the Day where no_fires_day =
>> r(max) into a local macro?
>>
>> (I ignore leap years for the sake of simplicity..)
>>
>>
>> Nick Cox wrote on 5/12/2010 10:44 PM:
>>> Without looking at this in detail, it seems to me that you might
>> benefit
>>> from thinking in terms of fire years, rather than calendar years,
>>> starting on some day other than January 1.  After all, all sorts of
>>> different sciences, not to mention religions, have years that don't
>>> coincide with conventional Western calendar years: fiscal years,
water
>>> years, academic years, etc., etc.
>>>
>>> Several pertinent -egen- functions are included in -egenmore- on
SSC.
>>>
>>> In other words, define the time scale in terms of those fire years;
>> then
>>> Robert's code will probably not need any complicated adjustments.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> Daniel Mueller
>>>
>>> Robert, this works like charm!!! Thanks a bunch for this neat code.
>> Also
>>>
>>> thanks to Nick for pointing me to -shorth- which I will certainly
>>> explore in more detail after having sipped through the extensive
>>> reference list.
>>>
>>> Using Roberts code I can seamlessly loop over the nine years of data
>> and
>>>
>>> generate the shortest fire season per year with 95% of obs. The
>> results
>>> suggested an additional complication.. For some subsets the shortest
>>> possible period likely starts a couple of days before Jan 1st, at
the
>>> end of the preceding year.
>>>
>>> I tweaked Roberts code a little to loop over years and defined the
>>> middle of a year as the peak fire day. The code runs through, yet
sets
>>> the start of the fire season for some subsets to Jan 1st, while my
>>> educated guess is that it should be somewhere around mid to end of
>>> December. Something went wrong, but I can't spot the glitch in the
>> code
>>>
>>> Thanks a lot in advance and best regards,
>>> Daniel
>>>
>>>
>>> *** start
>>> forv y = `yearfirst'/`yearlast' {
>>>
>>> * keep previous year
>>>     if `y' != `yearfirst' {
>>>      keep if Year == `y' | Year == (`y'-1)
>>>     }
>>>     bys Day: g no_fire_day = _N
>>>     qui su no_fire_day
>>>
>>> * define year to start 183 days before peak fire day
>>>     loc yearstart = Day[r(max)] - 183
>>>     loc yearend = `yearstart' + 365
>>>     keep if Day>    `yearstart'&    Day<    `yearend' // or with
>> egen->rotate?
>>>     bys Day: keep if _n == _N
>>>     g nobs = _n
>>>
>>> * the target is a continuous run that includes 95% of all fires
>>>     sum no_fire_day, meanonly
>>>     scalar target = .95 * r(sum)
>>>
>>>     scalar shortlen = .
>>>     gen arun = .
>>>     gen bestrun = .
>>>
>>>     * at each pass, create a run that starts at nobs == `i'
>>>     * and identify the nobs where the number of fires>= 95%
>>>     local more 1
>>>     local i = 0
>>>     while `more' {
>>>      local i = `i' + 1
>>>      qui replace arun = sum(no_fire_day * (nobs>= `i'))
>>>      sum nobs if arun>= target, meanonly
>>>      if r(N) == 0 local more 0
>>>      else if (Day[r(min)] - Day[`i'])<    shortlen {
>>>       scalar shortlen = Day[r(min)] - Day[`i']
>>>       qui replace bestrun = arun
>>>       qui replace bestrun = . if nobs>    r(min) | nobs<    `i'
>>>      }
>>>     }
>>>     qui drop if bestrun == .
>>>     drop bestrun arun
>>>     save fires_`y', replace
>>> }
>>> *** end
>>>
>>>
>>>
>>>
>>>
>>> Robert Picard wrote on 5/11/2010 3:28 AM:
>>>> Here is how I would approach this problem. I would do each year
>>>> separately; it could be done all at once but it would complicate
the
>>>> code unnecessarily. If the fire data is one observation per fire, I
>>>> would -collapse- it to one observation per day. Each observation
>> would
>>>> contain the number of fires that day. The following code will
>> identify
>>>> the first instance of the shortest run of days that includes 95% of
>>>> fires for the year.
>>>>
>>>> Note that the following code will work, even if there are days
>> without
>>>> fires (and thus no observation for that day).
>>>>
>>>> *--------------------------- begin example -----------------------
>>>> version 11
>>>>
>>>> * daily fire counts; with some days without fires
>>>> clear all
>>>> set seed 123
>>>> set obs 365
>>>> gen day = _n
>>>> drop if uniform()<     .1
>>>> gen nobs = _n
>>>> gen nfires = round(uniform() * 10)
>>>>
>>>> * the target is a continuous run that includes 95% of all fires
>>>> sum nfires, meanonly
>>>> scalar target = .95 * r(sum)
>>>> dis target
>>>>
>>>> scalar shortlen = .
>>>> gen arun = .
>>>> gen bestrun = .
>>>>
>>>> * at each pass, create a run that starts at nobs == `i'
>>>> * and identify the nobs where the number of fires>= 95%
>>>> local more 1
>>>> local i 0
>>>> while `more' {
>>>>       local i = `i' + 1
>>>>       qui replace arun = sum(nfires * (nobs>=`i'))
>>>>       sum nobs if arun>= target, meanonly
>>>>       if r(N) == 0 local more 0
>>>>       else if (day[r(min)] - day[`i'])<     shortlen {
>>>>               scalar shortlen = day[r(min)] - day[`i']
>>>>               qui replace bestrun = arun
>>>>               qui replace bestrun = . if nobs>     r(min) | nobs<
`i'
>>>>       }
>>>> }
>>>>
>>>> *--------------------- end example --------------------------
>>>>
>>>>
>>>> Hope this help,
>>>>
>>>> Robert
>>>>
>>>> On Mon, May 10, 2010 at 6:19 AM, Nick Cox<n.j.cox@durham.ac.uk>
>>> wrote:
>>>>> I don't think any trick is possible unless you know in advance the
>>>>> precise distribution, e.g. that it is Gaussian, or exponential, or
>>>>> whatever, which here is not the case.
>>>>>
>>>>> So, you need to look at all the possibilities from the interval
>>> starting
>>>>> at the minimum to the interval starting at the 5% point of the
fire
>>>>> number distribution in each year.
>>>>>
>>>>> However, this may all be achievable using -shorth- (SSC). Look at
>> the
>>>>> -proportion()- option, but you would need to -expand- first to get
a
>>>>> separate observation for each fire. If that's not practicable,
look
>>>>> inside the code of -shorth- to get ideas on how to proceed. Note
>> that
>>> no
>>>>> looping is necessary: the whole problem will reduce to use of
-by:-
>>> and
>>>>> subscripts.
>>>>>
>>>>> Nick
>>>>> n.j.cox@durham.ac.uk
>>>>>
>>>>> Daniel Mueller
>>>>>
>>>>> I have a strongly unbalanced panel with 100,000 observations
(=fire
>>>>> occurrences per day) that contain between none (no fire) and 3,000
>>> fires
>>>>>
>>>>> per day for 8 years. The fire events peak in March and April with
>>>>> 85-90% of the yearly total.
>>>>>
>>>>> My question is how I can define the shortest possible continuous
>>> period
>>>>> of days for each year that contains 95% of all yearly fires. The
>>> length
>>>>> and width of the periods may slightly differ across the years due
to
>>>>> climate and other parameters.
>>>>>
>>>>> I am sure there is a neat trick in Stata for this, yet I have not
>>>>> spotted it. Any suggestions would be appreciated.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```