Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: RE: How to define shortest possible period with 95% of observations


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: How to define shortest possible period with 95% of observations
Date   Wed, 12 May 2010 15:41:38 -0400

Daniel et al.--
I think Robert's code also assumes that only observations for one year
are in memory; it is not -by-able.  So you might be incorrectly
calculating within your loop across years--you may want to create some
fake data as Robert did and show us what you are doing.  A
disadvantage of moving to a non-calendar year is that it is harder to
explain, but Oct 1 to Sep 30 might work for your purposes (if peak
times are near Apr 1), and corresponds to a common fiscal year, fwiw.

On Wed, May 12, 2010 at 3:33 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> What you are asking is not only contrary to Nick's recommendation
> (and, now, mine), it is a mistake. You are assuming that the peak day
> for fires occurs exactly in the middle of a "fire year".  Of course
> the  peak days will differ from year to year, and you will wind up
> with overlapping periods.
>
> Steve
>
>
>
> On Wed, May 12, 2010 at 3:13 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>> What you outline is not what I was recommending. It's an awkward
>> half-way house.
>>
>> But in terms of your new question, see
>>
>> SJ-6-4  dm0025  . . . . . . . . . .  Stata tip 36: Which observations?
>> Erratum
>>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N.
>> J. Cox
>>        Q4/06   SJ 6(4):596                              (no commands)
>>        correction of example code for Stata tip 36
>>
>> SJ-6-3  dm0025  . . . . . . . . . . . . . .  Stata tip 36: Which
>> observations?
>>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N.
>> J. Cox
>>        Q3/06   SJ 6(3):430--432                                 (no
>> commands)
>>        tip for identifying which observations satisfy some
>>        specified condition
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Daniel Mueller
>>
>> Thanks.
>>
>> In the code below I did my best to think in fire years by defining the
>> peak fire day (these are always unique) as the middle of the year. Then
>> I'd like to place the maximum into a local macro (where I fail miserably
>>
>> in line 2 as Steve rightly pointed out):
>>
>> qui su no_fire_day
>> loc yearstart = Day[r(max)] - 183
>> loc yearend = `yearstart' + 365
>>
>> My simple question is how I can place the Day where no_fires_day =
>> r(max) into a local macro?
>>
>> (I ignore leap years for the sake of simplicity..)
>>
>>
>> Nick Cox wrote on 5/12/2010 10:44 PM:
>>> Without looking at this in detail, it seems to me that you might
>> benefit
>>> from thinking in terms of fire years, rather than calendar years,
>>> starting on some day other than January 1.  After all, all sorts of
>>> different sciences, not to mention religions, have years that don't
>>> coincide with conventional Western calendar years: fiscal years, water
>>> years, academic years, etc., etc.
>>>
>>> Several pertinent -egen- functions are included in -egenmore- on SSC.
>>>
>>> In other words, define the time scale in terms of those fire years;
>> then
>>> Robert's code will probably not need any complicated adjustments.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> Daniel Mueller
>>>
>>> Robert, this works like charm!!! Thanks a bunch for this neat code.
>> Also
>>>
>>> thanks to Nick for pointing me to -shorth- which I will certainly
>>> explore in more detail after having sipped through the extensive
>>> reference list.
>>>
>>> Using Roberts code I can seamlessly loop over the nine years of data
>> and
>>>
>>> generate the shortest fire season per year with 95% of obs. The
>> results
>>> suggested an additional complication.. For some subsets the shortest
>>> possible period likely starts a couple of days before Jan 1st, at the
>>> end of the preceding year.
>>>
>>> I tweaked Roberts code a little to loop over years and defined the
>>> middle of a year as the peak fire day. The code runs through, yet sets
>>> the start of the fire season for some subsets to Jan 1st, while my
>>> educated guess is that it should be somewhere around mid to end of
>>> December. Something went wrong, but I can't spot the glitch in the
>> code
>>> below. Can someone please help?
>>>
>>> Thanks a lot in advance and best regards,
>>> Daniel
>>>
>>>
>>> *** start
>>> forv y = `yearfirst'/`yearlast' {
>>>
>>> * keep previous year
>>>    if `y' != `yearfirst' {
>>>     keep if Year == `y' | Year == (`y'-1)
>>>    }
>>>    bys Day: g no_fire_day = _N
>>>    qui su no_fire_day
>>>
>>> * define year to start 183 days before peak fire day
>>>    loc yearstart = Day[r(max)] - 183
>>>    loc yearend = `yearstart' + 365
>>>    keep if Day>  `yearstart'&  Day<  `yearend' // or with
>> egen->rotate?
>>>    bys Day: keep if _n == _N
>>>    g nobs = _n
>>>
>>> * the target is a continuous run that includes 95% of all fires
>>>    sum no_fire_day, meanonly
>>>    scalar target = .95 * r(sum)
>>>
>>>    scalar shortlen = .
>>>    gen arun = .
>>>    gen bestrun = .
>>>
>>>    * at each pass, create a run that starts at nobs == `i'
>>>    * and identify the nobs where the number of fires>= 95%
>>>    local more 1
>>>    local i = 0
>>>    while `more' {
>>>     local i = `i' + 1
>>>     qui replace arun = sum(no_fire_day * (nobs>= `i'))
>>>     sum nobs if arun>= target, meanonly
>>>     if r(N) == 0 local more 0
>>>     else if (Day[r(min)] - Day[`i'])<  shortlen {
>>>      scalar shortlen = Day[r(min)] - Day[`i']
>>>      qui replace bestrun = arun
>>>      qui replace bestrun = . if nobs>  r(min) | nobs<  `i'
>>>     }
>>>    }
>>>    qui drop if bestrun == .
>>>    drop bestrun arun
>>>    save fires_`y', replace
>>> }
>>> *** end
>>>
>>>
>>>
>>>
>>>
>>> Robert Picard wrote on 5/11/2010 3:28 AM:
>>>> Here is how I would approach this problem. I would do each year
>>>> separately; it could be done all at once but it would complicate the
>>>> code unnecessarily. If the fire data is one observation per fire, I
>>>> would -collapse- it to one observation per day. Each observation
>> would
>>>> contain the number of fires that day. The following code will
>> identify
>>>> the first instance of the shortest run of days that includes 95% of
>>>> fires for the year.
>>>>
>>>> Note that the following code will work, even if there are days
>> without
>>>> fires (and thus no observation for that day).
>>>>
>>>> *--------------------------- begin example -----------------------
>>>> version 11
>>>>
>>>> * daily fire counts; with some days without fires
>>>> clear all
>>>> set seed 123
>>>> set obs 365
>>>> gen day = _n
>>>> drop if uniform()<   .1
>>>> gen nobs = _n
>>>> gen nfires = round(uniform() * 10)
>>>>
>>>> * the target is a continuous run that includes 95% of all fires
>>>> sum nfires, meanonly
>>>> scalar target = .95 * r(sum)
>>>> dis target
>>>>
>>>> scalar shortlen = .
>>>> gen arun = .
>>>> gen bestrun = .
>>>>
>>>> * at each pass, create a run that starts at nobs == `i'
>>>> * and identify the nobs where the number of fires>= 95%
>>>> local more 1
>>>> local i 0
>>>> while `more' {
>>>>      local i = `i' + 1
>>>>      qui replace arun = sum(nfires * (nobs>=`i'))
>>>>      sum nobs if arun>= target, meanonly
>>>>      if r(N) == 0 local more 0
>>>>      else if (day[r(min)] - day[`i'])<   shortlen {
>>>>              scalar shortlen = day[r(min)] - day[`i']
>>>>              qui replace bestrun = arun
>>>>              qui replace bestrun = . if nobs>   r(min) | nobs<   `i'
>>>>      }
>>>> }
>>>>
>>>> *--------------------- end example --------------------------
>>>>
>>>>
>>>> Hope this help,
>>>>
>>>> Robert
>>>>
>>>> On Mon, May 10, 2010 at 6:19 AM, Nick Cox<n.j.cox@durham.ac.uk>
>>> wrote:
>>>>> I don't think any trick is possible unless you know in advance the
>>>>> precise distribution, e.g. that it is Gaussian, or exponential, or
>>>>> whatever, which here is not the case.
>>>>>
>>>>> So, you need to look at all the possibilities from the interval
>>> starting
>>>>> at the minimum to the interval starting at the 5% point of the fire
>>>>> number distribution in each year.
>>>>>
>>>>> However, this may all be achievable using -shorth- (SSC). Look at
>> the
>>>>> -proportion()- option, but you would need to -expand- first to get a
>>>>> separate observation for each fire. If that's not practicable, look
>>>>> inside the code of -shorth- to get ideas on how to proceed. Note
>> that
>>> no
>>>>> looping is necessary: the whole problem will reduce to use of -by:-
>>> and
>>>>> subscripts.
>>>>>
>>>>> Nick
>>>>> n.j.cox@durham.ac.uk
>>>>>
>>>>> Daniel Mueller
>>>>>
>>>>> I have a strongly unbalanced panel with 100,000 observations (=fire
>>>>> occurrences per day) that contain between none (no fire) and 3,000
>>> fires
>>>>>
>>>>> per day for 8 years. The fire events peak in March and April with
>>> about
>>>>> 85-90% of the yearly total.
>>>>>
>>>>> My question is how I can define the shortest possible continuous
>>> period
>>>>> of days for each year that contains 95% of all yearly fires. The
>>> length
>>>>> and width of the periods may slightly differ across the years due to
>>>>> climate and other parameters.
>>>>>
>>>>> I am sure there is a neat trick in Stata for this, yet I have not
>>>>> spotted it. Any suggestions would be appreciated.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index