Re: st: RE: How to define shortest possible period with 95% of observations

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Re: st: RE: How to define shortest possible period with 95% of observations
Date	Wed, 12 May 2010 11:28:13 -0400

I can't be sure of the meaning of your variables, but I see two
potential problems.

1. There might be multiple days with the maximum.
2. Day[r(max)] does not identify the day with the maximum. Consider
the following data:

Day       no_fire_day

1    1    1
2    1    1
3    1    2
3    1    2
4    1    1

The max is 2, so Day[r(max)] is Day[2]: the value of Day in the second
observation (day 2), not the day on which the maximum occurs (day 3).
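
A minimal sketch of one way around both problems, using the toy data
above. The local names maxfires, peakfirst and peaklast are made up for
illustration, and I am assuming one observation per fire, as your
-bys Day: g no_fire_day = _N- suggests. The idea is to treat r(max) as a
value to match on, not as an observation number:

```stata
* toy data from the example above: one observation per fire
clear
input Day
1
2
3
3
4
end
bysort Day: generate no_fire_day = _N

* r(max) is the largest daily count (here 2), not an observation number
summarize no_fire_day, meanonly
local maxfires = r(max)

* find the day(s) on which that count occurs; with ties, r(min) and
* r(max) give the first and last such day
summarize Day if no_fire_day == `maxfires', meanonly
local peakfirst = r(min)
local peaklast  = r(max)

display "max fires = `maxfires'; peak day(s) from `peakfirst' to `peaklast'"
```

With these data both peakfirst and peaklast are 3. If several days tie
for the maximum, you would still have to decide which one to centre the
year on; taking the first is only one possible choice.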

Steve



On Wed, May 12, 2010 at 10:59 AM, Daniel Mueller <[email protected]> wrote:
> Robert, this works like a charm!!! Thanks a bunch for this neat code. Also
> thanks to Nick for pointing me to -shorth-, which I will certainly explore in
> more detail after having sifted through the extensive reference list.
>
> Using Robert's code I can seamlessly loop over the nine years of data and
> generate the shortest fire season per year with 95% of obs. The results
> suggested an additional complication. For some subsets the shortest
> possible period likely starts a couple of days before Jan 1st, at the end of
> the preceding year.
>
> I tweaked Robert's code a little to loop over years and defined the middle of
> a year as the peak fire day. The code runs through, yet sets the start of
> the fire season for some subsets to Jan 1st, while my educated guess is that
> it should be somewhere around mid to end of December. Something went wrong,
> but I can't spot the glitch in the code below. Can someone please help?
>
> Thanks a lot in advance and best regards,
> Daniel
>
>
> *** start
> forv y = `yearfirst'/`yearlast' {
>
> * keep previous year
>  if `y' != `yearfirst' {
>  keep if Year == `y' | Year == (`y'-1)
>  }
>  bys Day: g no_fire_day = _N
>  qui su no_fire_day
>
> * define year to start 183 days before peak fire day
>  loc yearstart = Day[r(max)] - 183
>  loc yearend = `yearstart' + 365
>  keep if Day > `yearstart' & Day < `yearend' // or with egen->rotate?
>  bys Day: keep if _n == _N
>  g nobs = _n
>
> * the target is a continuous run that includes 95% of all fires
>  sum no_fire_day, meanonly
>  scalar target = .95 * r(sum)
>
>  scalar shortlen = .
>  gen arun = .
>  gen bestrun = .
>
>  * at each pass, create a run that starts at nobs == `i'
>  * and identify the nobs where the number of fires >= 95%
>  local more 1
>  local i = 0
>  while `more' {
>  local i = `i' + 1
>  qui replace arun = sum(no_fire_day * (nobs >= `i'))
>  sum nobs if arun >= target, meanonly
>  if r(N) == 0 local more 0
>  else if (Day[r(min)] - Day[`i']) < shortlen {
>   scalar shortlen = Day[r(min)] - Day[`i']
>   qui replace bestrun = arun
>   qui replace bestrun = . if nobs > r(min) | nobs < `i'
>  }
>  }
>  qui drop if bestrun == .
>  drop bestrun arun
>  save fires_`y', replace
> }
> *** end
>
>
>
>
>
> Robert Picard wrote on 5/11/2010 3:28 AM:
>>
>> Here is how I would approach this problem. I would do each year
>> separately; it could be done all at once but it would complicate the
>> code unnecessarily. If the fire data is one observation per fire, I
>> would -collapse- it to one observation per day. Each observation would
>> contain the number of fires that day. The following code will identify
>> the first instance of the shortest run of days that includes 95% of
>> fires for the year.
>>
>> Note that the following code will work, even if there are days without
>> fires (and thus no observation for that day).
>>
>> *--------------------------- begin example -----------------------
>> version 11
>>
>> * daily fire counts; with some days without fires
>> clear all
>> set seed 123
>> set obs 365
>> gen day = _n
>> drop if uniform() < .1
>> gen nobs = _n
>> gen nfires = round(uniform() * 10)
>>
>> * the target is a continuous run that includes 95% of all fires
>> sum nfires, meanonly
>> scalar target = .95 * r(sum)
>> dis target
>>
>> scalar shortlen = .
>> gen arun = .
>> gen bestrun = .
>>
>> * at each pass, create a run that starts at nobs == `i'
>> * and identify the nobs where the number of fires >= 95%
>> local more 1
>> local i 0
>> while `more' {
>>        local i = `i' + 1
>>        qui replace arun = sum(nfires * (nobs >= `i'))
>>        sum nobs if arun >= target, meanonly
>>        if r(N) == 0 local more 0
>>        else if (day[r(min)] - day[`i']) < shortlen {
>>                scalar shortlen = day[r(min)] - day[`i']
>>                qui replace bestrun = arun
>>                qui replace bestrun = . if nobs > r(min) | nobs < `i'
>>        }
>> }
>>
>> *--------------------- end example --------------------------
>>
>>
>> Hope this helps,
>>
>> Robert
>>
>> On Mon, May 10, 2010 at 6:19 AM, Nick Cox <[email protected]> wrote:
>>>
>>> I don't think any trick is possible unless you know in advance the
>>> precise distribution, e.g. that it is Gaussian, or exponential, or
>>> whatever, which here is not the case.
>>>
>>> So, you need to look at all the possibilities from the interval starting
>>> at the minimum to the interval starting at the 5% point of the fire
>>> number distribution in each year.
>>>
>>> However, this may all be achievable using -shorth- (SSC). Look at the
>>> -proportion()- option, but you would need to -expand- first to get a
>>> separate observation for each fire. If that's not practicable, look
>>> inside the code of -shorth- to get ideas on how to proceed. Note that no
>>> looping is necessary: the whole problem will reduce to use of -by:- and
>>> subscripts.
>>>
>>> Nick
>>> [email protected]
>>>
>>> Daniel Mueller
>>>
>>> I have a strongly unbalanced panel with 100,000 observations (=fire
>>> occurrences per day) that contain between none (no fire) and 3,000 fires
>>> per day for 8 years. The fire events peak in March and April with about
>>> 85-90% of the yearly total.
>>>
>>> My question is how I can define the shortest possible continuous period
>>> of days for each year that contains 95% of all yearly fires. The length
>>> and width of the periods may slightly differ across the years due to
>>> climate and other parameters.
>>>
>>> I am sure there is a neat trick in Stata for this, yet I have not
>>> spotted it. Any suggestions would be appreciated.
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>



-- 
Steven Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783

