Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: RE: How to define shortest possible period with 95% of observations


From   Daniel Mueller <[email protected]>
To   [email protected]
Subject   Re: st: RE: How to define shortest possible period with 95% of observations
Date   Wed, 12 May 2010 21:59:25 +0700

Robert, this works like charm!!! Thanks a bunch for this neat code. Also thanks to Nick for pointing me to -shorth- which I will certainly explore in more detail after having sipped through the extensive reference list.

Using Roberts code I can seamlessly loop over the nine years of data and generate the shortest fire season per year with 95% of obs. The results suggested an additional complication.. For some subsets the shortest possible period likely starts a couple of days before Jan 1st, at the end of the preceding year.

I tweaked Roberts code a little to loop over years and defined the middle of a year as the peak fire day. The code runs through, yet sets the start of the fire season for some subsets to Jan 1st, while my educated guess is that it should be somewhere around mid to end of December. Something went wrong, but I can't spot the glitch in the code below. Can someone please help?

Thanks a lot in advance and best regards,
Daniel


*** start
forv y = `yearfirst'/`yearlast' {

* keep previous year
 if `y' != `yearfirst' {
  keep if Year == `y' | Year == (`y'-1)
 }
 bys Day: g no_fire_day = _N
 qui su no_fire_day

* define year to start 183 days before peak fire day
 loc yearstart = Day[r(max)] - 183
 loc yearend = `yearstart' + 365
 keep if Day > `yearstart' & Day < `yearend' // or with egen->rotate?
 bys Day: keep if _n == _N
 g nobs = _n

* the target is a continuous run that includes 95% of all fires
 sum no_fire_day, meanonly
 scalar target = .95 * r(sum)

 scalar shortlen = .
 gen arun = .
 gen bestrun = .

 * at each pass, create a run that starts at nobs == `i'
 * and identify the nobs where the number of fires >= 95%
 local more 1
 local i = 0
 while `more' {
  local i = `i' + 1
  qui replace arun = sum(no_fire_day * (nobs >= `i'))
  sum nobs if arun >= target, meanonly
  if r(N) == 0 local more 0
  else if (Day[r(min)] - Day[`i']) < shortlen {
   scalar shortlen = Day[r(min)] - Day[`i']
   qui replace bestrun = arun
   qui replace bestrun = . if nobs > r(min) | nobs < `i'
  }
 }
 qui drop if bestrun == .
 drop bestrun arun
 save fires_`y', replace
}
*** end





Robert Picard wrote on 5/11/2010 3:28 AM:
Here is how I would approach this problem. I would do each year
separately; it could be done all at once but it would complicate the
code unnecessarily. If the fire data is one observation per fire, I
would -collapse- it to one observation per day. Each observation would
contain the number of fires that day. The following code will identify
the first instance of the shortest run of days that includes 95% of
fires for the year.

Note that the following code will work, even if there are days without
fires (and thus no observation for that day).

*--------------------------- begin example -----------------------
version 11

* daily fire counts; with some days without fires
clear all
set seed 123
set obs 365
gen day = _n
drop if uniform()<  .1
gen nobs = _n
gen nfires = round(uniform() * 10)

* the target is a continuous run that includes 95% of all fires
sum nfires, meanonly
scalar target = .95 * r(sum)
dis target

scalar shortlen = .
gen arun = .
gen bestrun = .

* at each pass, create a run that starts at nobs == `i'
* and identify the nobs where the number of fires>= 95%
local more 1
local i 0
while `more' {
	local i = `i' + 1
	qui replace arun = sum(nfires * (nobs>=`i'))
	sum nobs if arun>= target, meanonly
	if r(N) == 0 local more 0
	else if (day[r(min)] - day[`i'])<  shortlen {
		scalar shortlen = day[r(min)] - day[`i']
		qui replace bestrun = arun
		qui replace bestrun = . if nobs>  r(min) | nobs<  `i'
	}
}

*--------------------- end example --------------------------


Hope this help,

Robert

On Mon, May 10, 2010 at 6:19 AM, Nick Cox<[email protected]>  wrote:
I don't think any trick is possible unless you know in advance the
precise distribution, e.g. that it is Gaussian, or exponential, or
whatever, which here is not the case.

So, you need to look at all the possibilities from the interval starting
at the minimum to the interval starting at the 5% point of the fire
number distribution in each year.

However, this may all be achievable using -shorth- (SSC). Look at the
-proportion()- option, but you would need to -expand- first to get a
separate observation for each fire. If that's not practicable, look
inside the code of -shorth- to get ideas on how to proceed. Note that no
looping is necessary: the whole problem will reduce to use of -by:- and
subscripts.

Nick
[email protected]

Daniel Mueller

I have a strongly unbalanced panel with 100,000 observations (=fire
occurrences per day) that contain between none (no fire) and 3,000 fires

per day for 8 years. The fire events peak in March and April with about
85-90% of the yearly total.

My question is how I can define the shortest possible continuous period
of days for each year that contains 95% of all yearly fires. The length
and width of the periods may slightly differ across the years due to
climate and other parameters.

I am sure there is a neat trick in Stata for this, yet I have not
spotted it. Any suggestions would be appreciated.


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index