Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Daniel Mueller <mueller@iamo.de> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: RE: How to define shortest possible period with 95% of observations |

Date |
Fri, 14 May 2010 02:21:13 +0700 |

Daniel Robert Picard wrote on 5/13/2010 3:03 AM:

Apologies for a poor choice of variable name. Instead of day, I should have called it firedate. Since Stata dates are actually days since 01jan1960, you can do date arithmetics with them. My code should work for any time period in memory, as long as the firedate variable contains a Stata date and the records are sorted by date. You can target 95% of a 6 month span or 3 years, it does not matter. As to choosing a year of observation centered around the day with the most fire, I think you were on the right tract Daniel, and it gives an opportunity to show again how to identify an observation that matches a certain condition. All you need to do is to have a variable set to the number of observation (_n). In the updated example below, you do a summarize to identify the maximum number of fires and then you do another summarize on n if nfires == r(max). The r(min) returned from the second sum indicates the number of the first observation that matches the max number of fires. *--------------------------- begin example ----------------------- version 11 * generate two years of data clear all set seed 12345 set obs 730 gen firedate = mdy(1,1,2001) + _n - 1 format %d firedate sum, format * generate more fires arount May; some days without fires gen fireyear = year(firedate) sort fireyear firedate by fireyear: gen nfires = round(uniform() * 1000 / abs(_n-140.1)^.5) drop if uniform()< .1 line nfires firedate * Identify peak fire date; choose first if more than one peak date gen n = _n sum nfires if fireyear == 2002 sum n if nfires == r(max) local peakdate = firedate[r(min)] list if firedate == `peakdate' * select a year around peak fire date drop if firedate< `peakdate' - 183 drop if firedate> firedate[1] + 364 sum, format * the target is a continuous run that includes 95% of all fires sort firedate gen nobs = _n sum nfires, meanonly scalar target = .95 * r(sum) dis target scalar shortlen = . gen arun = . gen bestrun = . * at each pass, create a run that starts at nobs == `i' * and identify the nobs where the number of fires>= 95% local more 1 local i 0 while `more' { local i = `i' + 1 qui replace arun = sum(nfires * (nobs>=`i')) sum nobs if arun>= target, meanonly if r(N) == 0 local more 0 else if (firedate[r(min)] - firedate[`i'])< shortlen { scalar shortlen = firedate[r(min)] - firedate[`i'] qui replace bestrun = arun qui replace bestrun = . if nobs> r(min) | nobs< `i' } } *--------------------- end example -------------------------- Hope this helps, Robert I On Wed, May 12, 2010 at 11:44 AM, Nick Cox<n.j.cox@durham.ac.uk> wrote:Without looking at this in detail, it seems to me that you might benefit from thinking in terms of fire years, rather than calendar years, starting on some day other than January 1. After all, all sorts of different sciences, not to mention religions, have years that don't coincide with conventional Western calendar years: fiscal years, water years, academic years, etc., etc. Several pertinent -egen- functions are included in -egenmore- on SSC. In other words, define the time scale in terms of those fire years; then Robert's code will probably not need any complicated adjustments. Nick n.j.cox@durham.ac.uk Daniel Mueller Robert, this works like charm!!! Thanks a bunch for this neat code. Also thanks to Nick for pointing me to -shorth- which I will certainly explore in more detail after having sipped through the extensive reference list. Using Roberts code I can seamlessly loop over the nine years of data and generate the shortest fire season per year with 95% of obs. The results suggested an additional complication.. For some subsets the shortest possible period likely starts a couple of days before Jan 1st, at the end of the preceding year. I tweaked Roberts code a little to loop over years and defined the middle of a year as the peak fire day. The code runs through, yet sets the start of the fire season for some subsets to Jan 1st, while my educated guess is that it should be somewhere around mid to end of December. Something went wrong, but I can't spot the glitch in the code below. Can someone please help? Thanks a lot in advance and best regards, Daniel *** start forv y = `yearfirst'/`yearlast' { * keep previous year if `y' != `yearfirst' { keep if Year == `y' | Year == (`y'-1) } bys Day: g no_fire_day = _N qui su no_fire_day * define year to start 183 days before peak fire day loc yearstart = Day[r(max)] - 183 loc yearend = `yearstart' + 365 keep if Day> `yearstart'& Day< `yearend' // or with egen->rotate? bys Day: keep if _n == _N g nobs = _n * the target is a continuous run that includes 95% of all fires sum no_fire_day, meanonly scalar target = .95 * r(sum) scalar shortlen = . gen arun = . gen bestrun = . * at each pass, create a run that starts at nobs == `i' * and identify the nobs where the number of fires>= 95% local more 1 local i = 0 while `more' { local i = `i' + 1 qui replace arun = sum(no_fire_day * (nobs>= `i')) sum nobs if arun>= target, meanonly if r(N) == 0 local more 0 else if (Day[r(min)] - Day[`i'])< shortlen { scalar shortlen = Day[r(min)] - Day[`i'] qui replace bestrun = arun qui replace bestrun = . if nobs> r(min) | nobs< `i' } } qui drop if bestrun == . drop bestrun arun save fires_`y', replace } *** end Robert Picard wrote on 5/11/2010 3:28 AM:Here is how I would approach this problem. I would do each year separately; it could be done all at once but it would complicate the code unnecessarily. If the fire data is one observation per fire, I would -collapse- it to one observation per day. Each observation would contain the number of fires that day. The following code will identify the first instance of the shortest run of days that includes 95% of fires for the year. Note that the following code will work, even if there are days without fires (and thus no observation for that day). *--------------------------- begin example ----------------------- version 11 * daily fire counts; with some days without fires clear all set seed 123 set obs 365 gen day = _n drop if uniform()< .1 gen nobs = _n gen nfires = round(uniform() * 10) * the target is a continuous run that includes 95% of all fires sum nfires, meanonly scalar target = .95 * r(sum) dis target scalar shortlen = . gen arun = . gen bestrun = . * at each pass, create a run that starts at nobs == `i' * and identify the nobs where the number of fires>= 95% local more 1 local i 0 while `more' { local i = `i' + 1 qui replace arun = sum(nfires * (nobs>=`i')) sum nobs if arun>= target, meanonly if r(N) == 0 local more 0 else if (day[r(min)] - day[`i'])< shortlen { scalar shortlen = day[r(min)] - day[`i'] qui replace bestrun = arun qui replace bestrun = . if nobs> r(min) | nobs< `i' } } *--------------------- end example -------------------------- Hope this help, Robert On Mon, May 10, 2010 at 6:19 AM, Nick Cox<n.j.cox@durham.ac.uk>wrote:I don't think any trick is possible unless you know in advance the precise distribution, e.g. that it is Gaussian, or exponential, or whatever, which here is not the case. So, you need to look at all the possibilities from the intervalstartingat the minimum to the interval starting at the 5% point of the fire number distribution in each year. However, this may all be achievable using -shorth- (SSC). Look at the -proportion()- option, but you would need to -expand- first to get a separate observation for each fire. If that's not practicable, look inside the code of -shorth- to get ideas on how to proceed. Note thatnolooping is necessary: the whole problem will reduce to use of -by:-andsubscripts. Nick n.j.cox@durham.ac.uk Daniel Mueller I have a strongly unbalanced panel with 100,000 observations (=fire occurrences per day) that contain between none (no fire) and 3,000firesper day for 8 years. The fire events peak in March and April withabout85-90% of the yearly total. My question is how I can define the shortest possible continuousperiodof days for each year that contains 95% of all yearly fires. Thelengthand width of the periods may slightly differ across the years due to climate and other parameters. I am sure there is a neat trick in Stata for this, yet I have not spotted it. Any suggestions would be appreciated.* * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/* * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: How to define shortest possible period with 95% of observations***From:*Daniel Mueller <mueller@iamo.de>

**st: RE: How to define shortest possible period with 95% of observations***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

**Re: st: RE: How to define shortest possible period with 95% of observations***From:*Robert Picard <picard@netbox.com>

**Re: st: RE: How to define shortest possible period with 95% of observations***From:*Daniel Mueller <mueller@iamo.de>

**RE: st: RE: How to define shortest possible period with 95% of observations***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

**Re: st: RE: How to define shortest possible period with 95% of observations***From:*Robert Picard <picard@netbox.com>

- Prev by Date:
**st: interpreting multivariable fractional polynomials** - Next by Date:
**st:reshape command ---listing all the variables changing over time?** - Previous by thread:
**Re: st: RE: How to define shortest possible period with 95% of observations** - Next by thread:
**st: AW: RE: AW: AW: RE: -ciplot- -eclplot- combine with box-options?** - Index(es):