

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: How to define shortest possible period with 95% of observations
Date: Wed, 12 May 2010 11:28:13 -0400

I can't be sure of the meaning of your variables, but I see two potential problems.

1. There might be multiple days with the maximum.
2. Day[`rmax'] does not identify the day with the maximum.

Consider the following data:

Day no_fires_day
1 1 1
2 1 1
3 1 2
3 1 2
4 1 1

The max is 2 and day[2] identifies the second observation, not the third.

Steve

On Wed, May 12, 2010 at 10:59 AM, Daniel Mueller <mueller@iamo.de> wrote:
> Robert, this works like charm!!! Thanks a bunch for this neat code. Also
> thanks to Nick for pointing me to -shorth- which I will certainly explore in
> more detail after having sipped through the extensive reference list.
>
> Using Roberts code I can seamlessly loop over the nine years of data and
> generate the shortest fire season per year with 95% of obs. The results
> suggested an additional complication.. For some subsets the shortest
> possible period likely starts a couple of days before Jan 1st, at the end of
> the preceding year.
>
> I tweaked Roberts code a little to loop over years and defined the middle of
> a year as the peak fire day. The code runs through, yet sets the start of
> the fire season for some subsets to Jan 1st, while my educated guess is that
> it should be somewhere around mid to end of December. Something went wrong,
> but I can't spot the glitch in the code below. Can someone please help?
>
> Thanks a lot in advance and best regards,
> Daniel
>
>
> *** start
> forv y = `yearfirst'/`yearlast' {
>
>     * keep previous year
>     if `y' != `yearfirst' {
>         keep if Year == `y' | Year == (`y'-1)
>     }
>     bys Day: g no_fire_day = _N
>     qui su no_fire_day
>
>     * define year to start 183 days before peak fire day
>     loc yearstart = Day[r(max)] - 183
>     loc yearend = `yearstart' + 365
>     keep if Day > `yearstart' & Day < `yearend' // or with egen->rotate?
>     bys Day: keep if _n == _N
>     g nobs = _n
>
>     * the target is a continuous run that includes 95% of all fires
>     sum no_fire_day, meanonly
>     scalar target = .95 * r(sum)
>
>     scalar shortlen = .
>     gen arun = .
>     gen bestrun = .
>
>     * at each pass, create a run that starts at nobs == `i'
>     * and identify the nobs where the number of fires >= 95%
>     local more 1
>     local i = 0
>     while `more' {
>         local i = `i' + 1
>         qui replace arun = sum(no_fire_day * (nobs >= `i'))
>         sum nobs if arun >= target, meanonly
>         if r(N) == 0 local more 0
>         else if (Day[r(min)] - Day[`i']) < shortlen {
>             scalar shortlen = Day[r(min)] - Day[`i']
>             qui replace bestrun = arun
>             qui replace bestrun = . if nobs > r(min) | nobs < `i'
>         }
>     }
>     qui drop if bestrun == .
>     drop bestrun arun
>     save fires_`y', replace
> }
> *** end
>
>
> Robert Picard wrote on 5/11/2010 3:28 AM:
>>
>> Here is how I would approach this problem. I would do each year
>> separately; it could be done all at once but it would complicate the
>> code unnecessarily. If the fire data is one observation per fire, I
>> would -collapse- it to one observation per day. Each observation would
>> contain the number of fires that day. The following code will identify
>> the first instance of the shortest run of days that includes 95% of
>> fires for the year.
>>
>> Note that the following code will work, even if there are days without
>> fires (and thus no observation for that day).
>>
>> *--------------------------- begin example -----------------------
>> version 11
>>
>> * daily fire counts; with some days without fires
>> clear all
>> set seed 123
>> set obs 365
>> gen day = _n
>> drop if uniform() < .1
>> gen nobs = _n
>> gen nfires = round(uniform() * 10)
>>
>> * the target is a continuous run that includes 95% of all fires
>> sum nfires, meanonly
>> scalar target = .95 * r(sum)
>> dis target
>>
>> scalar shortlen = .
>> gen arun = .
>> gen bestrun = .
>>
>> * at each pass, create a run that starts at nobs == `i'
>> * and identify the nobs where the number of fires >= 95%
>> local more 1
>> local i 0
>> while `more' {
>>     local i = `i' + 1
>>     qui replace arun = sum(nfires * (nobs >= `i'))
>>     sum nobs if arun >= target, meanonly
>>     if r(N) == 0 local more 0
>>     else if (day[r(min)] - day[`i']) < shortlen {
>>         scalar shortlen = day[r(min)] - day[`i']
>>         qui replace bestrun = arun
>>         qui replace bestrun = . if nobs > r(min) | nobs < `i'
>>     }
>> }
>>
>> *--------------------- end example --------------------------
>>
>>
>> Hope this help,
>>
>> Robert
>>
>> On Mon, May 10, 2010 at 6:19 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>>
>>> I don't think any trick is possible unless you know in advance the
>>> precise distribution, e.g. that it is Gaussian, or exponential, or
>>> whatever, which here is not the case.
>>>
>>> So, you need to look at all the possibilities from the interval starting
>>> at the minimum to the interval starting at the 5% point of the fire
>>> number distribution in each year.
>>>
>>> However, this may all be achievable using -shorth- (SSC). Look at the
>>> -proportion()- option, but you would need to -expand- first to get a
>>> separate observation for each fire. If that's not practicable, look
>>> inside the code of -shorth- to get ideas on how to proceed. Note that no
>>> looping is necessary: the whole problem will reduce to use of -by:- and
>>> subscripts.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> Daniel Mueller
>>>
>>> I have a strongly unbalanced panel with 100,000 observations (=fire
>>> occurrences per day) that contain between none (no fire) and 3,000 fires
>>> per day for 8 years. The fire events peak in March and April with about
>>> 85-90% of the yearly total.
>>>
>>> My question is how I can define the shortest possible continuous period
>>> of days for each year that contains 95% of all yearly fires. The length
>>> and width of the periods may slightly differ across the years due to
>>> climate and other parameters.
>>>
>>> I am sure there is a neat trick in Stata for this, yet I have not
>>> spotted it. Any suggestions would be appreciated.
>>>
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
>>>

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477 USA
Voice: 845-246-0774
Fax: 206-202-4783
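
The two problems raised in the reply can be reproduced on a toy dataset. The following is only a minimal sketch, not code from the thread: it assumes one observation per fire and reads the example as fires on days 1, 2, 3, 3 and 4. The names `rmax`, `peakday` and `tagday` are hypothetical, and taking the earliest day among ties is just one possible convention.

```stata
* Assumed toy data: one observation per fire, two fires on day 3
clear
input Day
1
2
3
3
4
end

bysort Day: gen no_fire_day = _N      // fires per day: 1 1 2 2 1
quietly summarize no_fire_day
local rmax = r(max)                   // largest daily count (2), not a day

* Subscripting with the count picks observation number 2, whose Day is 2
display "Day[`rmax'] = " Day[`rmax']

* One alternative: the earliest day whose count equals the maximum
quietly summarize Day if no_fire_day == `rmax', meanonly
local peakday = r(min)
display "peak day = `peakday'"

* Ties: number of distinct days that share the maximum count
egen byte tagday = tag(Day)
quietly count if tagday & no_fire_day == `rmax'
display "days tied at the maximum: " r(N)
```

On these data the subscript returns day 2 while the day with the most fires is day 3. If more than one day were tied at the maximum, `r(min)` would pick the earliest, which may or may not be the right centre for the 365-day window.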

**References**:

- **st: How to define shortest possible period with 95% of observations** *From:* Daniel Mueller <mueller@iamo.de>
- **st: RE: How to define shortest possible period with 95% of observations** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
- **Re: st: RE: How to define shortest possible period with 95% of observations** *From:* Robert Picard <picard@netbox.com>
- **Re: st: RE: How to define shortest possible period with 95% of observations** *From:* Daniel Mueller <mueller@iamo.de>
