Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Steve Samuels <sjsamuels@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: RE: How to define shortest possible period with 95% of observations |

Date |
Wed, 12 May 2010 15:33:41 -0400 |

What you are asking is not only contrary to Nick's recommendation (and, now, mine), it is a mistake. You are assuming that the peak day for fires occurs exactly in the middle of a "fire year". Of course the peak days will differ from year to year, and you will wind up with overlapping periods. Steve On Wed, May 12, 2010 at 3:13 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote: > What you outline is not what I was recommending. It's an awkward > half-way house. > > But in terms of your new question, see > > SJ-6-4 dm0025 . . . . . . . . . . Stata tip 36: Which observations? > Erratum > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. > J. Cox > Q4/06 SJ 6(4):596 (no commands) > correction of example code for Stata tip 36 > > SJ-6-3 dm0025 . . . . . . . . . . . . . . Stata tip 36: Which > observations? > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. > J. Cox > Q3/06 SJ 6(3):430--432 (no > commands) > tip for identifying which observations satisfy some > specified condition > > Nick > n.j.cox@durham.ac.uk > > Daniel Mueller > > Thanks. > > In the code below I did my best to think in fire years by defining the > peak fire day (these are always unique) as the middle of the year. Then > I'd like to place the maximum into a local macro (where I fail miserably > > in line 2 as Steve rightly pointed out): > > qui su no_fire_day > loc yearstart = Day[r(max)] - 183 > loc yearend = `yearstart' + 365 > > My simple question is how I can place the Day where no_fires_day = > r(max) into a local macro? > > (I ignore leap years for the sake of simplicity..) > > > Nick Cox wrote on 5/12/2010 10:44 PM: >> Without looking at this in detail, it seems to me that you might > benefit >> from thinking in terms of fire years, rather than calendar years, >> starting on some day other than January 1. After all, all sorts of >> different sciences, not to mention religions, have years that don't >> coincide with conventional Western calendar years: fiscal years, water >> years, academic years, etc., etc. >> >> Several pertinent -egen- functions are included in -egenmore- on SSC. >> >> In other words, define the time scale in terms of those fire years; > then >> Robert's code will probably not need any complicated adjustments. >> >> Nick >> n.j.cox@durham.ac.uk >> >> Daniel Mueller >> >> Robert, this works like charm!!! Thanks a bunch for this neat code. > Also >> >> thanks to Nick for pointing me to -shorth- which I will certainly >> explore in more detail after having sipped through the extensive >> reference list. >> >> Using Roberts code I can seamlessly loop over the nine years of data > and >> >> generate the shortest fire season per year with 95% of obs. The > results >> suggested an additional complication.. For some subsets the shortest >> possible period likely starts a couple of days before Jan 1st, at the >> end of the preceding year. >> >> I tweaked Roberts code a little to loop over years and defined the >> middle of a year as the peak fire day. The code runs through, yet sets >> the start of the fire season for some subsets to Jan 1st, while my >> educated guess is that it should be somewhere around mid to end of >> December. Something went wrong, but I can't spot the glitch in the > code >> below. Can someone please help? >> >> Thanks a lot in advance and best regards, >> Daniel >> >> >> *** start >> forv y = `yearfirst'/`yearlast' { >> >> * keep previous year >> if `y' != `yearfirst' { >> keep if Year == `y' | Year == (`y'-1) >> } >> bys Day: g no_fire_day = _N >> qui su no_fire_day >> >> * define year to start 183 days before peak fire day >> loc yearstart = Day[r(max)] - 183 >> loc yearend = `yearstart' + 365 >> keep if Day> `yearstart'& Day< `yearend' // or with > egen->rotate? >> bys Day: keep if _n == _N >> g nobs = _n >> >> * the target is a continuous run that includes 95% of all fires >> sum no_fire_day, meanonly >> scalar target = .95 * r(sum) >> >> scalar shortlen = . >> gen arun = . >> gen bestrun = . >> >> * at each pass, create a run that starts at nobs == `i' >> * and identify the nobs where the number of fires>= 95% >> local more 1 >> local i = 0 >> while `more' { >> local i = `i' + 1 >> qui replace arun = sum(no_fire_day * (nobs>= `i')) >> sum nobs if arun>= target, meanonly >> if r(N) == 0 local more 0 >> else if (Day[r(min)] - Day[`i'])< shortlen { >> scalar shortlen = Day[r(min)] - Day[`i'] >> qui replace bestrun = arun >> qui replace bestrun = . if nobs> r(min) | nobs< `i' >> } >> } >> qui drop if bestrun == . >> drop bestrun arun >> save fires_`y', replace >> } >> *** end >> >> >> >> >> >> Robert Picard wrote on 5/11/2010 3:28 AM: >>> Here is how I would approach this problem. I would do each year >>> separately; it could be done all at once but it would complicate the >>> code unnecessarily. If the fire data is one observation per fire, I >>> would -collapse- it to one observation per day. Each observation > would >>> contain the number of fires that day. The following code will > identify >>> the first instance of the shortest run of days that includes 95% of >>> fires for the year. >>> >>> Note that the following code will work, even if there are days > without >>> fires (and thus no observation for that day). >>> >>> *--------------------------- begin example ----------------------- >>> version 11 >>> >>> * daily fire counts; with some days without fires >>> clear all >>> set seed 123 >>> set obs 365 >>> gen day = _n >>> drop if uniform()< .1 >>> gen nobs = _n >>> gen nfires = round(uniform() * 10) >>> >>> * the target is a continuous run that includes 95% of all fires >>> sum nfires, meanonly >>> scalar target = .95 * r(sum) >>> dis target >>> >>> scalar shortlen = . >>> gen arun = . >>> gen bestrun = . >>> >>> * at each pass, create a run that starts at nobs == `i' >>> * and identify the nobs where the number of fires>= 95% >>> local more 1 >>> local i 0 >>> while `more' { >>> local i = `i' + 1 >>> qui replace arun = sum(nfires * (nobs>=`i')) >>> sum nobs if arun>= target, meanonly >>> if r(N) == 0 local more 0 >>> else if (day[r(min)] - day[`i'])< shortlen { >>> scalar shortlen = day[r(min)] - day[`i'] >>> qui replace bestrun = arun >>> qui replace bestrun = . if nobs> r(min) | nobs< `i' >>> } >>> } >>> >>> *--------------------- end example -------------------------- >>> >>> >>> Hope this help, >>> >>> Robert >>> >>> On Mon, May 10, 2010 at 6:19 AM, Nick Cox<n.j.cox@durham.ac.uk> >> wrote: >>>> I don't think any trick is possible unless you know in advance the >>>> precise distribution, e.g. that it is Gaussian, or exponential, or >>>> whatever, which here is not the case. >>>> >>>> So, you need to look at all the possibilities from the interval >> starting >>>> at the minimum to the interval starting at the 5% point of the fire >>>> number distribution in each year. >>>> >>>> However, this may all be achievable using -shorth- (SSC). Look at > the >>>> -proportion()- option, but you would need to -expand- first to get a >>>> separate observation for each fire. If that's not practicable, look >>>> inside the code of -shorth- to get ideas on how to proceed. Note > that >> no >>>> looping is necessary: the whole problem will reduce to use of -by:- >> and >>>> subscripts. >>>> >>>> Nick >>>> n.j.cox@durham.ac.uk >>>> >>>> Daniel Mueller >>>> >>>> I have a strongly unbalanced panel with 100,000 observations (=fire >>>> occurrences per day) that contain between none (no fire) and 3,000 >> fires >>>> >>>> per day for 8 years. The fire events peak in March and April with >> about >>>> 85-90% of the yearly total. >>>> >>>> My question is how I can define the shortest possible continuous >> period >>>> of days for each year that contains 95% of all yearly fires. The >> length >>>> and width of the periods may slightly differ across the years due to >>>> climate and other parameters. >>>> >>>> I am sure there is a neat trick in Stata for this, yet I have not >>>> spotted it. Any suggestions would be appreciated. >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ >> > > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > -- Steven Samuels sjsamuels@gmail.com 18 Cantine's Island Saugerties NY 12477 USA Voice: 845-246-0774 Fax: 206-202-4783 * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: RE: How to define shortest possible period with 95% of observations***From:*Daniel Mueller <mueller@iamo.de>

**Re: st: RE: How to define shortest possible period with 95% of observations***From:*Austin Nichols <austinnichols@gmail.com>

**References**:**st: How to define shortest possible period with 95% of observations***From:*Daniel Mueller <mueller@iamo.de>

**st: RE: How to define shortest possible period with 95% of observations***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

**Re: st: RE: How to define shortest possible period with 95% of observations***From:*Robert Picard <picard@netbox.com>

**Re: st: RE: How to define shortest possible period with 95% of observations***From:*Daniel Mueller <mueller@iamo.de>

**RE: st: RE: How to define shortest possible period with 95% of observations***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

**Re: st: RE: How to define shortest possible period with 95% of observations***From:*Daniel Mueller <mueller@iamo.de>

**RE: st: RE: How to define shortest possible period with 95% of observations***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

- Prev by Date:
**st: question about ds command** - Next by Date:
**Re: st: tsset multiple imputation panel data** - Previous by thread:
**RE: st: RE: How to define shortest possible period with 95% of observations** - Next by thread:
**Re: st: RE: How to define shortest possible period with 95% of observations** - Index(es):