Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Jieyu Wang <joygoforward@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: replacing missing time period data with next closest period |
Date | Fri, 25 Mar 2011 11:48:10 -0500 |
Here is an example you may want to look at: (this example replace current observation with previous period if it is in the middle) ********************** clear set obs 24 generate str10 state="Texas" replace state="New york" in 13/24 bysort state: generate month=_n generate option=. replace option=1 in 1 replace option=0 in 2 replace option=0 in 5 replace option=1 in 10 replace option=1 in 14/15 replace option=0 in 19/20 //prepare the data list sort state month bysort state: generate nomissing=_n if ! missing(option) generate flagbefore=nomissing bysort state: replace flagbefore=flagbefore[_n-1] if missing(flagbefore) bysort state: generate flaginverse=13-month sort state flaginverse generate flagafter=nomissing bysort state: replace flagafter=flagafter[_n-1] if missing(flagafter) drop flaginverse sort state month list bysort state: egen min=min(flagbefore) bysort state: egen max=max(flagafter) replace flagbefore=min if missing(flagbefore) replace flagafter=max if missing(flagafter) //generate flagbefore and flagafter to indicate the nearest nonmissing index sort state month list bysort state: replace option=cond( _n-flagbefore[_n]>flagafter-_n , option[flagafter[_n]], option[flagbefore[_n]] ) list //replace within closest period *********************** 2011/3/16 Doug Hess <douglasrhess@gmail.com>: > Hi. > > I'm hoping there are a set of commands that will help me edit cells > for this missing data problem. > > Each row in my dataset is a month for each state (and DC) over five > years (i.e., 51x5x12= 3,060 rows). The columns are binary values for > the reported presence of a policy that some states in some months have > implemented and others have not. Unfortunately, the months in which > states report the existence (or not) of the policy option varies by > year, and only the months when the option is reported have been coded. > > Thus, a state that reports not having selected a policy option in > April 2001 and then reports adopting the policy in August 2001, with > no reports in between, would have the value of 0 for April and 1 for > August, but missing "." for the other 10 months of the year (i.e., > missing until the next report). > > I would like to fill in the months in between reports with the value > of the next closest month that has a value, and default to zero if the > missing month is equidistant to a 0 or 1. I.e., in the example above, > assuming the policy had not changed before or after these reported > months, the cells from January thru June = 0, July thru December= 1. > > I'm starting to think I'll just do this by hand, but I wonder if > there's some nifty "if then...do" routine that can be created as I > would like to add more years and do this for a large number of policy > variables. > > If it matters: I'm using Stata 11 and don't have Stata programming > language experience (as opposed to issuing commands). My computer > programming skills stopped circa 1982 when I learned Basic on an Apple > IIe. So, if there's a choice between clunky commands and elegant > programming with the same results, I might go for the clunky commands. > But I am happy to learn if there's a resource I need to consult to > understand any advice or tips that are provided. > > Thank you. > > Doug > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/