Statalist The Stata Listserver



RE: st: RE: RE: Generating blank observations


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: RE: Generating blank observations
Date   Wed, 8 Nov 2006 17:30:44 -0000

I just said be aware of -fillin-. Like you, 
I don't think it helps in this case. But 
if the problem turned out to be a panel
problem, -fillin- would look more like an
answer. 
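
To illustrate the panel case, here is a minimal sketch. The variables id, year and y and their values are made up for illustration and are not taken from the original question. -fillin- adds every id-year combination that is absent and flags the added observations in the new variable _fillin.

```stata
* Toy panel (hypothetical data): id 1 lacks 2003, id 2 lacks 2002
clear
input id year y
1 2001 10
1 2002 11
2 2001 20
2 2003 22
end

* Create all id x year combinations; added observations get _fillin == 1
fillin id year

* y is missing in the added observations
list, sepby(id)
```

This works only because there is a second variable to cross with id. With a single identifier there are no combinations to complete, which is why -fillin- does not help with the question as posed.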

Nick 
[email protected] 

Le Wang
 
> When I saw the question, my first reaction was to use -fillin-, but I
> couldn't figure out how. Would you please give an example? Thanks.
 
Nick Cox

> > There are small terminology problems here
> > we all share, new or not so new.
> >
> > In particular, I'd like to urge that we
> > refer to observations that should be
> > in the dataset, but are not, as _omitted_,
> > not _missing_. Values in the dataset can
> > naturally be _missing_ with respect to any number of
> > variables. In Stata, "missing" has a very specific
> > meaning. It is not equivalent to, in British
> > idiom, "gone missing", meaning "nowhere to be seen".
> >
> > That's picky, as the question was very clear.
> >
> > I would do it in place, as I am not a -merge- maven
> > skilled in choreographing a pas de deux between
> > two files.
> >
> > clear
> > set obs 30
> > gen id = _n + 11
> > gen frog = uniform()
> > local N = _N
> > qui forval i = 1/41 {
> >        count if id == `i'
> >        if r(N) == 0 {
> >                set obs `=_N + 1'
> >                replace id = `i' in l
> >        }
> > }
> > gen extra = _n > `N'
> > l id frog extra
> >
> > This hinges on the fact that if the observation
> > we would like included has indeed been omitted, then
> > -count- will return 0. In that case, we bump
> > up the number of observations. The extra observation
> > is always added at the end.
> >
> > The -frog- example here underlines that
> > extra values are born missing.
> >
> > Also, be aware of -fillin- and -tsfill-.
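
As a hedged sketch of how -tsfill- might apply here: if the identifier is declared as the time variable with -tsset-, -tsfill- adds the omitted integers between the smallest and largest observed values. The variable x, the helper variable original and the data are hypothetical. Note the limitation: -tsfill- only fills interior gaps, so with the values 1, 3, 5, 6, 7 and 9 it adds 2, 4 and 8 but not 10.

```stata
* Toy data (hypothetical) with the id values from the original question
clear
input var1 x
1 11
3 13
5 15
6 16
7 17
9 19
end

* Mark the observations that were there to begin with
gen byte original = 1

* Treat var1 as a time variable and fill the interior gaps
tsset var1
tsfill

* Added observations have original missing, so flag them
gen byte extra = missing(original)
drop original
list
```

If the desired range extends beyond the observed minimum or maximum, the loop with -count- or the -merge- approach is needed instead.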

Maarten Buis

> > > I would do this as follows: If you know
> > > the lowest and highest number your id variable can take, then
> > > it is pretty simple to create a new file that will contain
> > > all integers between these numbers. Then you can merge that
> > > file with your dataset, which will create the new cases, and
> > > the _merge variable that is created by -merge- will tell you
> > > which cases are added. See the example below.
> > >
> > > *------------- begin example -----------
> > > clear
> > > set obs 30
> > > gen mpg = _n + 11 /*I want to fill in all missing integers of mpg*/
> > > list in 1/10
> > > sort mpg
> > > tempfile numbers /*this way the file `numbers' will only be available*/
> > > save `numbers'   /*during this do session, see: -help tempfile-*/
> > >
> > > sysuse auto, clear
> > > sort mpg
> > > list mpg foreign in 1/10
> > > merge mpg using `numbers'
> > > tab _merge /*a case is added if _merge == 2, see: -help merge-*/
> > > sort mpg
> > > gen var1skippedvalue = _merge==2 /*this uses a logical expression:
> > > var1skippedvalue equals 1 if it is added and zero if it is not*/
> > > list mpg foreign var1skippedvalue  in 1/10
> > > *----------- end example ---------------
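
The same idea applied directly to the original question might look like the sketch below. It assumes Stata 11 or later, where -merge- takes the 1:1 syntax (the example above uses the older syntax current when this thread was written). The variable x, the flag added, the tempfile mydata and the data are hypothetical.

```stata
* Toy data (hypothetical) with the id values from the original question
clear
input var1 x
1 11
3 13
5 15
6 16
7 17
9 19
end
tempfile mydata
save `mydata'

* Build the full list of ids 1 to 10 and merge the data onto it
clear
set obs 10
gen var1 = _n
merge 1:1 var1 using `mydata'

* _merge == 1 means the id exists only in the full list, so it was added
gen byte added = _merge == 1
drop _merge
sort var1
list
```

Here the full list is the master dataset, so the added observations are those with _merge == 1 rather than _merge == 2 as in the example above.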
> >
> > Patrick Woodburn
> >
> > > If I have an id variable called "var1" with a selection of unique
> > > values in a given range of integers (eg the values 1, 3, 5, 6, 7,
> > > and 9), and I want to create new observations which contain each
> > > missing value in that range and are blank for all other variables
> > > (eg new observations containing 2, 4, 8 and 10) and a new variable
> > > to flag that they have been artificially generated, what do I do?
> > > Currently, all I can think of is the rather roundabout way of doing
> > > it below, but I can't help but think that surely there must be a
> > > more efficient method.
> >
> > >
> > > *Code begins (dataset already open)
> > >
> > > preserve
> > > keep var1
> > > drop if var1==.
> > > bysort var1: assert _n==1
> > > gen flag=0
> > > gen id=1
> > > reshape wide flag, i(id) j(var1)
> > > forvalues i=1/10 {
> > >     cap gen flag`i'=1
> > > }
> > > reshape long flag, i(id) j(var1)
> > > drop id
> > > keep if flag==1
> > > save var1skippedvalues
> > > restore
> > > append using var1skippedvalues
> >

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


