Statalist The Stata Listserver



st: RE: RE: Generating blank observations


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: Generating blank observations
Date   Wed, 8 Nov 2006 11:45:18 -0000

There are small terminology problems here that
we all share, whether new to Stata or not so new. 

In particular, I'd like to urge that we 
refer to observations that should be 
in the dataset, but are not, as _omitted_, 
not _missing_. Values in the dataset can 
naturally be _missing_ with respect to any number of 
variables. In Stata, "missing" has a very specific
meaning. It is not equivalent to, in British 
idiom, "gone missing", meaning "nowhere to be seen". 

That's picky, as the question was very clear. 

I would do it in place, as I am not a -merge- maven
skilled in choreographing a pas de deux between 
two files. 

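* toy data: ids 12 to 41, so ids 1 to 11 are omitted
clear 
set obs 30
gen id = _n + 11 
gen frog = uniform() 
* remember how many observations we started with
local N = _N 
qui forval i = 1/41 { 
	count if id == `i' 
	* no observation with this id: add one at the end
	if r(N) == 0 { 
		set obs `=_N + 1' 
		replace id = `i' in l 
	}
} 
* flag the observations that were added
gen extra = _n > `N' 
l id frog extra 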

This hinges on the fact that if the observation
we would like included has indeed been omitted, then 
-count- will return 0. In that case, we bump 
up the number of observations. The extra observation
is always added at the end. 

The -frog- example here underlines that 
extra values are born missing. 

Also, be aware of -fillin- and -tsfill-. 
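
As a rough illustration of -tsfill-, here is a minimal sketch using 
the id values from the original question (1, 3, 5, 6, 7 and 9). The 
variable names x, original and added are made up for this example. 
Note that -tsfill- only fills gaps between the smallest and largest 
values present, so it should add 2, 4 and 8 here but not 10. 

```stata
clear
input var1 x
1 10
3 30
5 50
6 60
7 70
9 90
end
* marker that is nonmissing only in the observations we started with
gen byte original = 1
* declare var1 as the "time" variable so that gaps are defined
tsset var1
* add one observation for each omitted integer between min and max
tsfill
* added observations are born with original missing
gen byte added = missing(original)
list var1 x added
```

-fillin- is the analogous tool when the omitted observations are 
combinations of two or more variables; with a single identifier 
it has nothing to add. 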

Nick 
[email protected] 

Maarten Buis

> I would do this as follows: If you know 
> the lowest and highest number your id variable can take then 
> it is pretty simple to create a new file that will contain 
> all integers between these numbers. Then you can merge that 
> file with your dataset, which will create the new cases, and 
> the _merge variable that is created by -merge- will tell you 
> which cases are added. See the example below.
> 
> *------------- begin example -----------
> clear
> set obs 30
> gen mpg = _n + 11 /*I want to fill in all missing integers of mpg*/
> list in 1/10
> sort mpg
> tempfile numbers /*the file `numbers' is only available during*/
> save `numbers' /*this do-file session, see: -help tempfile-*/
> 
> sysuse auto, clear
> sort mpg
> list mpg foreign in 1/10
> merge mpg using `numbers'
> tab _merge /*a case is added if _merge == 2, see: -help merge-*/
> sort mpg
> gen var1skippedvalue = _merge==2 /*this uses a logical expression:
> var1skippedvalue equals 1 if the case was added and 0 if it was not*/
> list mpg foreign var1skippedvalue  in 1/10
> *----------- end example ---------------
 
Patrick Woodburn
 
> If I have an id variable called "var1" with a selection of unique
> values in a given range of integers (eg the values 1, 3, 5, 6, 7,
> and 9), and I want to create new observations which contain each
> missing value in that range and are blank for all other variables
> (eg new observations containing 2, 4, 8 and 10) and a new variable
> to flag that they have been artificially generated, what do I do?
> Currently, all I can think of is the rather roundabout way of doing
> it below, but I can't help but think that surely there must be a
> more efficient method.

> 
> *Code begins (dataset already open)
> 
> preserve
> keep var1
> drop if var1==.
> bysort var1: assert _n==1
> gen flag=0
> gen id=1
> reshape wide flag, i(id) j(var1)
> forvalues i=1/10 {
>     cap gen flag`i'=1
> }
> reshape long flag, i(id) j(var1)
> drop id
> keep if flag==1
> save var1skippedvalues
> restore
> append using var1skippedvalues



