Statalist The Stata Listserver

st: RE: Generating blank observations

From   "Maarten Buis" <[email protected]>
To   <[email protected]>
Subject   st: RE: Generating blank observations
Date   Wed, 8 Nov 2006 11:38:28 +0100

Welcome to the list. I would do this as follows: if you know 
the lowest and highest number your id variable can take, then 
it is pretty simple to create a new file that contains all 
integers between these numbers. You can then merge that file 
with your dataset, which will create the new cases, and the 
_merge variable that is created by -merge- will tell you 
which cases were added. See the example below.

*------------- begin example -----------
clear /*start from an empty dataset, so -set obs- cannot fail*/
set obs 30
gen mpg = _n + 11 /*I want to fill in all missing integers of mpg*/
list in 1/10
sort mpg
tempfile numbers /*this way the file `numbers' will only be available*/
save `numbers' /*during this do session, see: -help tempfile-*/

sysuse auto, clear
sort mpg
list mpg foreign in 1/10
merge mpg using `numbers'
tab _merge /*a case is added if _merge == 2, see: -help merge-*/
sort mpg
gen var1skippedvalue = _merge==2 /*this uses a logical expression:
var1skippedvalue equals 1 if the case was added and 0 if it was not*/
list mpg foreign var1skippedvalue  in 1/10
*----------- end example ---------------
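
The same approach applied to the var1 example from the question could be sketched as follows. This is only a sketch under some assumptions: the range is known to be 1 to 10, var1 is unique in the original data, and your Stata has the -merge 1:1- syntax (Stata 11 or later). The toy data and the names x and added are made up for illustration.

```stata
* Toy version of the original data: var1 has gaps, x is a hypothetical variable
clear
input var1 x
1 10
3 30
5 50
6 60
7 70
9 90
end
tempfile master
save `master'

* Build a dataset holding every integer in the (assumed) range 1-10
clear
set obs 10
gen var1 = _n

* Merge the original data onto the full list of ids.
* _merge==1 means the id exists only in the full list,
* i.e. the observation was artificially generated.
merge 1:1 var1 using `master'
gen byte added = _merge == 1
drop _merge
sort var1
list, clean
```

Here the full list of integers is the dataset in memory and the original data is the using file, so the added cases are those with _merge==1 rather than _merge==2 as in the example above. All other variables are missing for the added observations.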

Maarten L. Buis
Department of Social Research Methodology 
Vrije Universiteit Amsterdam 
Boelelaan 1081 
1081 HV Amsterdam 
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434 

+31 20 5986715

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Patrick Woodburn
Sent: Wednesday, 8 November 2006 10:44
To: [email protected]
Subject: st: Generating blank observations

Dear Statalist,

This is my first post to the list, and I hope it is clear enough.  I
just have one question for now:

If I have an id variable called "var1" with a selection of unique values
in a given range of integers (e.g. the values 1, 3, 5, 6, 7, and 9 in the
range 1 to 10), and I want to create new observations which contain each
missing value in that range and are blank for all other variables (e.g.
new observations containing 2, 4, 8 and 10), plus a new variable to flag
that they have been artificially generated, what do I do?  Currently, all
I can think of is the rather roundabout way of doing it below, but I can't
help thinking that there must be a more efficient method.

Best regards,


*Code begins (dataset already open)

keep var1
drop if var1==.
bysort var1: assert _n==1
gen flag=0
gen id=1
reshape wide flag, i(id) j(var1)
forvalues i=1/10 {
    cap gen flag`i'=1
}
reshape long flag, i(id) j(var1)
drop id
keep if flag==1
save var1skippedvalues
* (the original dataset must be back in memory before the next line)
append using var1skippedvalues


*   For searches and help try: