The Stata listserver

Re: st: How to do analysis if the same variable exists in one dataset and is missing or contains no observation in another database?


From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: How to do analysis if the same variable exists in one dataset and is missing or contains no observation in another database?
Date   Fri, 08 Aug 2003 10:15:45 -0400

You seem to have changed the structure. You now have 8 vars: a, b, ..., h.

And now var5 depends on e through h, rather than a through d.

Let me assume that you want to use all the existing vars in varlist1 -- for both collecting into a set of independent vars, and for creating an additional composite variable. (I will call the composite variable compvar rather than var5.)

I see no need to create a set of new variables; that will only waste space, which seems to be scarce in this particular problem. Instead, just form a new varlist that consists of only those that are actual variables (and not completely missing). Thus it is a subset of varlist1.

Here's my suggestion, borrowing some ideas from what Nick wrote (which I would not have thought of myself):

local varlist1 "a b c d e f g h" // or whatever you might have
foreach x of local varlist1 {
    capture confirm var `x'              // does `x' exist as a variable?
    if _rc==0 {
        capture assert mi(`x')           // is it missing on every observation?
        if _rc==0 {
            drop `x'
        }
        else {
            local varlist2 "`varlist2' `x'"
        }
    }
}

if trim("`varlist2'") ~= "" {
    // eqany() is named anymatch() in Stata 9 and later
    egen compvar = eqany(`varlist2'), v(1)
    regress depvar `varlist2' compvar
}

----
One little point to remember: this picks up any variable that is not completely missing. Thus, for example, if you have a million observations, and a variable is nonmissing on only one of them, it will be included. But the regression will be limited to only those observations that are nonmissing on all variables.
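To see how much this matters in a given file, something like the following could be run after the loop. It is only a sketch, not part of the suggestion above: it assumes the local varlist2 built by the loop is still defined (same do-file or session) and that depvar exists. It counts the nonmissing observations for each retained variable, then the observations that are nonmissing on depvar and on every variable in varlist2.

```stata
* Sketch only: assumes local varlist2 from the loop above is still defined
foreach x of local varlist2 {
    quietly count if !missing(`x')
    display "`x': " r(N) " nonmissing observations"
}

* Observations nonmissing on depvar and on every variable in varlist2
local cond "!missing(depvar)"
foreach x of local varlist2 {
    local cond "`cond' & !missing(`x')"
}
quietly count if `cond'
display "complete cases: " r(N)
```

If the last count is much smaller than the per-variable counts, a sparsely observed variable is probably shrinking the regression sample.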

It's not clear whether you want compvar to be 0 or missing when none of the other vars equals 1. If it should be missing, then you also want to do...
replace compvar = . if compvar == 0

And I note that your regression is now limited to cases where at least one of the other independent variables is == 1. But it is puzzling why you want compvar included among the independent vars. Perhaps you meant...
regress depvar `varlist2' if compvar==1

Or perhaps I misinterpreted your structure.
----
Good luck with this.
-- David


At 07:29 PM 8/7/2003 -0400, you wrote:

Thanks Nick and David for the help.

David, I mean the exact correspondence: var1 for a, var2 for b, etc.

If a variable is absent, it would be preferable not to create it, since the
program is huge.
For example, if a does not exist or contains no observations, it would be
preferable not to create var1.
Even though we create new variables with missing values, they would be
irrelevant for my regressions.

For var5, if at least one of the variables exists, then I want to use it to
create var5. If not, it will not be created.

Nick, all the code is there. My intent is simple.
Let me rewrite all my program below: my dependent variable is depvar (which is
common to all files).

local varlist1 "a b c d e f g h"
foreach x of local varlist1 {
capture confirm var `x'
if _rc==0 {
capture assert mi(`x')
if _rc==0 {
drop `x'
}
else {
g var1=a
g var2=b
g var3=c
g var4=d
g var5=.
replace var5=1 if e==1| f==1| g==1| h==1
}
}
regress depvar var1 var2 var3 var4 var5 /*if one of them exists*/

The first suggestion of Nick seems good, but since I have a lot of variables to
create, it will be very difficult to rewrite the code for each of them.
I will try his second suggestion.

Best regards.

Amadou DIALLO,
AFTHD, The World Bank.

David Kantor
Institute for Policy Studies
Johns Hopkins University
[email protected]
410-516-5404



