Stata The Stata listserver

Re: st:How to do analysis if the same variable exists in one dataset and is missing or contains no observation in another database?

Subject   Re: st:How to do analysis if the same variable exists in one dataset and is missing or contains no observation in another database?
Date   Fri, 8 Aug 2003 13:40:09 -0400

Many thanks, David and Nick.
I think I now have enough to write my program.

David, I can see that I confused you with my varlist. To be more explicit, here
is an example.

Suppose I want to regress children's height. The variables -a through d- refer,
for example, to community and household characteristics. To this set of
independent variables I want to add some composite variables (vaccination, for
example), assumed independent of depvar. So -e through h- are the variables
that give me information on vaccination status. But not all of them (nor all
of a-d) are collected in every dataset. So I use all the available variables
among e-h to create my composite variable, which I add to a-d. Of course, I am
only interested in the case where (e-h) ==1 (the child is vaccinated against
DTP, measles, etc.), so all remaining values of compvar will be ==0 (not
vaccinated, don't know, missing, etc.). (Maybe I can force my compvar to have the categories of its
This reasoning applies only to my compvar, not to the other independent
variables, which are equal to a-d and thus keep all their values.

I am not sure about my idea of creating composite variables. I first thought
that it would allow me to gather all the information. If my reasoning is
wrong, I will limit myself to taking all the (existing) variables and will not
create the composite ones.

I am just using this example, but many others could be given.
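The example asks for compvar to be built only from the vaccination indicators e-h, with whichever of a-d exist entering the regression unchanged, whereas David's suggestion below builds compvar from every usable variable. A minimal sketch that keeps the two groups apart, assuming Stata 10.1 or later (egen's anymatch() is the later name of eqany()). The toy data and the locals covars, vaccvars, keepcov, keepvac and rhs are hypothetical; absent or entirely missing variables are skipped rather than dropped, and variable abbreviation is switched off so that, say, "d" cannot silently match depvar.

```stata
* Hypothetical toy data: c, d, f and h do not exist; g is entirely missing.
clear
set obs 6
set seed 12345
generate depvar = rnormal(100, 10)
generate a = runiform()
generate b = runiform()
generate e = mod(_n, 2)          // 1 = vaccinated, 0 = not
generate g = .

* Without this, "d" would be accepted as an abbreviation of depvar.
* Note: the setting stays off for the rest of the session.
set varabbrev off

local covars   "a b c d"         // regressors used as they are
local vaccvars "e f g h"         // indicators feeding the composite
local keepcov ""
local keepvac ""

* Keep only variables that exist and have at least one nonmissing value.
foreach x of local covars {
    capture confirm variable `x'
    if !_rc {
        quietly count if !missing(`x')
        if r(N) > 0 local keepcov "`keepcov' `x'"
    }
}
foreach x of local vaccvars {
    capture confirm variable `x'
    if !_rc {
        quietly count if !missing(`x')
        if r(N) > 0 local keepvac "`keepvac' `x'"
    }
}

* compvar = 1 if any usable indicator equals 1, and 0 otherwise
* (including observations where the indicators are missing).
local rhs "`keepcov'"
if "`keepvac'" != "" {
    egen compvar = anymatch(`keepvac'), values(1)
    local rhs "`rhs' compvar"
}
if "`rhs'" != "" regress depvar `rhs'
```

Here only e survives among the indicators, so compvar simply reproduces e, and the regression uses a, b and compvar.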

Thank you very much for your help and have a good week-end.

Amadou DIALLO.
AFTHD, The World Bank.

From:     David Kantor
To:       Statalist@Hsphsun2.Harvard.Edu
Date:     08/08/2003 10:15 AM
Subject:  Re: st:How to do analysis if the same variable exists in one dataset and
          is missing or contains no observation in another database?

You seem to have changed the structure.  You now have 8 vars: a, b, ..., h.

And now var5 depends on e through h, rather than a through d.

Let me assume that you want to use all the existing vars in varlist1 -- for
both collecting into a set of independent vars, and for creating an
additional composite variable.  (I will call the composite variable compvar
rather than var5.)

I see no need to create a set of new variables; that will only waste space,
which seems to be scarce in this particular problem.  Instead, just form a
new varlist that consists of only those that are actual variables (and not
completely missing).  Thus it is a subset of varlist1.

Here's my suggestion, borrowing some ideas from what Nick wrote (which I
would not have thought of myself):

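local varlist1 "a b c d e f g h"  // or whatever you might have
local varlist2 ""                 // start from an empty list
foreach x of local varlist1 {
  capture confirm var `x'
  if _rc==0 {
    capture assert mi(`x')
    if _rc==0 {
      drop `x'                    // exists but is entirely missing
    }
    else {
      local varlist2 "`varlist2' `x'"
    }
  }
}

if trim("`varlist2'") ~= "" {
  // eqany() is the Stata 8 name; Stata 9 and later call it anymatch()
  egen compvar = eqany(`varlist2'), v(1)
  regress depvar `varlist2' compvar
}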

One little point to remember: this picks up any variable that is not
completely missing.  Thus, for example, if you have a million observations,
and a variable is nonmissing on only one of them, it will be included.  But
the regression will be limited to only those observations that are
nonmissing on all variables.
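To see in advance how much this listwise deletion will shrink the sample, one can count the missing values per observation across the variables that would enter the regression. A minimal sketch on hypothetical toy data, assuming Stata 9 or later (egen's rowmiss() function); nmiss is an illustrative name, and in practice the varlist would be depvar plus the regressors actually used.

```stata
* Hypothetical toy data: b is nonmissing in only one of five observations.
clear
set obs 5
generate depvar = _n
generate a = _n^2
generate b = .
replace b = 1 in 1

* Number of missing values across the would-be regression variables.
egen nmiss = rowmiss(depvar a b)
tabulate nmiss
count if nmiss == 0    // observations a regression on a and b could use
```

Because b is not completely missing, the loop above would keep it, yet a regression on a and b would have only one observation to work with.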

It's not clear whether you want compvar to be 0 or missing when all of the
other vars are not 1.  If it is to be made missing, then you want to also do...
  replace compvar = . if compvar == 0

And I note that your regression is now limited to cases where at least one
of the other independent variables is == 1.  But it is puzzling why you
want compvar included among the independent vars. Perhaps you meant...
  regress depvar `varlist2' if compvar==1

Or perhaps I misinterpreted your structure.
Good luck with this.
-- David

At 07:29 PM 8/7/2003 -0400, you wrote:
>Thanks Nick and David for the help.
>David, I mean the exact correspondence: var1 for a, var2 for b, etc.
>If a variable is absent, it would be preferable not to create it, since the
>program is huge.
>For example, if a does not exist or contains no observations, it would be
>preferable not to create var1.
>Even if we created new variables with missing values, they would be
>irrelevant for my regressions.
>For var5, if at least one of the variables exists, then I want to use it to
>create var5. If not, it will not be created.
>Nick, all the code is there. My intent is simple.
>Let me rewrite all my program below: my dependent variable is depvar (which is
>common to all files).
>local varlist1 "a b c d e f g h"
>foreach x of local varlist1 {
>    capture confirm var `x'
>    if _rc==0 {
>        capture assert mi(`x')
>        if _rc==0 {
>            drop `x'
>        }
>        else {
>            g var1=a
>            g var2=b
>            g var3=c
>            g var4=d
>            g var5=.
>            replace var5=1 if e==1| f==1| g==1| h==1
>        }
>    }
>}
>regress depvar var1 var2 var3 var4 var5     /*if one of them exists*/
>Nick's first suggestion seems good, but since I have a lot of variables to
>create, it will be very difficult to rewrite the code for each of them.
>I will try his second suggestion.
>Best regards.
>Amadou DIALLO,
>AFTHD, The World Bank.

David Kantor
Institute for Policy Studies
Johns Hopkins University

*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC