From: adiallo5@worldbank.org
To: statalist@hsphsun2.harvard.edu
Subject: Re: st:How to do analysis if the same variable exists in one dataset and is missing or contains no observation in another database?
Date: Fri, 8 Aug 2003 13:40:09 -0400

Many thanks, David and Nick. I think I now have enough to write my program.

David, I can see that I confused you with my varlist. To be more explicit, here is an example. Suppose I want to regress children's height. The variables a through d refer, for example, to community and household characteristics. To this set of independent variables I want to add some composite variables (vaccination, for example), assumed independent of depvar. So e through h are the variables that give me information on vaccination status. But not all of them (nor all of a-d) are collected in every dataset. So I use whichever of e-h are available to create my composite variable, which I add to a-d.

Of course, I am only interested in the cases where e-h are ==1 (child is vaccinated against DTP, measles, etc.), so that everything else in compvar will be ==0 (not vaccinated, don't know, missing, etc.). (Maybe I can force my compvar to have the categories of its components.) This reasoning applies only to my compvar, not to the other independent variables, which are equal to a-d and thus keep all their values.

I am not sure about my idea of creating composite variables. My first thought was that it would let me gather all the information. If my reasoning is wrong, I will limit myself to taking all the (existing) variables and skip creating the composite ones. This is just one example; many others could be given.

Thank you very much for your help and have a good weekend.

Amadou DIALLO.
AFTHD, The World Bank.


David Kantor <dkantor@jhu.edu>
Sent by: owner-statalist@hsphsun2.harvard.edu
08/08/2003 10:15 AM
Please respond to statalist

To: Statalist@Hsphsun2.Harvard.Edu
cc:
Subject: Re: st:How to do analysis if the same variable exists in one dataset and is missing or contains no observation in another database?


You seem to have changed the structure. You now have 8 vars: a, b, ..., h. And now var5 depends on e through h, rather than a through d.
Let me assume that you want to use all the existing vars in varlist1 -- for both collecting into a set of independent vars, and for creating an additional composite variable. (I will call the composite variable compvar rather than var5.)

I see no need to create a set of new variables; that will only waste space, which seems to be scarce in this particular problem. Instead, just form a new varlist that consists of only those that are actual variables (and not completely missing). Thus it is a subset of varlist1.

Here's my suggestion, borrowing some ideas from what Nick wrote (which I would not have thought of myself):

    local varlist1 "a b c d e f g h" // or whatever you might have
    foreach x of local varlist1 {
        capture confirm var `x'
        if _rc==0 {
            // the variable exists; check whether it is entirely missing
            capture assert mi(`x')
            if _rc==0 {
                drop `x'
            }
            else {
                local varlist2 "`varlist2' `x'"
            }
        }
    }
    if trim("`varlist2'") ~= "" {
        egen compvar = eqany(`varlist2'), v(1)
        regress depvar `varlist2' compvar
    }

----
One little point to remember: this picks up any variable that is not completely missing. Thus, for example, if you have a million observations, and a variable is nonmissing on only one of them, it will be included. But the regression will be limited to only those observations that are nonmissing on all variables.

It's not clear whether you want compvar to be 0 or missing when all of the other vars are not 1. If it is to be made missing, then you want to also do...

    replace compvar = . if compvar == 0

And I note that your regression is now limited to cases where at least one of the other independent variables is == 1.

But it is puzzling why you want compvar included among the independent vars. Perhaps you meant...

    regress depvar `varlist2' if compvar==1

Or perhaps I misinterpreted your structure.
----

Good luck with this.
-- David

At 07:29 PM 8/7/2003 -0400, you wrote:
>Thanks Nick and David for the help.
>
>David, I mean the exact correspondence: var1 for a, var2 for b, etc.
>
>If a variable is absent, it would be preferable not to create it, since the
>program is huge.
>For example, if a does not exist or contains no observation, thus, it would be
>preferable not to create var1.
>Even though we create new variables with missing values, they would be
>irrelevant for my regressions.
>
>For var5, if at least one of the variables exists, then I want to use it to
>create var5. If not, it will not be created.
>
>Nick, all the code is there. My intend is simple.
>Let me rewrite all my program below: my dependent variable is depvar (which is
>common to all files).
>
>local varlist1 "a b c d e f g h"
>foreach x of local varlist1 {
>capture confirm var `x'
>if _rc==0 {
>    capture assert mi(`x')
>    if _rc==0 {
>        drop `x'
>    }
>else {
>    g var1=a
>    g var2=b
>    g var3=c
>    g var4=d
>    g var5=.
>    replace var5=1 if e==1| f==1| g==1| h==1
>    }
> }
>regress depvar var1 var2 var3 var4 var5 /*if one of them exists*/
>
>The first suggestion of Nick seems good, but since I have a lot of variables
>to create, it will be very difficult to rewrite the code for each of them.
>I will try his second suggestion.
>
>Best regards.
>
>Amadou DIALLO,
>AFTHD, The World Bank.

David Kantor
Institute for Policy Studies
Johns Hopkins University
dkantor@jhu.edu
410-516-5404

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
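
Editorial sketch, not part of the original exchange: one possible reading of the thread is that a-d should enter the regression as they are, while only e-h feed the composite. Under that assumption, the two lists can be screened separately. The macro names controls, vaccvars, controls_ok, vacc_ok and extra are hypothetical. The sketch uses -egen, anymatch()-, which to my knowledge is the name later Stata releases give to the eqany() function used in the thread; on an older Stata, substitute eqany(). It assumes depvar exists in the dataset in memory.

```stata
* Sketch under stated assumptions: a-d are controls, e-h are vaccination
* indicators; any of them may be absent or entirely missing in a given file.
local controls "a b c d"
local vaccvars "e f g h"

* Keep the controls that exist and have at least one nonmissing value
local controls_ok
foreach x of local controls {
    capture confirm variable `x'
    if _rc == 0 {
        quietly count if !missing(`x')
        if r(N) > 0 local controls_ok `controls_ok' `x'
    }
}

* Same screening for the vaccination indicators
local vacc_ok
foreach x of local vaccvars {
    capture confirm variable `x'
    if _rc == 0 {
        quietly count if !missing(`x')
        if r(N) > 0 local vacc_ok `vacc_ok' `x'
    }
}

* compvar = 1 if any usable indicator equals 1, and 0 otherwise
if "`vacc_ok'" != "" {
    egen byte compvar = anymatch(`vacc_ok'), values(1)
    local extra compvar
}

* Regress on whatever survived the screening
if "`controls_ok'`extra'" != "" {
    regress depvar `controls_ok' `extra'
}
```

Unlike the code in the message above, this variant does not drop the all-missing variables; it only leaves them out of the lists. Because the indicators e-h are not themselves regressors here, observations with missing vaccination data stay in the estimation sample with compvar equal to 0, which seems to match the "don't know, missing" category described in the reply, though that is a modelling choice rather than a necessity.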

**Follow-Ups**: **Re: st:How to do analysis if the same variable exists in one dataset and is missing or contains no observation in another database?** *From:* David Kantor <dkantor@jhu.edu>


© Copyright 1996–2015 StataCorp LP