Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: FW: STATA

From	Menno Kiel <[email protected]>
To	"[email protected]" <[email protected]>
Subject	st: FW: STATA
Date	Mon, 14 Apr 2014 12:20:42 +0000

L.S.,


I have a question about using the merge command as a means of performing a date lookup to retrieve a value attached to the applicable date range.

The variable that needs to be looked up is called: Date (=the questionnaire date of a patient), and the goal of the procedure is to attach a predefined time interval (TI) number to the specific Date. The difficulty is that multiple patient groups exist: 21 to be exact (see P.S. below). Within these patient groups, the dates do not overlap, but if you don't take the patient groups into account, they do, resulting in an error related to the non-unique character of the merge variables.

The master dataset is named: AIRFORCE_C_E_Base.dta with the variables ID (patient ID), Date (questionnaire date), patient characteristics (most importantly Allergen and randomization year) and a lot of cost and effects variables associated with this specific date.
The lookup dataset is names: TI_dates.dta        (where TI stands for time interval, with a start [TI_startdate ] and an end date[TI_enddate]. ) (TI ranges from 0 through MAXIMALLY 7, so potentially eight).

In 2011, Phil Schumm ([email protected]) responded to a similar case on STATAlist, but my situation is slightly more complicated due to the fact that the start and enddates of the lookup ranges are only unique WITHIN any of the 21 groups. The original post can be found here: http://www.stata.com/statalist/archive/2011-06/msg00627.html

I figured the simple method he described in the email using the cross command is not applicable here, therefore a two-step merge procedure is in order. He suggests:

    use dataset1
    merge 1:1 _n using dataset2, keepusing(start_x end_x) nogen

    gen start = .
    gen end = .
    forv i=1/`c(N)' {
        if mi(start_x[`i']) continue, break

replace start = start_x[`i'] if inrange(x,start_x[`i'],end_x[`i'])

        replace end = end_x[`i'] if inrange(x,start_x[`i'],end_x[`i'])
    }

    drop start_x end_x
    ren start start_x
    ren end end_x
    merge m:1 start_x end_x using dataset2

Is there a way (e.g. using a while-loop) to perform this procedure for any of the 21 different combinations of random_year and Allergen(_codes)? I figured that start_x corresponds to TI_startdate and end_x to TI_enddate. X must be the variable to be lookedup: Date.
However, running an adapted version of the syntax above for 21 times seems a bit unneccesary.

I'd be more than willing to elaborate. Also, I can send an anonymized version of the datasets. Your help is much appreciated.

Best,


Menno Kiel
Rotterdam, The Netherlands


P.S.
(Explanation of the patient groups: The start and enddates are different for any of the 3 randomization years (Random_year: 2009, 2010, 2011) and 7 Allergens (T, G, H, TG, TH, GH or TGH); this variable is stored as a string ('Allergen') or numerical ('Allergen_code') = 3*7= 21 theoretical combinations of Allergen and Random_year, each with different sets of lookup ranges. I have created a concatenated variable of allergen letter and randomization year (e.g. T2009 for a tree pollen patient randomized in 2009) in the master dataset for future use, reducing the number of key variable from 2 to one.)


______________________________________

                Menno A. Kiel, MD MSc
                research physician / health economist

                Institute for Medical Technology Assessment
                Dept. of Health Economics (GE-iMTA)
                Erasmus University Rotterdam
                Woudestein Campus, r. J5-69
                PO Box 1738
                50 Burg. Oudlaan, 3000DR Rotterdam
                The Netherlands

W1:        +31(0)10-4088885 (ma-do-vrij)
W2:        +31(0)10-7945523 (di-woe)
F:            +31(0)10-4089094
M:          +31(0)6-81501516
E:            [email protected]
I:             www.imta.nl


Disclaimer
________________________________
De informatie verzonden in dit e-mail bericht is vertrouwelijk en is uitsluitend bestemd voor de geadresseerde van dit bericht. Lees verder: www.eur.nl/email-disclaimer <http://www.eur.nl/email-disclaimer>
The information in this e-mail message is confidential and may be legally privileged. Read more: www.eur.nl/english/email-disclaimer<http://www.eur.nl/english/email-disclaimer>
________________________________


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: Re: st: Question about tesitng coefficients in the cointegrating equation with the VEC command
Next by Date: Re: st: Repeated names in a string variable, but some have typos. How to correct?
Previous by thread: st: Variance decomposition ANOVA dummy variable
Next by thread: st: mi estimate and adjusted means
Index(es):
- Date
- Thread